idnits 2.17.1

draft-ietf-clue-framework-21.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 126 has weird spacing: '...certain compa...'

  == Line 1983 has weird spacing: '...om left bot...'

  -- The document date (March 3, 2015) is 3340 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC4566' is mentioned on line 1605, but not defined

  ** Obsolete undefined reference: RFC 4566 (Obsoleted by RFC 8866)

  == Missing Reference: 'RFC6351' is mentioned on line 872, but not defined

  == Missing Reference: 'RFC6350' is mentioned on line 883, but not defined

  == Missing Reference: 'RFC 6503' is mentioned on line 3003, but not defined

  == Missing Reference: 'RFC 3261' is mentioned on line 3025, but not defined

  == Unused Reference: 'RFC4579' is defined on line 3458, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-18) exists of
     draft-ietf-clue-datachannel-05

  ** Downref: Normative reference to an Experimental draft:
     draft-ietf-clue-datachannel (ref. 'I-D.ietf-clue-datachannel')

  == Outdated reference: A later version (-17) exists of
     draft-ietf-clue-data-model-schema-07

  == Outdated reference: A later version (-19) exists of
     draft-ietf-clue-protocol-02

  ** Downref: Normative reference to an Experimental draft:
     draft-ietf-clue-protocol (ref. 'I-D.ietf-clue-protocol')

  == Outdated reference: A later version (-15) exists of
     draft-ietf-clue-signaling-04

  ** Downref: Normative reference to an Experimental draft:
     draft-ietf-clue-signaling (ref. 'I-D.ietf-clue-signaling')

  == Outdated reference: A later version (-14) exists of
     draft-ietf-clue-rtp-mapping-03

  -- Obsolete informational reference (is this intentional?): RFC 5117
     (Obsoleted by RFC 7667)


     Summary: 4 errors (**), 0 flaws (~~), 15 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	CLUE WG                                               M. Duckworth, Ed.
2	Internet Draft                                                  Polycom
3	Intended status: Standards Track                           A. Pepperell
4	Expires: September 3, 2015                                        Acano
5	                                                              S. Wenger
6	                                                                  Vidyo
7	                                                          March 3, 2015

9	                Framework for Telepresence Multi-Streams
10	                    draft-ietf-clue-framework-21.txt

12	Abstract

14	   This document defines a framework for a protocol to enable devices
15	   in a telepresence conference to interoperate.  The protocol enables
16	   communication of information about multiple media streams so a
17	   sending system and receiving system can make reasonable decisions
18	   about transmitting, selecting and rendering the media streams.
19	   This protocol is used in addition to SIP signaling and SDP
20	   negotiation for setting up a telepresence session.

22	Status of this Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current
30	   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six
33	   months and may be updated, replaced, or obsoleted by other
34	   documents at any time.  It is inappropriate to use Internet-Drafts
35	   as reference material or to cite them other than as "work in
36	   progress."

38	   This Internet-Draft will expire on September 3, 2015.

40	Copyright Notice

42	   Copyright (c) 2013 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with
50	   respect to this document.  Code Components extracted from this
51	   document must include Simplified BSD License text as described in
52	   Section 4.e of the Trust Legal Provisions and are provided without
53	   warranty as described in the Simplified BSD License.

55	Table of Contents

57	   1. Introduction
58	   2. Terminology
59	   3. Definitions
60	   4. Overview and Motivation
61	   5. Description of the Framework/Model
62	   6. Spatial Relationships
63	   7. Media Captures and Capture Scenes
64	      7.1. Media Captures
65	         7.1.1. Media Capture Attributes
66	      7.2. Multiple Content Capture
67	         7.2.1. MCC Attributes
68	      7.3. Capture Scene
69	         7.3.1. Capture Scene attributes
70	         7.3.2. Capture Scene View attributes
71	      7.4. Global View List
72	   8. Simultaneous Transmission Set Constraints
73	   9. Encodings
74	      9.1. Individual Encodings
75	      9.2. Encoding Group
76	      9.3. Associating Captures with Encoding Groups
77	   10. Consumer's Choice of Streams to Receive from the Provider
78	      10.1. Local preference
79	      10.2. Physical simultaneity restrictions
80	      10.3. Encoding and encoding group limits
81	   11. Extensibility
82	   12. Examples - Using the Framework (Informative)
83	      12.1. Provider Behavior
84	         12.1.1. Three screen Endpoint Provider
85	         12.1.2. Encoding Group Example
86	         12.1.3. The MCU Case

88	      12.2. Media Consumer Behavior
89	         12.2.1. One screen Media Consumer
90	         12.2.2. Two screen Media Consumer configuring the example
91	         12.2.3. Three screen Media Consumer configuring the example
92	      12.3. Multipoint Conference utilizing Multiple Content Captures
93	         12.3.1. Single Media Captures and MCC in the same
94	         Advertisement
95	         12.3.2. Several MCCs in the same Advertisement
96	         12.3.3. Heterogeneous conference with switching and
97	         composition
98	         12.3.4. Heterogeneous conference with voice activated
99	         switching
100	   13. Acknowledgements
101	   14. IANA Considerations
102	   15. Security Considerations
103	   16. Changes Since Last Version
104	   17. Normative References
105	   18. Informative References
106	   19. Authors' Addresses

108	1. Introduction

110	   Current telepresence systems, though based on open standards such
111	   as RTP [RFC3550] and SIP [RFC3261], cannot easily interoperate with
112	   each other.  A major factor limiting the interoperability of
113	   telepresence systems is the lack of a standardized way to describe
114	   and negotiate the use of multiple audio and video streams
115	   comprising the media flows.  This document provides a framework for
116	   protocols to enable interoperability by handling multiple streams
117	   in a standardized way.  The framework is intended to support the
118	   use cases described in Use Cases for Telepresence Multistreams
119	   [RFC7205] and to meet the requirements in Requirements for
120	   Telepresence Multistreams [RFC7262]. This includes cases using
121	   multiple media streams that are not necessarily telepresence.

123	   This document occasionally refers to the term "CLUE", in capital
124	   letters.  CLUE is an acronym for "ControLling mUltiple streams for
125	   tElepresence", which is the name of the IETF working group in which
126	   this document and certain  companion documents have been developed.
127	   Often, CLUE-something refers to something that has been designed by
128	   the CLUE working group; for example, this document may be called
129	   the CLUE-framework.

131	   The basic session setup for the use cases is based on SIP [RFC3261]
132	   and SDP offer/answer [RFC3264].  In addition to basic SIP & SDP
133	   offer/answer, CLUE-specific signaling is required to exchange the
134	   information describing the multiple media streams.  The motivation
135	   for this framework, an overview of the signaling, and the information
136	   required to be exchanged are described in subsequent sections of
137	   this document.  Companion documents describe the signaling details
138	   [I-D.ietf-clue-signaling] and the data model [I-D.ietf-clue-data-
139	   model-schema] and protocol [I-D.ietf-clue-protocol].

141	2. Terminology

143	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
144	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
145	   this document are to be interpreted as described in RFC 2119
146	   [RFC2119].

148	3. Definitions

150	   The terms defined below are used throughout this document and
151	   companion documents.  In order to easily identify the use of a
152	   defined term, those terms are capitalized.

154	   Advertisement: a CLUE message a Media Provider sends to a Media
155	   Consumer describing specific aspects of the content of the media,
156	   and any restrictions it has in terms of being able to provide
157	   certain Streams simultaneously.

159	   Audio Capture: Media Capture for audio.  Denoted as ACn in the
160	   examples in this document.

162	   Capture: Same as Media Capture.

164	   Capture Device: A device that converts physical input, such as
165	   audio, video or text, into an electrical signal, in most cases to
166	   be fed into a media encoder.

168	   Capture Encoding: A specific encoding of a Media Capture, to be
169	   sent by a Media Provider to a Media Consumer via RTP.

171	   Capture Scene: a structure representing a spatial region captured
172	   by one or more Capture Devices, each capturing media representing a
173	   portion of the region. The spatial region represented by a Capture
174	   Scene MAY correspond to a real region in physical space, such as a
175	   room.  A Capture Scene includes attributes and one or more Capture
176	   Scene Views, with each view including one or more Media Captures.

178	   Capture Scene View (CSV): a list of Media Captures of the same
179	   media type that together form one way to represent the entire
180	   Capture Scene.

182	   CLUE-capable device: A device that supports the CLUE data channel
183	   [I-D.ietf-clue-datachannel], the CLUE protocol [I-D.ietf-clue-
184	   protocol] and the principles of CLUE negotiation, and seeks CLUE-
185	   enabled calls.

187	   CLUE-enabled call: A call in which two CLUE-capable devices have
188	   successfully negotiated support for a CLUE data channel in SDP
189	   [RFC4566]. A CLUE-enabled call is not necessarily immediately able
190	   to send CLUE-controlled media; negotiation of the data channel and
191	   of the CLUE protocol must complete first. Calls between two CLUE-
192	   capable devices which have not yet successfully completed
193	   negotiation of support for the CLUE data channel in SDP are not
194	   considered CLUE-enabled.

196	   Conference: used as defined in [RFC4353], A Framework for
197	   Conferencing within the Session Initiation Protocol (SIP).

199	   Configure Message: A CLUE message a Media Consumer sends to a Media
200	   Provider specifying which content and Media Streams it wants to
201	   receive, based on the information in a corresponding Advertisement
202	   message.

204	   Consumer: short for Media Consumer.

206	   Encoding: short for Individual Encoding.

208	   Encoding Group: A set of encoding parameters representing a total
209	   media encoding capability to be sub-divided across potentially
210	   multiple Individual Encodings.

212	   Endpoint: A CLUE-capable device which is the logical point of final
213	   termination through receiving, decoding and rendering, and/or
214	   initiation through capturing, encoding, and sending of media
215	   streams.  An endpoint consists of one or more physical devices
216	   which source and sink media streams, and exactly one [RFC4353]
217	   Participant (which, in turn, includes exactly one SIP User Agent).
218	   Endpoints can be anything from multiscreen/multicamera rooms to
219	   handheld devices.

221	   Global View: A set of references to one or more Capture Scene Views
222	   of the same media type that are defined within Scenes of the same
223	   advertisement.  A Global View is a suggestion from the Provider to
224	   the Consumer for one set of CSVs that provide a useful
225	   representation of all the scenes in the advertisement.

227	   Global View List: A list of Global Views included in an
228	   Advertisement.  A Global View List may include Global Views of
229	   different media types.

231	   Individual Encoding: a set of parameters representing a way to
232	   encode a Media Capture to become a Capture Encoding.

234	   Multipoint Control Unit (MCU): a CLUE-capable device that connects
235	   two or more endpoints together into one single multimedia
236	   conference [RFC5117].  An MCU includes an [RFC4353]-like Mixer,
237	   without the [RFC4353] requirement to send media to each
238	   participant.

240	   Media: Any data that, after suitable encoding, can be conveyed over
241	   RTP, including audio, video or timed text.

243	   Media Capture: a source of Media, such as from one or more Capture
244	   Devices or constructed from other Media streams.

246	   Media Consumer: a CLUE-capable device that intends to receive
247	   Capture Encodings.

249	   Media Provider: a CLUE-capable device that intends to send Capture
250	   Encodings.

252	   Multiple Content Capture (MCC): A Capture that mixes and/or
253	   switches other Captures of a single type. (E.g. all audio or all
254	   video.) Particular Media Captures may or may not be present in the
255	   resultant Capture Encoding depending on time or space.  Denoted as
256	   MCCn in the example cases in this document.

258	   Plane of Interest: The spatial plane within a scene containing the
259	   most relevant subject matter.

261	   Provider: Same as Media Provider.

263	   Render: the process of generating a representation from media, such
264	   as displayed motion video or sound emitted from loudspeakers.

266	   Scene: Same as Capture Scene.

268	   Simultaneous Transmission Set: a set of Media Captures that can be
269	   transmitted simultaneously from a Media Provider.

271	   Single Media Capture: A capture which contains media from a single
272	   source capture device, e.g. an audio capture from a single
273	   microphone, a video capture from a single camera.

275	   Spatial Relation: The arrangement in space of two objects, in
276	   contrast to relation in time or other relationships.

278	   Stream: a Capture Encoding sent from a Media Provider to a Media
279	   Consumer via RTP [RFC3550].

281	   Stream Characteristics: the media stream attributes commonly used
282	   in non-CLUE SIP/SDP environments (such as: media codec, bit rate,
283	   resolution, profile/level etc.) as well as CLUE-specific
284	   attributes, such as the Capture ID or a spatial location.

286	   Video Capture: Media Capture for video.  Denoted as VCn in the
287	   example cases in this document.

289	   Video Composite: A single image that is formed, normally by an RTP
290	   mixer inside an MCU, by combining visual elements from separate
291	   sources.
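
The containment relationships among the definitions above (a Capture Scene holds one or more Capture Scene Views, and each view lists one or more Media Captures of a single media type) can be sketched as a small data structure.  This is only an illustrative sketch, not the CLUE data model: the class names, the field names and the sample identifiers are assumptions made for this example, and the normative schema is the one in [I-D.ietf-clue-data-model-schema].

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MediaCapture:
    capture_id: str   # illustrative label in the VCn / ACn style of the examples
    media_type: str   # e.g. "audio", "video" or "text"


@dataclass
class CaptureSceneView:
    captures: List[MediaCapture]

    def __post_init__(self) -> None:
        # A view lists one or more Media Captures of the same media type.
        if not self.captures:
            raise ValueError("a Capture Scene View needs at least one Media Capture")
        if len({c.media_type for c in self.captures}) != 1:
            raise ValueError("all Media Captures in a view must share one media type")

    @property
    def media_type(self) -> str:
        return self.captures[0].media_type


@dataclass
class CaptureScene:
    views: List[CaptureSceneView]
    attributes: dict = field(default_factory=dict)  # scene-level attributes

    def __post_init__(self) -> None:
        # A Capture Scene includes one or more Capture Scene Views.
        if not self.views:
            raise ValueError("a Capture Scene needs at least one Capture Scene View")


def build_example_scene() -> CaptureScene:
    """Hypothetical room with three cameras and one microphone."""
    vc0, vc1, vc2 = (MediaCapture(f"VC{i}", "video") for i in range(3))
    ac0 = MediaCapture("AC0", "audio")
    return CaptureScene(views=[
        CaptureSceneView([vc0, vc1, vc2]),  # one way to show the whole scene
        CaptureSceneView([ac0]),            # the audio for the same scene
    ])
```

Rejecting a mixed-type view at construction time mirrors the "same media type" wording of the Capture Scene View definition; a real implementation would more likely enforce this through schema validation.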

293	4. Overview and Motivation

295	   This section provides an overview of the functional elements
296	   defined in this document to represent a telepresence or
297	   multistream system.  The motivations for the framework described
298	   in this document are also provided.

300	   Two key concepts introduced in this document are the terms "Media
301	   Provider" and "Media Consumer". A Media Provider represents the
302	   entity that sends the media and a Media Consumer represents the
303	   entity that receives the media. A Media Provider provides Media in
304	   the form of RTP packets; a Media Consumer consumes those RTP
305	   packets.  Media Providers and Media Consumers can reside in
306	   Endpoints or in Multipoint Control Units (MCUs).  A Media Provider
307	   in an Endpoint is usually associated with the generation of media
308	   for Media Captures; these Media Captures are typically sourced
309	   from cameras, microphones, and the like.  Similarly, the Media
310	   Consumer in an Endpoint is usually associated with renderers, such
311	   as screens and loudspeakers.  In MCUs, Media Providers and
312	   Consumers can have the form of outputs and inputs, respectively,
313	   of RTP mixers, RTP translators, and similar devices.  Typically,
314	   telepresence devices such as Endpoints and MCUs would perform as
315	   both Media Providers and Media Consumers, the former being
316	   concerned with those devices' transmitted media and the latter
317	   with those devices' received media.  In a few circumstances, a
318	   CLUE-capable device includes only Consumer or Provider
319	   functionality, such as recorder-type Consumers or webcam-type
320	   Providers.

322	   The motivations for the framework outlined in this document
323	   include the following:

325	   (1) Endpoints in telepresence systems typically have multiple Media
326	   Capture and Media Render devices, e.g., multiple cameras and
327	   screens. While previous system designs were able to set up calls
328	   that would capture media using all cameras and display media on all
329	   screens, for example, there was no mechanism that could associate
330	   these Media Captures with each other in space and time, in a cross-
331	   vendor interoperable way.

333	   (2) The mere fact that there are multiple capturing and rendering
334	   devices, each of which may be configurable in aspects such as zoom,
335	   leads to the difficulty that a variable number of such devices can
336	   be used to capture different aspects of a region.  The Capture
337	   Scene concept allows for the description of multiple setups for
338	   those multiple capture devices that could represent sensible
339	   operation points of the physical capture devices in a room, chosen
340	   by the operator.  A Consumer can pick and choose from those
341	   configurations based on its rendering abilities and inform the
342	   Provider about its choices.  Details are provided in section 7.

344	   (3) In some cases, physical limitations or other reasons disallow
345	   the concurrent use of a device in more than one setup.  For
346	   example, the center camera in a typical three-camera conference
347	   room can set its zoom objective either to capture only the middle
348	   few seats, or all seats of a room, but not both concurrently.  The
349	   Simultaneous Transmission Set concept allows a Provider to signal
350	   such limitations.  Simultaneous Transmission Sets are part of the
351	   Capture Scene description, and are discussed in section 8.

353	   (4) Often, the devices in a room do not have the computational
354	   complexity or connectivity to deal with multiple encoding options
355	   simultaneously, even if each of these options is sensible in
356	   certain scenarios, and even if the simultaneous transmission is
357	   also sensible (i.e. in case of multicast media distribution to
358	   multiple endpoints).  Such constraints can be expressed by the
359	   Provider using the Encoding Group concept, described in section 9.

361	   (5) Due to the potentially large number of RTP streams required for
362	   a Multimedia Conference involving potentially many Endpoints, each
363	   of which can have many Media Captures and media renderers, it has
364	   become common to multiplex multiple RTP streams onto the same
365	   transport address, so as to avoid using the port number as a
366	   multiplexing point and the associated shortcomings such as
367	   NAT/firewall traversal.  The large number of possible permutations
368	   of sensible options a Media Provider can make available to a Media
369	   Consumer makes a mechanism desirable that allows it to narrow down
370	   the number of possible options that a SIP offer/answer exchange has
371	   to consider.  Such information is made available using protocol
372	   mechanisms specified in this document and companion documents,
373	   although it should be stressed that its use in an implementation is
374	   OPTIONAL.  Also, there are aspects of the control of both Endpoints
375	   and MCUs that dynamically change during the progress of a call,
376	   such as audio-level based screen switching, layout changes, and so
377	   on, which need to be conveyed.  Note that these control aspects are
378	   complementary to those specified in traditional SIP-based
379	   conference management such as BFCP.  An exemplary call flow can be
380	   found in section 5.

382	   Finally, all this information needs to be conveyed, and the notion
383	   of support for it needs to be established.  This is done by the
384	   negotiation of a "CLUE channel", a data channel negotiated early
385	   during the initiation of a call.  An Endpoint or MCU that rejects
386	   the establishment of this data channel, by definition, does not
387	   support CLUE-based mechanisms, whereas an Endpoint or MCU that
388	   accepts it is REQUIRED to use it to the extent specified in this
389	   document and its companion documents.

391	5. Description of the Framework/Model

393	   The CLUE framework specifies how multiple media streams are to be
394	   handled in a telepresence conference.

396	   A Media Provider (transmitting Endpoint or MCU) describes specific
397	   aspects of the content of the media and the media stream encodings
398	   it can send in an Advertisement; and the Media Consumer responds to
399	   the Media Provider by specifying which content and media streams it
400	   wants to receive in a Configure message.  The Provider then
401	   transmits the asked-for content in the specified streams.
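
The exchange described above (the Provider advertises, the Consumer configures, and the Provider sends what was asked for) can be sketched in a few lines.  This is a simplified illustration under stated assumptions: the function names, the dictionary layout, the capture identifiers and the "first N video captures" selection policy are all invented for this example and are not defined by this document or by the CLUE protocol.

```python
from typing import Dict, List


def make_advertisement() -> Dict[str, str]:
    """Provider side: hypothetical capture ID -> media type."""
    return {"VC0": "video", "VC1": "video", "VC2": "video", "AC0": "audio"}


def make_configure(advertisement: Dict[str, str], max_video: int) -> List[str]:
    """Consumer side: pick captures based on the Advertisement.

    The policy (all audio, plus the first `max_video` video captures)
    is an arbitrary local preference chosen for this sketch.
    """
    chosen: List[str] = []
    video_count = 0
    for capture_id, media_type in advertisement.items():
        if media_type == "video":
            if video_count < max_video:
                chosen.append(capture_id)
                video_count += 1
        else:
            chosen.append(capture_id)
    return chosen


def captures_to_send(advertisement: Dict[str, str],
                     configure: List[str]) -> List[str]:
    """Provider side: send only what was asked for, and only what was advertised."""
    unknown = [c for c in configure if c not in advertisement]
    if unknown:
        raise ValueError(f"Configure refers to captures not advertised: {unknown}")
    return list(configure)


advertisement = make_advertisement()
configure = make_configure(advertisement, max_video=1)   # a one-screen Consumer
sending = captures_to_send(advertisement, configure)
```

The Provider-side check reflects the statement that a Configure message is based on the information in a corresponding Advertisement; how a real Provider reacts to an inconsistent Configure is left to the protocol specification [I-D.ietf-clue-protocol].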

403	   This Advertisement and Configure typically occur during call
404	   initiation, after CLUE has been enabled in a call, but MAY also
405	   happen at any time throughout the call, whenever there is a change
406	   in what the Consumer wants to receive or (perhaps less common) the
407	   Provider can send.

409	   An Endpoint or MCU typically acts as both Provider and Consumer at
410	   the same time, sending Advertisements and sending Configurations in
411	   response to receiving Advertisements.  (It is possible to be just
412	   one or the other.)

414	   The data model [I-D.ietf-clue-data-model-schema] is based around two
415	   main concepts: a Capture and an Encoding.  A Media Capture (MC),
416	   such as of type audio or video, has attributes to describe the
417	   content a Provider can send.  Media Captures are described in terms
418	   of CLUE-defined attributes, such as spatial relationships and
419	   purpose of the capture.  Providers tell Consumers which Media
420	   Captures they can provide, described in terms of the Media Capture
421	   attributes.

423	   A Provider organizes its Media Captures into one or more Capture
424	   Scenes, each representing a spatial region, such as a room.  A
425	   Consumer chooses which Media Captures it wants to receive from the
426	   Capture Scenes.

428	   In addition, the Provider can send the Consumer a description of
429	   the Individual Encodings it can send in terms of identifiers which
430	   relate to items in SDP [RFC4566].

432	   The Provider can also specify constraints on its ability to provide
433	   Media, and a sensible design choice for a Consumer is to take these
434	   into account when choosing the content and Capture Encodings it
435	   requests in the later offer/answer exchange.  Some constraints are
436	   due to the physical limitations of devices--for example, a camera
437	   may not be able to provide zoom and non-zoom views simultaneously.
438	   Other constraints are system based, such as maximum bandwidth.
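
A Consumer that takes these constraints into account might check a candidate selection before requesting it.  The sketch below is an assumption-laden illustration, not a procedure defined by this document: the capture identifiers, the per-capture bit rates, the bandwidth limit and the function name are hypothetical.  It models the zoom example above as two sets of captures that can be sent together (VC1 is the zoomed view and VC3 the wide view of the same camera) and treats the system constraint as a single total bandwidth limit.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical captures: VC1 (zoomed) and VC3 (wide) come from the same
# camera, so they never appear in the same set.
SIMULTANEOUS_SETS: List[FrozenSet[str]] = [
    frozenset({"VC0", "VC1", "VC2"}),
    frozenset({"VC0", "VC3", "VC2"}),
]

# Hypothetical bit rate the Consumer intends to request per capture, in kbit/s.
KBPS: Dict[str, int] = {"VC0": 2000, "VC1": 2000, "VC2": 2000, "VC3": 2000}


def check_request(requested: Iterable[str],
                  simultaneous_sets: List[FrozenSet[str]],
                  kbps_per_capture: Dict[str, int],
                  max_total_kbps: int) -> List[str]:
    """Return a list of constraint violations (empty if the request is fine)."""
    problems: List[str] = []
    wanted = frozenset(requested)

    # Physical constraint: the request must fit inside at least one set
    # of captures that the Provider can transmit at the same time.
    if not any(wanted <= allowed for allowed in simultaneous_sets):
        problems.append("captures cannot be provided simultaneously")

    # System constraint: total requested bandwidth must not exceed the limit.
    total = sum(kbps_per_capture[c] for c in wanted)
    if total > max_total_kbps:
        problems.append(
            f"requested {total} kbit/s exceeds limit of {max_total_kbps} kbit/s")

    return problems
```

For instance, check_request(["VC1", "VC3"], SIMULTANEOUS_SETS, KBPS, 6000) reports the simultaneity problem, because no single set contains both views of the center camera.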

440	   The following diagram illustrates the information contained in an
441	   Advertisement.

443	   ...................................................................
444	   .  Provider Advertisement             +--------------------+      .
445	   .                                     | Simultaneous Sets  |      .
446	   .        +------------------------+   +--------------------+      .
447	   .        |       Capture Scene N  |   +--------------------+      .
448	   .      +-+----------------------+ |   | Global View List   |      .
449	   .      |       Capture Scene 2  | |   +--------------------+      .
450	   .    +-+----------------------+ | |      +----------------------+ .
451	   .    |  Capture Scene 1       | | |      |  Encoding Group N    | .
452	   .    |    +---------------+   | | |    +-+--------------------+ | .
453	   .    |    | Attributes    |   | | |    |   Encoding Group 2   | | .
454	   .    |    +---------------+   | | |  +-+--------------------+ | | .
455	   .    |                        | | |  |   Encoding Group 1   | | | .
456	   .    |    +----------------+  | | |  |     parameters       | | | .
457	   .    |    |  V i e w s     |  | | |  |      bandwidth       | | | .
458	   .    |    |  +---------+   |  | | |  | +-------------------+| | | .
459	   .    |    |  |Attribute|   |  | | |  | | V i d e o         || | | .
460	   .    |    |  +---------+   |  | | |  | | E n c o d i n g s || | | .
461	   .    |    |                |  | | |  | | Encoding 1        || | | .
462	   .    |    | View 1         |  | | |  | |                   || | | .
463	   .    |    |  (list of MCs) |  | |-+  | +-------------------+| | | .
464	   .    |    +----|-|--|------+  |-+    |                      | | | .
465	   .    +---------|-|--|---------+      | +-------------------+| | | .
466	   .              | |  |                | | A u d i o         || | | .
467	   .              | |  |                | | E n c o d i n g s || | | .
468	   .              v |  |                | | Encoding 1        || | | .
469	   .      +---------|--|--------+       | |                   || | | .
470	   .      | Media Capture N     |------>| +-------------------+| | | .
471	   .    +-+---------v--|------+ |       |                      | | | .
472	   .    | Media Capture 2     | |       |                      | |-+ .
473	   .  +-+--------------v----+ |-------->|                      | |   .
474	   .  | Media Capture  1    | | |       |                      |-+   .
475	   .  |  +----------------+ |---------->|                      |     .
476	   .  |  | Attributes     | | |_+       +----------------------+     .
477	   .  |  +----------------+ |_+                                      .
478	   .  +---------------------+                                        .
479	   .                                                                 .
480	   ...................................................................

482	                   Figure 1:   Advertisement Structure
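
One way to read Figure 1 is as a container whose parts refer to each other by identifier: views list Media Captures, and each Media Capture points at an Encoding Group.  The sketch below illustrates that reading and a simple referential-integrity check.  It is an assumption for illustration only: the field names, the identifier formats and the check itself are not taken from the CLUE data model, and the Global View List and all attributes are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Advertisement:
    # capture ID -> ID of the Encoding Group the capture is associated with
    capture_to_group: Dict[str, str]
    # scene ID -> list of views, each view being a list of capture IDs
    scenes: Dict[str, List[List[str]]]
    # Encoding Group ID -> IDs of the Individual Encodings it contains
    encoding_groups: Dict[str, List[str]]
    # sets of capture IDs that can be transmitted at the same time
    simultaneous_sets: List[Set[str]] = field(default_factory=list)

    def dangling_references(self) -> List[str]:
        """List every identifier that is referenced but never defined."""
        known = set(self.capture_to_group)
        errors: List[str] = []
        for scene_id, views in self.scenes.items():
            for view in views:
                for capture_id in view:
                    if capture_id not in known:
                        errors.append(f"{scene_id}: unknown capture {capture_id}")
        for capture_id, group_id in self.capture_to_group.items():
            if group_id not in self.encoding_groups:
                errors.append(f"{capture_id}: unknown encoding group {group_id}")
        for sim_set in self.simultaneous_sets:
            for capture_id in sorted(sim_set - known):
                errors.append(f"simultaneous set: unknown capture {capture_id}")
        return errors


def example_advertisement() -> Advertisement:
    """Hypothetical Advertisement with one scene, one video and one audio capture."""
    return Advertisement(
        capture_to_group={"VC0": "EG0", "AC0": "EG1"},
        scenes={"scene1": [["VC0"], ["AC0"]]},
        encoding_groups={"EG0": ["ENC0"], "EG1": ["ENC1"]},
        simultaneous_sets=[{"VC0", "AC0"}],
    )
```

Keeping the cross-references as plain identifiers, rather than nested objects, follows the arrows in the figure, where several Media Captures can point at the same Encoding Group.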

484	   A very brief outline of the call flow used by a simple system (two
485	   Endpoints) in compliance with this document can be described as
486	   follows, and as shown in the following figure.

488	         +-----------+                     +-----------+
489	         | Endpoint1 |                     | Endpoint2 |
490	         +----+------+                     +-----+-----+
491	              | INVITE (BASIC SDP+CLUECHANNEL)   |
492	              |--------------------------------->|
493	              |    200 OK (BASIC SDP+CLUECHANNEL)|
494	              |<---------------------------------|
495	              | ACK                              |
496	              |--------------------------------->|
497	              |                                  |
498	              |<################################>|
499	              |     BASIC SDP MEDIA SESSION      |
500	              |<################################>|
501	              |                                  |
502	              |    CONNECT (CLUE CTRL CHANNEL)   |
503	              |=================================>|
504	              |            ...                   |
505	              |<================================>|
506	              |   CLUE CTRL CHANNEL ESTABLISHED  |
507	              |<================================>|
508	              |                                  |
509	              | ADVERTISEMENT 1                  |
510	              |*********************************>|
511	              |                  ADVERTISEMENT 2 |
512	              |<*********************************|
513	              |                                  |
514	              |                      CONFIGURE 1 |
515	              |<*********************************|
516	              | CONFIGURE 2                      |
517	              |*********************************>|
518	              |                                  |
519	              | REINVITE (UPDATED SDP)           |
520	              |--------------------------------->|
521	              |              200 OK (UPDATED SDP)|
522	              |<---------------------------------|
523	              | ACK                              |
524	              |--------------------------------->|
525	              |                                  |
526	              |<################################>|
527	              |   UPDATED SDP MEDIA SESSION      |
528	              |<################################>|
529	              |                                  |
530	              v                                  v

532	                   Figure 2:   Basic Information Flow

534	   An initial offer/answer exchange establishes a basic media session,
535	   for example audio-only, and a CLUE channel between two Endpoints.
536	   With the establishment of that channel, the endpoints have
537	   consented to use the CLUE protocol mechanisms and, therefore, MUST
538	   adhere to the CLUE protocol suite as outlined herein.

540	   Over this CLUE channel, the Provider in each Endpoint conveys its
541	   characteristics and capabilities by sending an Advertisement as
542	   specified herein.  The Advertisement is typically not sufficient to
543	   set up all media.  The Consumer in the Endpoint receives the
544	   information provided by the Provider, and can use it for several
545	   purposes.  It uses it, along with information from an offer/answer
546	   exchange, to construct a CLUE Configure message to tell the
547	   Provider what the Consumer wishes to receive.  Also, the Consumer
548	   MAY use the information provided to tailor the SDP it is going to
549	   send during any following SIP offer/answer exchange, and its
550	   reaction to SDP it receives in that step.  It is often a sensible
551	   implementation choice to do so.  Spatial relationships associated
552	   with the Media can be included in the Advertisement, and it is
553	   often sensible for the Media Consumer to take those spatial
554	   relationships into account when tailoring the SDP.  The Consumer
555	   can also limit the number of encodings it must set up resources to
556	   receive, and not waste resources on unwanted encodings, because it
557	   has the Provider's Advertisement information ahead of time to
558	   determine what it really wants to receive.  The Consumer can also
559	   use the Advertisement information for local rendering decisions.

561	   This initial CLUE exchange is followed by an SDP offer/answer
562	   exchange that not only establishes those aspects of the media that
563	   have not been "negotiated" over CLUE, but also has the side effect
564	   of setting up the media transmission itself, potentially involving
565	   security exchanges, ICE, and so on.  This step is plain vanilla
566	   SIP.

568	   During the lifetime of a call, further exchanges MAY occur over the
569	   CLUE channel.  In some cases, those further exchanges lead to a
570	   modified system behavior of Provider or Consumer (or both) without
571	   any other protocol activity such as further offer/answer exchanges.
572	   For example, a Configure Message requesting the Provider to place a
573	   different Capture source into a Capture Encoding, signaled over the
574	   CLUE channel, ought not to lead to heavy-handed mechanisms like SIP
575	   re-invites.  However, in other cases, after the CLUE negotiation an
576	   additional offer/answer exchange becomes necessary.  For example,
577	   if both sides decide to upgrade the call from a single screen to a
578	   multi-screen call and more bandwidth is required for the additional
579	   video channels compared to what was previously negotiated using
580	   offer/answer, a new O/A exchange is REQUIRED.

582	   One aspect of the protocol outlined herein and specified in more
583	   detail in companion documents is that it makes available, to the
584	   Consumer, information regarding the Provider's capabilities to
585	   deliver Media, and attributes related to that Media such as their
586	   spatial relationship.  The operation of the renderer inside the
587	   Consumer is unspecified in that it can choose to ignore some
588	   information provided by the Provider, and/or not render media
589	   streams available from the Provider (although it MUST follow the
590	   CLUE protocol and, therefore, MUST gracefully receive and respond
591	   (through a Configure) to the Provider's information).

593	   A CLUE-capable device interoperates with a device that does not
594	   support CLUE.  The CLUE-capable device can determine, by the result
595	   of the initial offer/answer exchange, if the other device supports
596	   and wishes to use CLUE. The specific mechanism for this is
597	   described in [I-D.ietf-clue-signaling].  If the other device does
598	   not use CLUE, then the CLUE-capable device falls back to behavior
599	   that does not require CLUE.

601	   As for the media, Provider and Consumer have an end-to-end
602	   communication relationship with respect to (RTP transported) media;
603	   and the mechanisms described herein and in companion documents do
604	   not change the aspects of setting up those RTP flows and sessions.
605	   In other words, the RTP media sessions conform to the negotiated
606	   SDP whether or not CLUE is used.

608	6. Spatial Relationships

610	   In order for a Consumer to perform a proper rendering, it is often
611	   necessary or at least helpful for the Consumer to have received
612	   spatial information about the streams it is receiving.  CLUE
613	   defines a coordinate system that allows Media Providers to describe
614	   the spatial relationships of their Media Captures to enable proper
615	   scaling and spatially sensible rendering of their streams.  The
616	   coordinate system is based on a few principles:

618	   o  Each Capture Scene has a distinct coordinate system, unrelated
619	      to the coordinate systems of other scenes.

621	   o  Simple systems which do not have multiple Media Captures to
622	      associate spatially need not use the coordinate model, although
623	      it can still be useful to provide an Area of Capture.

625	   o  Coordinates can be either in real, physical units (millimeters),
626	      have an unknown scale or have no physical scale.  Systems which
627	      know their physical dimensions (for example professionally
628	      installed Telepresence room systems) MUST provide those real-
629	      world measurements to enable the best user experience for
630	      advanced receiving systems that can utilize this information.
631	      Systems which don't know specific physical dimensions but still
632	      know relative distances MUST use 'unknown scale'.  'No scale' is
633	      intended to be used only where Media Captures from different
634	      devices (with potentially different scales) will be forwarded
635	      alongside one another (e.g. in the case of an MCU).

637	      *  "Millimeters" means the scale is in millimeters.

639	      *  "Unknown" means the scale is not necessarily millimeters, but
640	         the scale is the same for every Capture in the Capture Scene.

642	      *  "No Scale" means the scale could be different for each
643	         capture.  An MCU Provider that advertises two adjacent
644	         captures and picks sources (which can change quickly) from
645	         different endpoints might use this value; the scale could be
646	         different and changing for each capture.  But the areas of
647	         capture still represent a spatial relation between captures.

649	   o  The coordinate system is right-handed Cartesian X, Y, Z with the
650	      origin at a spatial location of the Provider's choosing.  The
651	      Provider MUST use the same coordinate system with the same scale
652	      and origin for all coordinates within the same Capture Scene.

654	   The direction of increasing coordinate values is:
655	   X increases from left to right, from the point of view of an
656	   observer at the front of the room looking toward the back
657	   Y increases from the front of the room to the back of the room
658	   Z increases from low to high (i.e. floor to ceiling)

660	   Cameras in a scene typically point in the direction of increasing
661	   Y, from front to back.  But there could be multiple cameras
662	   pointing in different directions.  If the physical space does not
663	   have a well-defined front and back, the provider chooses any
664	   direction for X and Y consistent with right-handed coordinates.
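
The coordinate rules above can be sketched in code.  The following
Python fragment is only an illustration under stated assumptions: the
names Scale, Point, same_scale_within_scene and is_right_handed are
hypothetical and are not defined by CLUE or its data model.

```python
from dataclasses import dataclass
from enum import Enum


class Scale(Enum):
    """The three scale options listed above (enum names are illustrative)."""
    MILLIMETERS = "millimeters"
    UNKNOWN = "unknown"
    NO_SCALE = "no scale"


@dataclass(frozen=True)
class Point:
    x: float  # increases left to right, seen from the front of the room
    y: float  # increases from the front of the room to the back
    z: float  # increases from floor to ceiling


def same_scale_within_scene(scale: Scale) -> bool:
    """True if every Capture in the Capture Scene shares one scale."""
    return scale in (Scale.MILLIMETERS, Scale.UNKNOWN)


def is_right_handed(x_axis: Point, y_axis: Point, z_axis: Point) -> bool:
    """True if x_axis cross y_axis points the same way as z_axis."""
    cx = x_axis.y * y_axis.z - x_axis.z * y_axis.y
    cy = x_axis.z * y_axis.x - x_axis.x * y_axis.z
    cz = x_axis.x * y_axis.y - x_axis.y * y_axis.x
    return cx * z_axis.x + cy * z_axis.y + cz * z_axis.z > 0
```

A Provider whose room has no obvious front and back could use a check
like is_right_handed to confirm that the X and Y directions it picked
are consistent with Z pointing from floor to ceiling.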

666	7. Media Captures and Capture Scenes

668	   This section describes how Providers can describe the content of
669	   media to Consumers.

671	7.1. Media Captures

673	   Media Captures are the fundamental representations of streams that
674	   a device can transmit.  What a Media Capture actually represents is
675	   flexible:

677	   o  It can represent the immediate output of a physical source (e.g.
678	      camera, microphone) or 'synthetic' source (e.g. laptop computer,
679	      DVD player)

681	   o  It can represent the output of an audio mixer or video composer

683	   o  It can represent a concept such as 'the loudest speaker'

685	   o  It can represent a conceptual position such as 'the leftmost
686	      stream'

688	   To identify and distinguish between multiple Capture instances,
689	   Captures have a unique identity.  For instance: VC1, VC2 and AC1,
690	   AC2, where VC1 and VC2 refer to two different video captures and
691	   AC1 and AC2 refer to two different audio captures.

693	   Some key points about Media Captures:

695	     . A Media Capture is of a single media type (e.g. audio or
696	        video)
697	     . A Media Capture is defined in a Capture Scene and is given an
698	        Advertisement unique identity.  The identity may be referenced
699	        outside the Capture Scene that defines it through a Multiple
700	        Content Capture (MCC)
701	     . A Media Capture may be associated with one or more Capture
702	        Scene Views
703	     . A Media Capture has exactly one set of spatial information
704	     . A Media Capture can be the source of at most one Capture
705	        Encoding

707	   Each Media Capture can be associated with attributes to describe
708	   what it represents.

710	7.1.1. Media Capture Attributes

712	   Media Capture Attributes describe information about the Captures.
713	   A Provider can use the Media Capture Attributes to describe the
714	   Captures for the benefit of the Consumer of the Advertisement
715	   message.  All these attributes are optional.  Media Capture
716	   Attributes include:

718	     . Spatial information, such as point of capture, point on line
719	        of capture, and area of capture, all of which, in combination
720	        define the capture field of, for example, a camera
721	     . Other descriptive information to help the Consumer choose
722	        between captures (e.g. description, presentation, view,
723	        priority, language, person information and type)

725	   The sub-sections below define the Capture attributes.

727	7.1.1.1. Point of Capture

729	   The Point of Capture attribute is a field with a single Cartesian
730	   (X, Y, Z) point value which describes the spatial location of the
731	   capturing device (such as camera).  For an Audio Capture with
732	   multiple microphones, the Point of Capture defines the nominal mid-
733	   point of the microphones.

735	7.1.1.2. Point on Line of Capture

737	   The Point on Line of Capture attribute is a field with a single
738	   Cartesian (X, Y, Z) point value which describes a position in space
739	   of a second point on the axis of the capturing device, toward the
740	   direction it is pointing; the first point being the Point of
741	   Capture (see above).

743	   Together, the Point of Capture and Point on Line of Capture define
744	   the direction and axis of the capturing device, for example the
745	   optical axis of a camera or the axis of a microphone.  The Media
746	   Consumer can use this information to adjust how it renders the
747	   received media if it so chooses.
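
As an illustration of how the two points define a direction, the
following Python sketch computes a unit vector along the capture axis.
The function name capture_direction and the use of plain (X, Y, Z)
tuples are assumptions made for this example only.

```python
import math


def capture_direction(point_of_capture, point_on_line):
    """Return the unit vector from the Point of Capture toward the
    Point on Line of Capture.  Both arguments are (x, y, z) tuples."""
    dx, dy, dz = (b - a for a, b in zip(point_of_capture, point_on_line))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0:
        # Identical points do not define an axis.
        raise ValueError("the two points coincide; no direction is defined")
    return (dx / length, dy / length, dz / length)
```

For a camera at the origin with a Point on Line of Capture of
(0, 5, 0), the result points along increasing Y, that is, from the
front of the room toward the back.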

749	   For an Audio Capture, the Media Consumer can use this information
750	   along with the Audio Capture Sensitivity Pattern to define a 3-
751	   dimensional volume of capture where sounds can be expected to be
752	   picked up by the microphone providing this specific audio capture.
753	   If the Consumer wants to associate an Audio Capture with a Video
754	   Capture, it can compare this volume with the area of capture for
755	   video media to provide a check on whether the audio capture is
756	   indeed spatially associated with the video capture. For example, a
757	   video area of capture that fails to intersect at all with the audio
758	   volume of capture, or is at such a long radial distance from the
759	   microphone point of capture that the audio level would be very low,
760	   would be inappropriate.

762	7.1.1.3. Area of Capture

764	   The Area of Capture is a field with a set of four (X, Y, Z) points
765	   as a value which describes the spatial location of what is being
766	   "captured".  This attribute applies only to video captures, not
767	   other types of media. By comparing the Area of Capture for
768	   different Video Captures within the same Capture Scene a Consumer
769	   can determine the spatial relationships between them and render
770	   them correctly.

772	   The four points MUST be co-planar, forming a quadrilateral, which
773	   defines the Plane of Interest for the particular Media Capture.
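
A receiver could check the co-planarity requirement with a scalar
triple product.  The Python sketch below is one possible check, not a
normative procedure; the function name is_coplanar and the absolute
tolerance are assumptions, and a suitable tolerance depends on the
scale of the coordinates in use.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def is_coplanar(p1, p2, p3, p4, tol=1e-9):
    """True if the four (x, y, z) points lie in one plane.

    The scalar triple product of the three edge vectors leaving p1 is
    zero exactly when the fourth point lies in the plane of the others.
    """
    triple = _dot(_sub(p4, p1), _cross(_sub(p2, p1), _sub(p3, p1)))
    return abs(triple) <= tol
```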

775	   If the Area of Capture is not specified, it means the Video Capture
776	   might be spatially related to other Captures in the same Scene, but
777	   there is no detailed information on the relationship.  For a switched
778	   Capture that switches between different sections within a larger
779	   area, the area of capture MUST use coordinates for the larger
780	   potential area.

782	7.1.1.4. Mobility of Capture

784	   The Mobility of Capture attribute indicates whether or not the
785	   point of capture, point on line of capture, and area of capture
786	   values stay the same over time, or are expected to change
787	   (potentially frequently).  Possible values are static, dynamic, and
788	   highly dynamic.

790	   An example of "dynamic" is a camera mounted on a stand which is
791	   occasionally hand-carried and placed at different positions in
792	   order to provide the best angle to capture a work task.  A camera
793	   worn by a person who moves around the room is an example of
794	   "highly dynamic". In either case, the effect is that the capture
795	   point, capture axis and area of capture change with time.

797	   The capture point of a static Capture MUST NOT move for the life of
798	   the CLUE session. The capture point of dynamic Captures is
799	   characterized by a change in position followed by a reasonable
800	   period of stability, on the order of minutes.  Highly dynamic
801	   Captures are characterized by a capture point that is constantly
802	   moving.  If the "area of capture", "capture point" and "line of
803	   capture" attributes are included with dynamic or highly dynamic
804	   Captures they indicate spatial information at the time of the
805	   Advertisement.

807	7.1.1.5. Audio Capture Sensitivity Pattern

809	   The Audio Capture Sensitivity Pattern attribute applies only to
810	   audio captures.  This attribute gives information about the nominal
811	   sensitivity pattern of the microphone which is the source of the
812	   Capture.  Possible values include patterns such as omni, shotgun,
813	   cardioid, hyper-cardioid.

815	7.1.1.6. Description

817	   The Description attribute is a human-readable description (which
818	   could be in multiple languages) of the Capture.

820	7.1.1.7. Presentation

822	   The Presentation attribute indicates that the capture originates
823	   from a presentation device, that is one that provides supplementary
824	   information to a conference through slides, video, still images,
825	   data etc.  Where more information is known about the capture it MAY
826	   be expanded hierarchically to indicate the different types of
827	   presentation media, e.g. presentation.slides, presentation.image
828	   etc.

830	   Note: It is expected that a number of keywords will be defined that
831	   provide more detail on the type of presentation.

833	7.1.1.8. View

835	   The View attribute is a field with enumerated values, indicating
836	   what type of view the Capture relates to.  The Consumer can use
837	   this information to help choose which Media Captures it wishes to
838	   receive.  The value MUST be one of:

840	   Room - Captures the entire scene

842	   Table - Captures the conference table with seated people

844	   Individual - Captures an individual person
845	   Lectern - Captures the region of the lectern including the
846	   presenter, for example in a classroom style conference room

848	   Audience - Captures a region showing the audience in a classroom
849	   style conference room

851	7.1.1.9. Language

853	   The Language attribute indicates one or more languages used in the
854	   content of the Media Capture.  Captures MAY be offered in different
855	   languages in case of multilingual and/or accessible conferences.  A
856	   Consumer can use this attribute to differentiate between them and
857	   pick the appropriate one.

859	   Note that the Language attribute is defined and meaningful both for
860	   audio and video captures.  In case of audio captures, the meaning
861	   is obvious.  For a video capture, "Language" could, for example, be
862	   sign interpretation or text.

864	   The Language attribute is coded per [RFC5646].

866	7.1.1.10. Person Information

868	   The Person Information attribute allows a Provider to provide
869	   specific information regarding the people in a Capture (regardless
870	   of whether or not the capture has a Presentation attribute). The
871	   Provider may gather the information automatically or manually from
872	   a variety of sources; however, the xCard [RFC6351] format is used to
873	   convey the information. This allows various information such as
874	   Identification information (section 6.2/[RFC6350]), Communication
875	   Information (section 6.4/[RFC6350]) and Organizational information
876	   (section 6.6/[RFC6350]) to be communicated. A Consumer may then
877	   automatically (i.e. via a policy) or manually select Captures
878	   based on information about who is in a Capture. It also allows a
879	   Consumer to render information regarding the people participating
880	   in the conference or to use it for further processing.

882	   The Provider may supply a minimal set of information or a larger
883	   set of information. However it MUST be compliant to [RFC6350] and
884	   supply a "VERSION" and "FN" property. A Provider may supply
885	   multiple xCards per Capture of any KIND (section 6.1.4/[RFC6350]).

887	   In order to keep CLUE messages compact the Provider SHOULD use a
888	   URI to point to any LOGO, PHOTO or SOUND contained in the xCARD
889	   rather than transmitting the LOGO, PHOTO or SOUND data in a CLUE
890	   message.

892	7.1.1.11. Person Type

894	   The Person Type attribute indicates the type of people contained in
895	   the capture with respect to the meeting agenda (regardless of
896	   whether or not the capture has a Presentation attribute). As a
897	   capture may include multiple people the attribute may contain
898	   multiple values. However values MUST NOT be repeated within the
899	   attribute.

901	   An Advertiser associates the person type with an individual capture
902	   when it knows that a particular type is in the capture. If an
903	   Advertiser cannot link a particular type with some certainty to a
904	   capture then it is not included. A Consumer on reception of a
905	   capture with a person type attribute knows with some certainty that
906	   the capture contains that person type. The capture may contain
907	   other person types but the Advertiser has not been able to
908	   determine that this is the case.

910	   The types of Captured people include:

912	     . Chairman - the person responsible for running the meeting
913	        according to the agenda.
914	     . Vice-Chairman - the person responsible for assisting the
915	        chairman in running the meeting.
916	     . Minute Taker - the person responsible for recording the
917	        minutes of the meeting.
918	     . Attendee - the person has no particular responsibilities with
919	        respect to running the meeting.
920	     . Observer - an Attendee without the right to influence the
921	        discussion.
922	     . Presenter - the person is scheduled on the agenda to make a
923	        presentation in the meeting. Note: This is not related to any
924	        "active speaker" functionality.
925	     . Translator - the person is providing some form of translation
926	        or commentary in the meeting.
927	     . Timekeeper - the person is responsible for maintaining the
928	        meeting schedule.

930	   Furthermore the person type attribute may contain one or more
931	   strings allowing the Provider to indicate custom meeting specific
932	   types.
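
The rule that values are not repeated, together with the allowance for
custom strings, can be sketched as a small validation routine.  This
Python fragment is illustrative only: the function name
check_person_types is hypothetical, and the exact spelling of the
type tokens is an assumption taken from the list above rather than
from the CLUE data model.

```python
# Spellings copied from the list in this section (an assumption; the
# data model may encode these values differently).
STANDARD_PERSON_TYPES = {
    "Chairman", "Vice-Chairman", "Minute Taker", "Attendee",
    "Observer", "Presenter", "Translator", "Timekeeper",
}


def check_person_types(values):
    """Split values into (standard, custom) lists.

    Raises ValueError if any value is repeated within the attribute.
    """
    seen = set()
    standard, custom = [], []
    for value in values:
        if value in seen:
            raise ValueError("repeated person type: " + value)
        seen.add(value)
        if value in STANDARD_PERSON_TYPES:
            standard.append(value)
        else:
            custom.append(value)
    return standard, custom
```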

934	7.1.1.12. Priority

936	   The Priority attribute indicates a relative priority between
937	   different Media Captures.  The Provider sets this priority, and the
938	   Consumer MAY use the priority to help decide which Captures it
939	   wishes to receive.

941	   The "priority" attribute is an integer which indicates a relative
942	   priority between Captures. For example it is possible to assign a
943	   priority between two presentation Captures that would allow a
944	   remote Endpoint to determine which presentation is more important.
945	   Priority is assigned at the individual Capture level. It represents
946	   the Provider's view of the relative priority between Captures with
947	   a priority. The same priority number MAY be used across multiple
948	   Captures. It indicates they are equally important. If no priority
949	   is assigned, no assumptions can be made regarding the relative
950	   importance of the Capture.
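
A Consumer that chooses to use the priority might first group Captures
by their priority value.  The Python sketch below does only that; it
deliberately makes no assumption about whether a larger or a smaller
number means "more important", since this section does not say.  The
function name group_by_priority and the use of None for "no priority
assigned" are assumptions made for this example.

```python
from collections import defaultdict


def group_by_priority(priorities):
    """priorities maps a Capture identity to an integer or None.

    Returns (groups, unprioritized): groups maps each priority value to
    the Captures sharing it (equally important), and unprioritized
    lists Captures about which nothing can be assumed.
    """
    groups = defaultdict(list)
    unprioritized = []
    for capture_id, priority in priorities.items():
        if priority is None:
            unprioritized.append(capture_id)
        else:
            groups[priority].append(capture_id)
    return dict(groups), unprioritized
```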

952	7.1.1.13. Embedded Text

954	   The Embedded Text attribute indicates that a Capture provides
955	   embedded textual information. For example the video Capture MAY
956	   contain speech to text information composed with the video image.

958	7.1.1.14. Related To

960	   The Related To attribute indicates the Capture contains additional
961	   complementary information related to another Capture.  The value
962	   indicates the identity of the other Capture to which this Capture
963	   is providing additional information.

965	   For example, a conference can utilize translators or facilitators
966	   that provide an additional audio stream (i.e. a translation or
967	   description or commentary of the conference).  Where multiple
968	   captures are available, it may be advantageous for a Consumer to
969	   select a complementary Capture instead of or in addition to a
970	   Capture it relates to.

972	7.2. Multiple Content Capture

974	   The MCC indicates that one or more Single Media Captures are
975	   multiplexed (temporally and/or spatially) or mixed in one Media
976	   Capture.  Only one Capture type (i.e. audio, video, etc.) is
977	   allowed in each MCC instance.  The MCC may contain a reference to
978	   the Single Media Captures (which may have their own attributes) as
979	   well as attributes associated with the MCC itself.  A MCC may also
980	   contain other MCCs.  The MCC MAY reference Captures from within the
981	   Capture Scene that defines it or from other Capture Scenes.  No
982	   ordering is implied by the order that Captures appear within a MCC.
983	   A MCC MAY contain no references to other Captures to indicate that
984	   the MCC contains content from multiple sources but no information
985	   regarding those sources is given. MCCs either contain the
986	   referenced Captures and no others, or have no referenced captures
987	   and therefore may contain any Capture.

989	   One or more MCCs may also be specified in a CSV.  This allows an
990	   Advertiser to indicate that several MCC captures are used to
991	   represent a capture scene.  Table 14 provides an example of this
992	   case.

994	   As outlined in section 7.1, each instance of the MCC has its own
995	   Capture identity, e.g. MCC1. It allows all the individual captures
996	   contained in the MCC to be referenced by a single MCC identity.

998	   The example below shows the use of a Multiple Content Capture:

1000	        +-----------------------+---------------------------------+
1001	        | Capture Scene #1      |                                 |
1002	        +-----------------------|---------------------------------+
1003	        | VC1                   | {MC attributes}                 |
1004	        | VC2                   | {MC attributes}                 |
1005	        | VC3                   | {MC attributes}                 |
1006	        | MCC1(VC1,VC2,VC3)     | {MC and MCC attributes}         |
1007	        | CSV(MCC1)             |                                 |
1008	        +---------------------------------------------------------+

1010	                Table 1: Multiple Content Capture concept

1012	   This indicates that MCC1 is a single capture that contains the
1013	   Captures VC1, VC2 and VC3 according to any MCC1 attributes.
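
The arrangement in Table 1 could be modeled in memory along the
following lines.  This Python sketch is not the CLUE data model; the
class names MediaCapture and MultipleContentCapture, their fields, and
the may_contain helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MediaCapture:
    capture_id: str
    media_type: str  # a Capture is of a single media type
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class MultipleContentCapture(MediaCapture):
    # Identities of referenced Captures; order carries no meaning.
    references: List[str] = field(default_factory=list)

    def may_contain(self, capture_id: str) -> bool:
        """An MCC with no references may contain any Capture;
        otherwise it contains the referenced Captures and no others."""
        return not self.references or capture_id in self.references


# Table 1: three video Captures and one MCC that references them.
scene = {cid: MediaCapture(cid, "video") for cid in ("VC1", "VC2", "VC3")}
mcc1 = MultipleContentCapture("MCC1", "video",
                              references=["VC1", "VC2", "VC3"])
```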

1015	7.2.1. MCC Attributes

1017	   Media Capture Attributes may be associated with the MCC instance
1018	   and the Single Media Captures that the MCC references.  A Provider
1019	   should avoid providing conflicting attribute values between the MCC
1020	   and Single Media Captures. Where there is conflict the attributes
1021	   of the MCC override any that may be present in the individual
1022	   Captures.
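
One way a Consumer might apply the override rule is to merge the two
attribute sets with the MCC's values taking precedence.  The Python
sketch below shows that interpretation; the function names and the
representation of attributes as dictionaries are assumptions made for
illustration.

```python
def effective_attributes(capture_attrs, mcc_attrs):
    """Attributes of a referenced Capture as seen through the MCC.

    Values from the MCC replace conflicting values from the Capture;
    non-conflicting Capture attributes are kept.
    """
    merged = dict(capture_attrs)
    merged.update(mcc_attrs)
    return merged


def conflicting_attributes(capture_attrs, mcc_attrs):
    """Names of attributes present in both sets with different values."""
    shared = capture_attrs.keys() & mcc_attrs.keys()
    return sorted(k for k in shared if capture_attrs[k] != mcc_attrs[k])
```

A Provider could use a check such as conflicting_attributes before
sending an Advertisement in order to avoid conflicting values in the
first place.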

1024	   A Provider MAY include as much or as little of the original source
1025	   Capture information as it requires.

1027	   There are MCC specific attributes that MUST only be used with
1028	   Multiple Content Captures. These are described in the sections
1029	   below. The attributes described in section 7.1.1. MAY also be used
1030	   with MCCs.

1032	   The spatially related attributes of an MCC indicate its area of
1033	   capture and point of capture within the scene, just like any other
1034	   media capture.  The spatial information does not imply anything
1035	   about how other captures are composed within an MCC.

1037	   For example:  A virtual scene could be constructed for the MCC
1038	   capture with two Video Captures with a "MaxCaptures" attribute set
1039	   to 2 and an "Area of Capture" attribute provided with an overall
1040	   area.  Each of the individual Captures could then also include an
1041	   "Area of Capture" attribute with a sub-set of the overall area.
1042	   The Consumer would then know how each capture is related to others
1043	   within the scene, but not the relative position of the individual
1044	   captures within the composed capture.

1046	        +-----------------------+---------------------------------+
1047	        | Capture Scene #1      |                                 |
1048	        +-----------------------|---------------------------------+
1049	        | VC1                   | AreaofCapture=(0,0,0)(9,0,0)    |
1050	        |                       |               (0,0,9)(9,0,9)    |
1051	        | VC2                   | AreaofCapture=(10,0,0)(19,0,0)  |
1052	        |                       |               (10,0,9)(19,0,9)  |
1053	        | MCC1(VC1,VC2)         | MaxCaptures=2                   |
1054	        |                       | AreaofCapture=(0,0,0)(19,0,0)   |
1055	        |                       |               (0,0,9)(19,0,9)   |
1056	        | CSV(MCC1)             |                                 |
1057	        +---------------------------------------------------------+

1059	        Table 2: Example of MCC and Single Media Capture attributes

1061	   The sub-sections below describe the MCC only attributes.

1063	7.2.1.1. Maximum Number of Captures within a MCC

1065	   The Maximum Number of Captures MCC attribute indicates the maximum
1066	   number of individual Captures that may appear in a Capture Encoding
1067	   at a time.  The actual number at any given time can be less than or
1068	   equal to this maximum.  It may be used to derive how the Single
1069	   Media Captures within the MCC are composed / switched with regards
1070	   to space and time.

1072	   A Provider can indicate that the number of Captures in a MCC
1073	   Capture Encoding is equal "=" to the MaxCaptures value or that
1074	   there may be any number of Captures up to and including "<=" the
1075	   MaxCaptures value. This allows a Provider to distinguish between a
1076	   MCC that purely represents a composition of sources versus a MCC
1077	   that represents switched or switched and composed sources.

1079	   MaxCaptures MAY be set to one so that only content related to one
1080	   of the sources is shown in the MCC Capture Encoding at a time or
1081	   it may be set to any value up to the total number of Source Media
1082	   Captures in the MCC.

1084	   The bullets below describe how the setting of MaxCapture versus the
1085	   number of Captures in the MCC affects how sources appear in a
1086	   Capture Encoding:

1088	     . When MaxCaptures is set to <= 1 and the number of Captures in
1089	        the MCC is greater than 1 (or not specified), this
1090	        is a switched case. Zero or 1 Captures may be switched into
1091	        the Capture Encoding. Note: zero is allowed because of the
1092	        "<=".
1093	     . When MaxCaptures is set to = 1 and the number of Captures in
1094	        the MCC is greater than 1 (or not specified), this
1095	        is a switched case. Only one Capture source is contained in a
1096	        Capture Encoding at a time.
1097	     . When MaxCaptures is set to <= N (with N > 1) and the number of
1098	        Captures in the MCC is greater than N (or not specified) this
1099	        is a switched and composed case. The Capture Encoding may
1100	        contain purely switched sources (i.e. <=2 allows for 1 source
1101	        on its own), or may contain composed and switched sources
1102	        (i.e. a composition of 2 sources switched between the
1103	        sources).
1104	     . When MaxCaptures is set to = N (with N > 1) and the number of
1105	        Captures in the MCC is greater than N (or not specified) this
1106	        is a switched and composed case. The Capture Encoding contains
1107	        composed and switched sources (i.e. a composition of N sources
1108	        switched between the sources). It is not possible to have a
1109	        single source.
1110	     . When MaxCaptures is set to <= to the number of Captures in the
1111	        MCC this is a switched and composed case. The Capture Encoding
1112	        may contain media switched between any number (up to the
1113	        MaxCaptures) of composed sources.

1115	     . When MaxCaptures is set to = to the number of Captures in the
1116	        MCC this is a composed case. All the sources are composed into
1117	        a single Capture Encoding.

1119	   If this attribute is not set then as default it is assumed that all
1120	   source media capture content can appear concurrently in the Capture
1121	   Encoding associated with the MCC.

1123	   For example: The use of MaxCaptures equal to 1 on a MCC with three
1124	   Video Captures VC1, VC2 and VC3 would indicate that the Advertiser
1125	   in the Capture Encoding would switch between VC1, VC2 or VC3 as
1126	   there may be only a maximum of one Capture at a time.
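
The cases listed in this section can be summarized as a small decision
function.  The Python sketch below is an informal reading of the
bullets above, not a normative algorithm; the function name
classify_mcc, its parameters, and the returned strings are
hypothetical.  Passing None for num_captures stands for an MCC whose
Captures are not specified.

```python
from typing import Optional


def classify_mcc(op: str, max_captures: int,
                 num_captures: Optional[int] = None) -> str:
    """Classify an MCC from its MaxCaptures setting.

    op is "=" or "<=", max_captures is the MaxCaptures value, and
    num_captures is the number of Captures referenced by the MCC.
    """
    if op not in ("=", "<="):
        raise ValueError("op must be '=' or '<='")
    if max_captures < 1:
        raise ValueError("MaxCaptures must be at least 1")
    if num_captures is not None:
        if max_captures > num_captures:
            raise ValueError("MaxCaptures exceeds the number of Captures")
        if max_captures == num_captures:
            # "=" means all sources are composed into one encoding.
            return "composed" if op == "=" else "switched and composed"
    # More Captures than MaxCaptures, or Captures not specified.
    if max_captures == 1:
        return "switched"
    return "switched and composed"
```

With the three-Capture example above, classify_mcc("=", 1, 3) yields
the switched case.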

1128	7.2.1.2. Policy

1130	   The Policy MCC Attribute indicates the criteria that the Provider
1131	   uses to determine when and/or where media content appears in the
1132	   Capture Encoding related to the MCC.

1134	   The attribute is in the form of a token that indicates the policy
1135	   and an index representing an instance of the policy.  The same
1136	   index value can be used for multiple MCCs.

1138	   The tokens are:

1140	   SoundLevel - This indicates that the content of the MCC is
1141	   determined by a sound level detection algorithm. The loudest
1142	   (active) speaker (or a previous speaker, depending on the index
1143	   value) is contained in the MCC.

1145	   RoundRobin - This indicates that the content of the MCC is
1146	   determined by a time based algorithm. For example: the Provider
1147	   provides content from a particular source for a period of time and
1148	   then provides content from another source and so on.

1150	   An index is used to represent an instance in the policy setting. An
1151	   index of 0 represents the most current instance of the policy, i.e.
1152	   the active speaker, 1 represents the previous instance, i.e. the
1153	   previous active speaker and so on.

1155	   The following example shows a case where the Provider provides two
1156	   media streams, one showing the active speaker and a second stream
1157	   showing the previous speaker.

1159	        +-----------------------+---------------------------------+
1160	        | Capture Scene #1      |                                 |
1161	        +-----------------------+---------------------------------+
1162	        | VC1                   |                                 |
1163	        | VC2                   |                                 |
1164	        | MCC1(VC1,VC2)         | Policy=SoundLevel:0             |
1165	        |                       | MaxCaptures=1                   |
1166	        | MCC2(VC1,VC2)         | Policy=SoundLevel:1             |
1167	        |                       | MaxCaptures=1                   |
1168	        | CSV(MCC1,MCC2)        |                                 |
1169	        +-----------------------+---------------------------------+

1171	                Table 3: Example Policy MCC attribute usage
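
	   One possible way for a Provider to realise the index semantics
	   of Table 3 is sketched below.  It assumes the Provider keeps a
	   list of sources ordered from the current active speaker to
	   older speakers; the helper names are hypothetical and how a
	   speaker change is detected is out of scope.

```python
from typing import List


def note_active_speaker(history: List[str], source: str) -> List[str]:
    """Return a new history with `source` as the most recent speaker."""
    return [source] + [s for s in history if s != source]


def select_for_index(history: List[str], index: int) -> str:
    """Pick the source for Policy=SoundLevel:<index> (0 = active speaker)."""
    return history[index]


history = ["VC1", "VC2"]                    # VC1 is the active speaker
history = note_active_speaker(history, "VC2")
mcc1_source = select_for_index(history, 0)  # MCC1, Policy=SoundLevel:0
mcc2_source = select_for_index(history, 1)  # MCC2, Policy=SoundLevel:1
```

	   After VC2 becomes the active speaker, MCC1 carries VC2 and
	   MCC2 carries VC1, the previous speaker.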

1173	7.2.1.3. Synchronisation Identity

1175	   The Synchronisation Identity MCC attribute indicates how the
1176	   individual Captures in multiple MCC Captures are synchronised.  To
1177	   indicate that the Capture Encodings associated with MCCs contain
1178	   Captures from the same source at the same time a Provider should
1179	   set the same Synchronisation Identity on each of the concerned
1180	   MCCs.  It is the Provider that determines what the source for the
1181	   Captures is, so a Provider can choose how to group together Single
1182	   Media Captures into a combined "source" for the purpose of
1183	   switching them together to keep them synchronized according to the
1184	   SynchronisationID attribute.  For example when the Provider is in
1185	   an MCU it may determine that each separate CLUE Endpoint is a
1186	   remote source of media. The Synchronisation Identity may be used
1187	   across media types, i.e. to synchronize audio and video related
1188	   MCCs.

1190	   Without this attribute it is assumed that multiple MCCs may provide
1191	   content from different sources at any particular point in time.

1193	   For example:

1195	        +=======================+=================================+
1196	        | Capture Scene #1      |                                 |
1197	        +-----------------------+---------------------------------+
1198	        | VC1                   | Description=Left                |
1199	        | VC2                   | Description=Centre              |
1200	        | VC3                   | Description=Right               |
1201	        | AC1                   | Description=Room                |
1202	        | CSV(VC1,VC2,VC3)      |                                 |
1203	        | CSV(AC1)              |                                 |
1204	        +=======================+=================================+
1205	        | Capture Scene #2      |                                 |
1206	        +-----------------------+---------------------------------+
1207	        | VC4                   | Description=Left                |
1208	        | VC5                   | Description=Centre              |
1209	        | VC6                   | Description=Right               |
1210	        | AC2                   | Description=Room                |
1211	        | CSV(VC4,VC5,VC6)      |                                 |
1212	        | CSV(AC2)              |                                 |
1213	        +=======================+=================================+
1214	        | Capture Scene #3      |                                 |
1215	        +-----------------------+---------------------------------+
1216	        | VC7                   |                                 |
1217	        | AC3                   |                                 |
1218	        +=======================+=================================+
1219	        | Capture Scene #4      |                                 |
1220	        +-----------------------+---------------------------------+
1221	        | VC8                   |                                 |
1222	        | AC4                   |                                 |
1223	        +=======================+=================================+
1224	        | Capture Scene #5      |                                 |
1225	        +-----------------------+---------------------------------+
1226	        | MCC1(VC1,VC4,VC7)     | SynchronisationID=1             |
1227	        |                       | MaxCaptures=1                   |
1228	        | MCC2(VC2,VC5,VC8)     | SynchronisationID=1             |
1229	        |                       | MaxCaptures=1                   |
1230	        | MCC3(VC3,VC6)         | MaxCaptures=1                   |
1231	        | MCC4(AC1,AC2,AC3,AC4) | SynchronisationID=1             |
1232	        |                       | MaxCaptures=1                   |
1233	        | CSV(MCC1,MCC2,MCC3)   |                                 |
1234	        | CSV(MCC4)             |                                 |
1235	        +=======================+=================================+

1237	       Table 4: Example Synchronisation Identity MCC attribute usage

1239	   The above Advertisement would indicate that MCC1, MCC2, MCC3 and
1240	   MCC4 make up a Capture Scene.  There would be four Capture
1241	   Encodings (one for each MCC).  Because MCC1 and MCC2 have the same
1242	   SynchronisationID, each Encoding from MCC1 and MCC2 respectively
1243	   would together have content from only Capture Scene 1 or only
1244	   Capture Scene 2 or the combination of VC7 and VC8 at a particular
1245	   point in time.  In this case the Provider has decided the sources
1246	   to be synchronized are Scene #1, Scene #2, and Scene #3 and #4
1247	   together. The Encoding from MCC3 would not be synchronised with
1248	   MCC1 or MCC2. As MCC4 also has the same Synchronisation Identity
1249	   as MCC1 and MCC2 the content of the audio Encoding will be
1250	   synchronised with the video content.
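
	   The behaviour described for Table 4 can be sketched as
	   follows.  The grouping of Captures into three sources mirrors
	   the Provider's choice in this example; the data layout and
	   the function name are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Provider-chosen grouping of Captures into switchable "sources".
SOURCES: Dict[str, Set[str]] = {
    "scene1": {"VC1", "VC2", "VC3", "AC1"},
    "scene2": {"VC4", "VC5", "VC6", "AC2"},
    "scene3+4": {"VC7", "VC8", "AC3", "AC4"},
}

# MCC name -> (referenced Captures, SynchronisationID or None).
MCCS: Dict[str, Tuple[List[str], Optional[int]]] = {
    "MCC1": (["VC1", "VC4", "VC7"], 1),
    "MCC2": (["VC2", "VC5", "VC8"], 1),
    "MCC3": (["VC3", "VC6"], None),
    "MCC4": (["AC1", "AC2", "AC3", "AC4"], 1),
}


def eligible_captures(active_source: str,
                      sync_id: int) -> Dict[str, List[str]]:
    """Captures each synchronised MCC may carry while a source is active."""
    members = SOURCES[active_source]
    return {
        name: [c for c in captures if c in members]
        for name, (captures, sid) in MCCS.items()
        if sid == sync_id
    }
```

	   MCC3 has no SynchronisationID and is therefore absent from the
	   result.  Where more than one Capture is eligible (AC3 and AC4
	   for MCC4), MaxCaptures=1 still limits the Encoding to one of
	   them at a time.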

1252	7.2.1.4. Allow Subset Choice

1254	   The Allow Subset Choice MCC attribute is a boolean value,
1255	   indicating whether or not the Provider allows the Consumer to
1256	   choose a specific subset of the Captures referenced by the MCC.
1257	   If this attribute is true, and the MCC references other Captures,
1258	   then the Consumer MAY select (in a Configure message) a specific
1259	   subset of those Captures to be included in the MCC, and the
1260	   Provider MUST then include only that subset.  If this attribute is
1261	   false, or the MCC does not reference other Captures, then the
1262	   Consumer MUST NOT select a subset.
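
	   A Provider receiving a Configure message could check a
	   requested subset along the following lines.  This is a
	   minimal sketch; the function and parameter names are
	   hypothetical, and an empty selection is taken to mean that
	   the Consumer did not request a subset.

```python
from typing import Iterable


def subset_selection_allowed(allow_subset_choice: bool,
                             referenced: Iterable[str],
                             selected: Iterable[str]) -> bool:
    """Return True if the Consumer's subset request is permitted."""
    referenced_set = set(referenced)
    selected_set = set(selected)
    if not selected_set:
        return True   # no subset requested, nothing to check
    if not allow_subset_choice or not referenced_set:
        return False  # the Consumer MUST NOT select a subset
    # The subset has to come from the Captures referenced by the MCC.
    return selected_set <= referenced_set
```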

1264	7.3. Capture Scene

1266	   In order for a Provider's individual Captures to be used
1267	   effectively by a Consumer, the Provider organizes the Captures into
1268	   one or more Capture Scenes, with the structure and contents of
1269	   these Capture Scenes being sent from the Provider to the Consumer
1270	   in the Advertisement.

1272	   A Capture Scene is a structure representing a spatial region
1273	   containing one or more Capture Devices, each capturing media
1274	   representing a portion of the region.  A Capture Scene includes one
1275	   or more Capture Scene Views (CSV), with each CSV including one or
1276	   more Media Captures of the same media type.  There can also be
1277	   Media Captures that are not included in a Capture Scene View. A
1278	   Capture Scene represents, for example, the video image of a group
1279	   of people seated next to each other, along with the sound of their
1280	   voices, which could be represented by some number of VCs and ACs in
1281	   the Capture Scene Views.  An MCU can also describe in Capture
1282	   Scenes what it constructs from media Streams it receives.

1284	   A Provider MAY advertise one or more Capture Scenes.  What
1285	   constitutes an entire Capture Scene is up to the Provider.  A
1286	   simple Provider might typically use one Capture Scene for
1287	   participant media (live video from the room cameras) and another
1288	   Capture Scene for a computer generated presentation.  In more
1289	   complex systems, the use of additional Capture Scenes is also
1290	   sensible.  For example, a classroom may advertise two Capture
1291	   Scenes involving live video, one including only the camera
1292	   capturing the instructor (and associated audio), the other
1293	   including camera(s) capturing students (and associated audio).

1295	   A Capture Scene MAY (and typically will) include more than one type
1296	   of media.  For example, a Capture Scene can include several Capture
1297	   Scene Views for Video Captures, and several Capture Scene Views for
1298	   Audio Captures.  A particular Capture MAY be included in more than
1299	   one Capture Scene View.

1301	   A Provider MAY express spatial relationships between Captures that
1302	   are included in the same Capture Scene.  However, there is no
1303	   spatial relationship between Media Captures from different Capture
1304	   Scenes.  In other words, Capture Scenes each use their own spatial
1305	   measurement system as outlined above in section 6.

1307	   A Provider arranges Captures in a Capture Scene to help the
1308	   Consumer choose which captures it wants to render.  The Capture
1309	   Scene Views in a Capture Scene are different alternatives the
1310	   Provider is suggesting for representing the Capture Scene.  Each
1311	   Capture Scene View is given an advertisement-unique identity.  The
1312	   order of Capture Scene Views within a Capture Scene has no
1313	   significance.  The Media Consumer can choose to receive all Media
1314	   Captures from one Capture Scene View for each media type (e.g.
1315	   audio and video), or it can pick and choose Media Captures
1316	   regardless of how the Provider arranges them in Capture Scene
1317	   Views.  Different Capture Scene Views of the same media type are
1318	   not necessarily mutually exclusive alternatives.  Also note that
1319	   the presence of multiple Capture Scene Views (with potentially
1320	   multiple encoding options in each view) in a given Capture Scene
1321	   does not necessarily imply that a Provider is able to serve all the
1322	   associated media simultaneously (although the construction of such
1323	   an over-rich Capture Scene is probably not sensible in many cases).
1324	   What a Provider can send simultaneously is determined through the
1325	   Simultaneous Transmission Set mechanism, described in section 8.

1327	   Captures within the same Capture Scene View MUST be of the same
1328	   media type - it is not possible to mix audio and video captures in
1329	   the same Capture Scene View, for instance.  The Provider MUST be
1330	   capable of encoding and sending all Captures (that have an encoding
1331	   group) in a single Capture Scene View simultaneously.  The order of
1332	   Captures within a Capture Scene View has no significance.  A
1333	   Consumer can decide to receive all the Captures in a single Capture
1334	   Scene View, but a Consumer could also decide to receive just a
1335	   subset of those captures.  A Consumer can also decide to receive
1336	   Captures from different Capture Scene Views, all subject to the
1337	   constraints set by Simultaneous Transmission Sets, as discussed in
1338	   section 8.

1340	   When a Provider advertises a Capture Scene with multiple CSVs, it
1341	   is essentially signaling that there are multiple representations of
1342	   the same Capture Scene available.  In some cases, these multiple
1343	   views would be used simultaneously (for instance a "video view" and
1344	   an "audio view").  In some cases the views would conceptually be
1345	   alternatives (for instance a view consisting of three Video
1346	   Captures covering the whole room versus a view consisting of just a
1347	   single Video Capture covering only the center of a room).  In this
1348	   latter example, one sensible choice for a Consumer would be to
1349	   indicate (through its Configure and possibly through an additional
1350	   offer/answer exchange) the Captures of that Capture Scene View that
1351	   most closely matched the Consumer's number of display devices or
1352	   screen layout.

1354	   The following is an example of 4 potential Capture Scene Views for
1355	   an endpoint-style Provider:

1357	   1.  (VC0, VC1, VC2) - left, center and right camera Video Captures

1359	   2.  (MCC3) - Video Capture associated with loudest room segment

1361	   3.  (VC4) - Video Capture zoomed out view of all people in the room

1363	   4.  (AC0) - main audio

1365	   The first view in this Capture Scene example is a list of Video
1366	   Captures which have a spatial relationship to each other.
1367	   Determination of the order of these captures (VC0, VC1 and VC2) for
1368	   rendering purposes is accomplished through use of their Area of
1369	   Capture attributes.  The second view (MCC3) and the third view
1370	   (VC4) are alternative representations of the same room's video,
1371	   which might be better suited to some Consumers' rendering
1372	   capabilities.  The inclusion of the Audio Capture in the same
1373	   Capture Scene indicates that AC0 is associated with all of those
1374	   Video Captures, meaning it comes from the same spatial region.
1375	   Therefore, if audio were to be rendered at all, this audio would be
1376	   the correct choice irrespective of which Video Captures were
1377	   chosen.
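
	   The rule that all Captures within a Capture Scene View are of
	   the same media type can be checked against the four example
	   views with a short sketch.  The media-type table and the
	   function name are assumptions for illustration; MCC3 is
	   treated as video, as in the list above.

```python
from typing import Dict, List

# Media type of each Capture in the example (illustrative table).
MEDIA_TYPE: Dict[str, str] = {
    "VC0": "video", "VC1": "video", "VC2": "video",
    "MCC3": "video", "VC4": "video", "AC0": "audio",
}

VIEWS: List[List[str]] = [
    ["VC0", "VC1", "VC2"],
    ["MCC3"],
    ["VC4"],
    ["AC0"],
]


def view_is_valid(view: List[str], media_type: Dict[str, str]) -> bool:
    """A view needs at least one Capture and exactly one media type."""
    return len({media_type[capture] for capture in view}) == 1


all_valid = all(view_is_valid(view, MEDIA_TYPE) for view in VIEWS)
```

	   A view mixing VC0 and AC0 would be rejected by this check.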

1379	7.3.1. Capture Scene attributes

1381	   Capture Scene Attributes can be applied to Capture Scenes as well
1382	   as to individual media captures.  Attributes specified at this
1383	   level apply to all constituent Captures.  Capture Scene attributes
1384	   include:

1386	     . Human-readable description of the Capture Scene, which could
1387	        be in multiple languages;
1388	     . xCard scene information;
1389	     . Scale information (millimeters, unknown, no scale), as
1390	        described in Section 6.

1392	7.3.1.1. Scene Information

1394	   The Scene information attribute provides information regarding the
1395	   Capture Scene rather than individual participants. The Provider
1396	   may gather the information automatically or manually from a
1397	   variety of sources. The scene information attribute allows a
1398	   Provider to indicate information such as: organizational or
1399	   geographic information allowing a Consumer to determine which
1400	   Capture Scenes are of interest in order to then perform Capture
1401	   selection. It also allows a Consumer to render information
1402	   regarding the Scene or to use it for further processing.

1404	   As per Section 7.1.1.10, the xCard format is used to convey this
1405	   information and the Provider may supply a minimal set of
1406	   information or a larger set of information.

1408	   In order to keep CLUE messages compact the Provider SHOULD use a
1409	   URI to point to any LOGO, PHOTO or SOUND contained in the xCard
1410	   rather than transmitting the LOGO, PHOTO or SOUND data in a CLUE
1411	   message.

1413	7.3.2. Capture Scene View attributes

1415	   A Capture Scene can include one or more Capture Scene Views in
1416	   addition to the Capture Scene wide attributes described above.
1417	   Capture Scene View attributes apply to the Capture Scene View as a
1418	   whole, i.e. to all Captures that are part of the Capture Scene
1419	   View.

1421	   Capture Scene View attributes include:

1423	     . Human-readable description (which could be in multiple
1424	        languages) of the Capture Scene View

1426	7.4. Global View List

1428	   An Advertisement can include an optional Global View list.  Each
1429	   item in this list is a Global View.  The Provider can include
1430	   multiple Global Views, to allow a Consumer to choose sets of
1431	   captures appropriate to its capabilities or application.  How
1432	   these suggestions are composed in the Global View list, to
1433	   represent all the scenes for which the Provider can send media,
1434	   is up to the Provider.  This is very similar to how each CSV
1435	   represents a particular scene.

1437	   As an example, suppose an advertisement has three scenes, and each
1438	   scene has three CSVs, ranging from one to three video captures in
1439	   each CSV.  The Provider is advertising a total of nine video
1440	   Captures across three scenes.  The Provider can use the Global
1441	   View list to suggest alternatives for Consumers that can't receive
1442	   all nine video Captures as separate media streams.  For
1443	   accommodating a Consumer that wants to receive three video
1444	   Captures, a Provider might suggest a Global View containing just a
1445	   single CSV with three Captures and nothing from the other two
1446	   scenes.  Or a Provider might suggest a Global View containing
1447	   three different CSVs, one from each scene, with a single video
1448	   Capture in each.

1450	   Some additional rules:

1452	     . The ordering of Global Views in the Global View list is
1453	        insignificant.
1454	     . The ordering of CSVs within each Global View is
1455	        insignificant.
1456	     . A particular CSV may be used in multiple Global Views.
1457	     . The Provider must be capable of encoding and sending all
1458	        Captures within the CSVs of a given Global View
1459	        simultaneously.

1461	   The following figure shows an example of the structure of Global
1462	   Views in a Global View List.

1464	      ........................................................
1465	      . Advertisement                                        .
1466	      .                                                      .
1467	      . +--------------+         +-------------------------+ .
1468	      . |Scene 1       |         |Global View List         | .
1469	      . |              |         |                         | .
1470	      . | CSV1 (v)<----------------- Global View (CSV 1)   | .
1471	      . |         <-------.      |                         | .
1472	      . |              |  *--------- Global View (CSV 1,5) | .
1473	      . | CSV2 (v)     |  |      |                         | .
1474	      . |              |  |      |                         | .
1475	      . | CSV3 (v)<---------*------- Global View (CSV 3,5) | .
1476	      . |              |  | |    |                         | .
1477	      . | CSV4 (a)<----------------- Global View (CSV 4)   | .
1478	      . |         <-----------.  |                         | .
1479	      . +--------------+  | | *----- Global View (CSV 4,6) | .
1480	      .                   | | |  |                         | .
1481	      . +--------------+  | | |  +-------------------------+ .
1482	      . |Scene 2       |  | | |                              .
1483	      . |              |  | | |                              .
1484	      . | CSV5 (v)<-------' | |                              .
1485	      . |         <---------' |                              .
1486	      . |              |      |        (v) = video           .
1487	      . | CSV6 (a)<-----------'        (a) = audio           .
1488	      . |              |                                     .
1489	      . +--------------+                                     .
1490	      `......................................................'

1492	                 Figure 3:   Global View List Structure

1494	8. Simultaneous Transmission Set Constraints

1496	   In many practical cases, a Provider has constraints or limitations
1497	   on its ability to send Captures simultaneously.  One type of
1498	   limitation is caused by the physical limitations of capture
1499	   mechanisms; these constraints are represented by a Simultaneous
1500	   Transmission Set.  The second type of limitation reflects the
1501	   encoding resources available, such as bandwidth or video encoding
1502	   throughput (macroblocks/second).  This type of constraint is
1503	   captured by Individual Encodings and Encoding Groups, discussed
1504	   below.

1506	   Some Endpoints or MCUs can send multiple Captures simultaneously;
1507	   however sometimes there are constraints that limit which Captures
1508	   can be sent simultaneously with other Captures.  A device may not
1509	   be able to be used in different ways at the same time.  Provider
1510	   Advertisements are made so that the Consumer can choose one of
1511	   several possible mutually exclusive usages of the device.  This
1512	   type of constraint is expressed in a Simultaneous Transmission Set,
1513	   which lists all the Captures of a particular media type (e.g.
1514	   audio, video, text) that can be sent at the same time.  There are
1515	   different Simultaneous Transmission Sets for each media type in the
1516	   Advertisement.  This is easier to show in an example.

1518	   Consider the example of a room system where there are three cameras
1519	   each of which can send a separate Capture covering two persons
1520	   each: VC0, VC1, VC2.  The middle camera can also zoom out (using an
1521	   optical zoom lens) and show all six persons, VC3.  But the middle
1522	   camera cannot be used in both modes at the same time - it has to
1523	   either show the space where two participants sit or the whole six
1524	   seats, but not both at the same time.  As a result, VC1 and VC3
1525	   cannot be sent simultaneously.

1527	   Simultaneous Transmission Sets are expressed as sets of the Media
1528	   Captures that the Provider could transmit at the same time (though,
1529	   in some cases, it is not intuitive to do so).  If a Multiple
1530	   Content Capture is included in a Simultaneous Transmission Set it
1531	   indicates that the Capture Encoding associated with it could be
1532	   transmitted at the same time as the other Captures within the
1533	   Simultaneous Transmission Set. It does not imply that the Single
1534	   Media Captures contained in the Multiple Content Capture could all
1535	   be transmitted at the same time.

1537	   In this example the two Simultaneous Transmission Sets are shown in
1538	   Table 5.  If a Provider advertises one or more mutually exclusive
1539	   Simultaneous Transmission Sets, then for each media type the
1540	   Consumer MUST ensure that it chooses Media Captures that lie wholly
1541	   within one of those Simultaneous Transmission Sets.

1543	                           +-------------------+
1544	                           | Simultaneous Sets |
1545	                           +-------------------+
1546	                           | {VC0, VC1, VC2}   |
1547	                           | {VC0, VC3, VC2}   |
1548	                           +-------------------+

1550	                Table 5: Two Simultaneous Transmission Sets
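
	   The requirement that a Consumer's choice lies wholly within
	   one Simultaneous Transmission Set amounts to a subset test,
	   applied per media type.  The sketch below uses the sets of
	   Table 5; the function name is hypothetical.

```python
from typing import Iterable, List, Set

# The two Simultaneous Transmission Sets of Table 5 (video only).
TABLE_5: List[Set[str]] = [
    {"VC0", "VC1", "VC2"},
    {"VC0", "VC3", "VC2"},
]


def choice_is_allowed(chosen: Iterable[str],
                      simultaneous_sets: List[Set[str]]) -> bool:
    """True if the chosen Captures fit wholly within one of the sets."""
    chosen_set = set(chosen)
    return any(chosen_set <= allowed for allowed in simultaneous_sets)
```

	   Choosing VC0 and VC3 is allowed, whereas choosing VC1 and VC3
	   together is not, because no single set contains both.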

1552	   A Provider OPTIONALLY can include the Simultaneous Transmission
1553	   Sets in its Advertisement.  These constraints apply across all the
1554	   Capture Scenes in the Advertisement.  It is a syntax conformance
1555	   requirement that the Simultaneous Transmission Sets MUST allow all
1556	   the media Captures in any particular Capture Scene View to be used
1557	   simultaneously.  Similarly, the Simultaneous Transmission Sets MUST
1558	   reflect the simultaneity expressed by any Global View.

1560	   For shorthand convenience, a Provider MAY describe a Simultaneous
1561	   Transmission Set in terms of Capture Scene Views and Capture
1562	   Scenes.  If a Capture Scene View is included in a Simultaneous
1563	   Transmission Set, then all Media Captures in the Capture Scene View
1564	   are included in the Simultaneous Transmission Set.  If a Capture
1565	   Scene is included in a Simultaneous Transmission Set, then all its
1566	   Capture Scene Views (of the corresponding media type) are included
1567	   in the Simultaneous Transmission Set.  The end result reduces to a
1568	   set of Media Captures, of a particular media type, in either case.
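
	   The shorthand expansion described above might be implemented
	   as sketched here.  The dictionaries, identifiers (Scene1,
	   CSV1, VC5 and so on) and the function name are assumptions
	   for illustration and do not reflect the CLUE data model
	   syntax.

```python
from typing import Dict, Iterable, List, Set, Tuple


def expand_entries(entries: Iterable[str], media_type: str,
                   scenes: Dict[str, List[str]],
                   views: Dict[str, Tuple[str, List[str]]]) -> Set[str]:
    """Reduce a set given as Scenes, CSVs or Captures to Media Captures."""
    captures: Set[str] = set()
    for entry in entries:
        if entry in scenes:
            view_ids = scenes[entry]   # a Capture Scene: all its CSVs
        elif entry in views:
            view_ids = [entry]         # a single Capture Scene View
        else:
            captures.add(entry)        # already a Media Capture
            continue
        for view_id in view_ids:
            view_type, view_captures = views[view_id]
            if view_type == media_type:
                captures.update(view_captures)
    return captures


# Hypothetical Advertisement content used only for this sketch.
views = {
    "CSV1": ("video", ["VC0", "VC1", "VC2"]),
    "CSV2": ("video", ["VC5"]),
    "CSV3": ("audio", ["AC0"]),
}
scenes = {"Scene1": ["CSV1", "CSV2", "CSV3"]}
video_captures = expand_entries(["Scene1"], "video", scenes, views)
```

	   In either case the result is a plain set of Media Captures of
	   one media type.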

1570	   If an Advertisement does not include Simultaneous Transmission
1571	   Sets, then the Provider MUST be able to simultaneously provide all
1572	   the Captures from any one CSV of each media type from each Capture
1573	   Scene.  Likewise, if there are no Simultaneous Transmission Sets
1574	   and there is a Global View list, then the Provider MUST be able to
1575	   simultaneously provide all the Captures from any particular Global
1576	   View (of each media type) from the Global View list.

1578	   If an Advertisement includes multiple Capture Scene Views in a
1579	   Capture Scene then the Consumer MAY choose one Capture Scene View
1580	   for each media type, or MAY choose individual Captures based on the
1581	   Simultaneous Transmission Sets.

1583	9. Encodings

1585	   Individual encodings and encoding groups are CLUE's mechanisms
1586	   allowing a Provider to signal its limitations for sending Captures,
1587	   or combinations of Captures, to a Consumer.  Consumers can map the
1588	   Captures they want to receive onto the Encodings, with the encoding
1589	   parameters they want.  As for the relationship between the CLUE-
1590	   specified mechanisms based on Encodings and the SIP offer/answer
1591	   exchange, please refer to section 5.

1593	9.1. Individual Encodings

1595	   An Individual Encoding represents a way to encode a Media Capture
1596	   as a Capture Encoding, to be sent as an encoded media stream from
1597	   the Provider to the Consumer.  An Individual Encoding has a set of
1598	   parameters characterizing how the media is encoded.

1600	   Different media types have different parameters, and different
1601	   encoding algorithms may have different parameters.  An Individual
1602	   Encoding can be assigned to at most one Capture Encoding at any
1603	   given time.

1605	   Individual Encoding parameters are represented in SDP [RFC4566],
1606	   not in CLUE messages.  For example, for a video encoding using
1607	   H.26x compression technologies, this can include parameters such
1608	   as:

1610	     . Maximum bandwidth;
1611	     . Maximum picture size in pixels;
1612	     . Maximum number of pixels to be processed per second.

1614	   The bandwidth parameter is the only one that specifically relates
1615	   to a CLUE Advertisement, as it can be further constrained by the
1616	   maximum group bandwidth in an Encoding Group.

1618	9.2. Encoding Group

1620	   An Encoding Group includes a set of one or more Individual
1621	   Encodings, and parameters that apply to the group as a whole.  By
1622	   grouping multiple individual Encodings together, an Encoding Group
1623	   describes additional constraints on bandwidth for the group. A
1624	   single Encoding Group MAY refer to Encodings for different media
1625	   types.

1627	   The Encoding Group data structure contains:

1629	     . Maximum bitrate for all encodings in the group combined;
1630	     . A list of identifiers for the Individual Encodings belonging
1631	        to the group.

1633	   When the Individual Encodings in a group are instantiated into
1634	   Capture Encodings, each Capture Encoding has a bitrate that MUST be
1635	   less than or equal to the max bitrate for the particular Individual
1636	   Encoding.  The "maximum bitrate for all encodings in the group"
1637	   parameter gives the additional restriction that the sum of all the
1638	   individual Capture Encoding bitrates MUST be less than or equal to
1639	   this group value.
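
	   The two bitrate rules above can be expressed as a simple
	   check.  This is a sketch under assumed names and values; the
	   encoding identifiers and bitrates (in bits per second) are
	   invented for the example.

```python
from typing import Dict


def bitrates_are_valid(capture_bitrates: Dict[str, int],
                       encoding_max: Dict[str, int],
                       group_max: int) -> bool:
    """Check per-encoding and group-wide bitrate limits.

    capture_bitrates maps an Individual Encoding identifier to the
    bitrate of the Capture Encoding instantiated from it.
    """
    for encoding_id, bitrate in capture_bitrates.items():
        if bitrate > encoding_max[encoding_id]:
            return False  # exceeds the Individual Encoding maximum
    # The sum over all Capture Encodings is bounded by the group value.
    return sum(capture_bitrates.values()) <= group_max


# Hypothetical Encoding Group with three Individual Encodings.
ENCODING_MAX = {"ENC1": 4_000_000, "ENC2": 4_000_000, "ENC3": 4_000_000}
GROUP_MAX = 6_000_000
```

	   With these values, two Capture Encodings at 3,000,000 each
	   are acceptable, while 4,000,000 plus 3,000,000 exceeds the
	   group maximum even though each is within its own limit.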

1641	   The following diagram illustrates one example of the structure of a
1642	   media Provider's Encoding Groups and their contents.

1644	   ,-------------------------------------------------.
1645	   |             Media Provider                      |
1646	   |                                                 |
1647	   |  ,--------------------------------------.       |
1648	   |  | ,--------------------------------------.     |
1649	   |  | | ,--------------------------------------.   |
1650	   |  | | |          Encoding Group              |   |
1651	   |  | | | ,-----------.                        |   |
1652	   |  | | | |           | ,---------.            |   |
1653	   |  | | | |           | |         | ,---------.|   |
1654	   |  | | | | Encoding1 | |Encoding2| |Encoding3||   |
1655	   |  `.| | |           | |         | `---------'|   |
1656	   |    `.| `-----------' `---------'            |   |
1657	   |      `--------------------------------------'   |
1658	   `-------------------------------------------------'

1660	                  Figure 4:   Encoding Group Structure

1662	   A Provider advertises one or more Encoding Groups.  Each Encoding
1663	   Group includes one or more Individual Encodings.  Each Individual
1664	   Encoding can represent a different way of encoding media.  For
1665	   example one Individual Encoding may be 1080p60 video, another could
1666	   be 720p30, with a third being CIF, all in, for example, H.264
1667	   format.
1668	   While a typical three codec/display system might have one Encoding
1669	   Group per "codec box" (physical codec, connected to one camera and
1670	   one screen), there are many possibilities for the number of
1671	   Encoding Groups a Provider may be able to offer and for the
1672	   encoding values in each Encoding Group.

1674	   There is no requirement for all Encodings within an Encoding Group
1675	   to be instantiated at the same time.

1677	9.3. Associating Captures with Encoding Groups

1679	   Each Media Capture, including MCCs, MAY be associated with one
1680	   Encoding Group. To be eligible for configuration, a Media Capture
1681	   MUST be associated with one Encoding Group, which is used to
1682	   instantiate that Capture into a Capture Encoding. When an MCC is
1683	   configured all the Media Captures referenced by the MCC will appear
1684	   in the Capture Encoding according to the attributes of the chosen
1685	   encoding of the MCC. This allows an Advertiser to specify encoding
1686	   attributes associated with the Media Captures without the need to
1687	   provide an individual Capture Encoding for each of the inputs.

1689	   If an Encoding Group is assigned to a Media Capture referenced by
1690	   the MCC it indicates that this Capture may also have an individual
1691	   Capture Encoding.

1693	   For example:

1695	        +--------------------+------------------------------------+
1696	        | Capture Scene #1   |                                    |
1697	        +--------------------+------------------------------------+
1698	        | VC1                | EncodeGroupID=1                    |
1699	        | VC2                |                                    |
1700	        | MCC1(VC1,VC2)      | EncodeGroupID=2                    |
1701	        | CSV(VC1)           |                                    |
1702	        | CSV(MCC1)          |                                    |
1703	        +--------------------+------------------------------------+

1705	     Table 6: Example usage of Encoding with MCC and source Captures

1707	   This would indicate that VC1 may be sent as its own Capture
1708	   Encoding from EncodeGroupID=1 or that it may be sent as part of a
1709	   Capture Encoding from EncodeGroupID=2 along with VC2.
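
The reading of Table 6 given above can be sketched in a few lines of Python.  This is only an illustration: the dictionary layout and the helper names (capture_encoding_group, mcc_members, is_configurable, delivery_options) are assumptions made for this sketch and are not part of the CLUE data model.

```python
# Hypothetical in-memory form of Table 6; names are illustrative only.
capture_encoding_group = {
    "VC1": 1,      # EncodeGroupID=1
    "VC2": None,   # no Encoding Group, so no individual Capture Encoding
    "MCC1": 2,     # EncodeGroupID=2
}
mcc_members = {"MCC1": ["VC1", "VC2"]}


def is_configurable(capture_id):
    """A Capture is eligible for configuration only if it has an Encoding Group."""
    return capture_encoding_group.get(capture_id) is not None


def delivery_options(capture_id):
    """List (carrier, EncodeGroupID) pairs through which a Capture can be sent."""
    options = []
    if is_configurable(capture_id):
        options.append(("own", capture_encoding_group[capture_id]))
    for mcc_id, members in mcc_members.items():
        if capture_id in members and is_configurable(mcc_id):
            options.append((mcc_id, capture_encoding_group[mcc_id]))
    return options
```

With this data, delivery_options("VC1") lists both its own Capture Encoding and the one for MCC1, while VC2 can only be delivered as part of MCC1.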

1711	   More than one Capture MAY use the same Encoding Group.

1713	   The maximum number of Capture Encodings that can result from a
1714	   particular Encoding Group constraint is equal to the number of
1715	   individual Encodings in the group.  The actual number of Capture
1716	   Encodings used at any time MAY be less than this maximum.  Any of
1717	   the Captures that use a particular Encoding Group can be encoded
1718	   according to any of the Individual Encodings in the group.

1720	   It is a protocol conformance requirement that the Encoding Groups
1721	   MUST allow all the Captures in a particular Capture Scene View to
1722	   be used simultaneously.

1724	10. Consumer's Choice of Streams to Receive from the Provider

1726	   After receiving the Provider's Advertisement message (that includes
1727	   media captures and associated constraints), the Consumer composes
1728	   its reply to the Provider in the form of a Configure message.  The
1729	   Consumer is free to use the information in the Advertisement as it
1730	   chooses, but there are a few obviously sensible design choices,
1731	   which are outlined below.

1733	   If multiple Providers connect to the same Consumer (i.e. in an
1734	   MCU-less multiparty call), it is the responsibility of the
1735	   Consumer to compose a Configure for each Provider that fulfills
1736	   both that Provider's constraints, as expressed in its
1737	   Advertisement, and the Consumer's own capabilities.

1739	   In an MCU-based multiparty call, the MCU can logically terminate
1740	   the Advertisement/Configure negotiation in that it can hide the
1741	   characteristics of the receiving endpoint and rely on its own
1742	   capabilities (transcoding/transrating/...) to create Media Streams
1743	   that can be decoded at the Endpoint Consumers.  The timing of an
1744	   MCU's sending of Advertisements (for its outgoing ports) and
1745	   Configures (for its incoming ports, in response to Advertisements
1746	   received there) is up to the MCU and implementation dependent.

1748	   As a general outline, a Consumer can choose, based on the
1749	   Advertisement it has received, which Captures it wishes to receive,
1750	   and which Individual Encodings it wants the Provider to use to
1751	   encode the Captures.

1753	   On receipt of an Advertisement with an MCC the Consumer treats the
1754	   MCC as per other non-MCC Captures with the following differences:

1756	   - The Consumer would understand that the MCC is a Capture that
1757	   includes the referenced individual Captures (or any Captures, if
1758	   none are referenced) and that these individual Captures are
1759	   delivered as part of the MCC's Capture Encoding.

1761	   - The Consumer may utilise any of the attributes associated with
1762	   the referenced individual Captures and any Capture Scene attributes
1763	   from where the individual Captures were defined to choose Captures
1764	   and for rendering decisions.

1766	   - If the MCC attribute Allow Subset Choice is true, then the
1767	   Consumer may or may not choose to receive all the indicated
1768	   Captures.  It can choose to receive a sub-set of Captures indicated
1769	   by the MCC.

1771	   For example if the Consumer receives:

1773	           MCC1(VC1,VC2,VC3){attributes}

1775	   A Consumer could choose all the Captures within an MCC; however, if
1776	   the Consumer determines that it doesn't want VC3, it can return
1777	   MCC1(VC1,VC2).  If it wants all the individual Captures then it
1778	   returns only the MCC identity (i.e. MCC1).  If the MCC in the
1779	   advertisement does not reference any individual captures, or the
1780	   Allow Subset Choice attribute is false, then the Consumer cannot
1781	   choose what is included in the MCC; it is up to the Provider to
1782	   decide.
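
The subset rule above can be sketched as a small Python helper.  The function name mcc_reply, its parameters, and the string form of the result are assumptions chosen to mirror the MCC1(VC1,VC2) notation used in this section; the actual Configure message syntax is defined elsewhere.

```python
def mcc_reply(mcc_id, advertised, wanted, allow_subset_choice):
    """Return the MCC reference a Consumer would place in its Configure.

    advertised: Captures referenced by the MCC in the Advertisement
    wanted:     Captures the Consumer would like to receive
    """
    # With no referenced Captures, or with Allow Subset Choice false,
    # the Provider decides the content: only the MCC identity is returned.
    if not advertised or not allow_subset_choice:
        return mcc_id
    chosen = [c for c in advertised if c in wanted]
    if not chosen:
        # Assumption of this sketch: a Consumer that wants none of the
        # Captures simply would not configure the MCC at all.
        raise ValueError("no wanted Capture is referenced by " + mcc_id)
    if len(chosen) == len(advertised):
        return mcc_id  # all Captures wanted: MCC identity only
    return f"{mcc_id}({','.join(chosen)})"
```

For the MCC1(VC1,VC2,VC3) example, a Consumer that does not want VC3 would obtain "MCC1(VC1,VC2)" from this helper.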

1784	   A Configure Message includes a list of Capture Encodings.  These
1785	   are the Capture Encodings the Consumer wishes to receive from the
1786	   Provider.  Each Capture Encoding refers to one Media Capture and
1787	   one Individual Encoding.

1789	   For each Capture the Consumer wants to receive, it configures one
1790	   of the Encodings in that Capture's Encoding Group.  The Consumer
1791	   does this by telling the Provider, in its Configure Message, which
1792	   Encoding to use for each chosen Capture.  Upon receipt of this
1793	   Configure from the Consumer, common knowledge is established
1794	   between Provider and Consumer regarding sensible choices for the
1795	   media streams.  The setup of the actual media channels, at least in
1796	   the simplest case, is left to a following offer/answer exchange.
1797	   Optimized implementations MAY speed up the reaction to the
1798	   offer/answer exchange by reserving the resources at the time of
1799	   finalization of the CLUE handshake.

1801	   CLUE advertisements and configure messages don't necessarily
1802	   require a new SDP offer/answer for every CLUE message
1803	   exchange.  But the resulting encodings sent via RTP must conform to
1804	   the most recent SDP offer/answer result.

1806	   In order to meaningfully create and send an initial Configure, the
1807	   Consumer needs to have received at least one Advertisement, and an
1808	   SDP offer defining the Individual Encodings, from the Provider.

1810	   In addition, the Consumer can send a Configure at any time during
1811	   the call.  The Configure MUST be valid according to the most
1812	   recently received Advertisement.  The Consumer can send a Configure
1813	   either in response to a new Advertisement from the Provider or on
1814	   its own, for example because of a local change in conditions
1815	   (people leaving the room, connectivity changes, multipoint related
1816	   considerations).
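
As a rough illustration of checking a Configure against the most recent Advertisement, the following Python sketch verifies that each Capture Encoding pairs an advertised Capture with an Individual Encoding from that Capture's Encoding Group.  The function name, the argument layout, and the rule that an Individual Encoding is used at most once are assumptions of this sketch (the last one inferred from the limit on Capture Encodings per Encoding Group in Section 9.3); it is not a complete validity check.

```python
def validate_configure(configure, capture_group, group_encodings):
    """Return a list of problems found in a Configure (empty if none).

    configure:       list of (capture_id, encoding_id) pairs
    capture_group:   capture_id -> Encoding Group id (from the Advertisement)
    group_encodings: Encoding Group id -> list of Individual Encoding ids
    """
    problems = []
    used_encodings = set()
    for capture_id, encoding_id in configure:
        group_id = capture_group.get(capture_id)
        if group_id is None:
            problems.append(f"{capture_id}: no Encoding Group advertised")
        elif encoding_id not in group_encodings.get(group_id, ()):
            problems.append(f"{capture_id}: {encoding_id} not in {group_id}")
        elif encoding_id in used_encodings:
            # Assumption: one Capture Encoding per Individual Encoding.
            problems.append(f"{capture_id}: {encoding_id} already in use")
        else:
            used_encodings.add(encoding_id)
    return problems
```

A Consumer could run such a check before sending a Configure, and again whenever a new Advertisement replaces the one it was built from.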

1818	   When choosing which Media Streams to receive from the Provider, and
1819	   the encoding characteristics of those Media Streams, the Consumer
1820	   advantageously takes several things into account: its local
1821	   preference, simultaneity restrictions, and encoding limits.

1823	10.1. Local preference

1825	   A variety of local factors influence the Consumer's choice of
1826	   Media Streams to be received from the Provider:

1828	   o  if the Consumer is an Endpoint, it is likely that it would
1829	      choose, where possible, to receive video and audio Captures that
1830	      match the number of display devices and audio system it has

1832	   o  if the Consumer is an MCU, it MAY choose to receive loudest
1833	      speaker streams (in order to perform its own media composition)
1834	      and avoid pre-composed video Captures

1836	   o  user choice (for instance, selection of a new layout) MAY result
1837	      in a different set of Captures, or different encoding
1838	      characteristics, being required by the Consumer

1840	10.2. Physical simultaneity restrictions

1842	   Often there are physical simultaneity constraints of the Provider
1843	   that affect the Provider's ability to simultaneously send all of
1844	   the captures the Consumer would wish to receive.  For instance, an
1845	   MCU, when connected to a multi-camera room system, might prefer to
1846	   receive both individual video streams of the people present in the
1847	   room and an overall view of the room from a single camera.  Some
1848	   Endpoint systems might be able to provide both of these sets of
1849	   streams simultaneously, whereas others might not (if the overall
1850	   room view were produced by changing the optical zoom level on the
1851	   center camera, for instance).

1853	10.3. Encoding and encoding group limits

1855	   Each of the Provider's encoding groups has limits on bandwidth,
1856	   and the constituent potential encodings have limits on the
1857	   bandwidth, computational complexity, video frame rate, and
1858	   resolution that can be provided.  When choosing the Captures to be
1859	   received from a Provider, a Consumer device MUST ensure that the
1860	   encoding characteristics requested for each individual Capture
1861	   fits within the capability of the encoding it is being configured
1862	   to use, as well as ensuring that the combined encoding
1863	   characteristics for Captures fit within the capabilities of their
1864	   associated encoding groups.  In some cases, this could cause an
1865	   otherwise "preferred" choice of capture encodings to be passed
1866	   over in favor of different Capture Encodings--for instance, if a
1867	   set of three Captures could only be provided at a low resolution
1868	   then a three screen device could switch to favoring a single,
1869	   higher quality, Capture Encoding.
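
The two checks described above (per-encoding limits and per-group limits) might be sketched as follows.  The class and field names are hypothetical, and only bandwidth is summed at the group level; this is a simplified illustration rather than a statement of how any implementation performs the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoEncoding:
    """Limits of one Individual Encoding (field names are illustrative)."""
    max_width: int
    max_height: int
    max_frame_rate: int
    max_pps: int        # pixels per second
    max_bandwidth: int  # bits per second


@dataclass(frozen=True)
class VideoRequest:
    """Characteristics a Consumer would like for one Capture."""
    width: int
    height: int
    frame_rate: int
    bandwidth: int


def fits_encoding(req, enc):
    """True if the request stays within every limit of the encoding."""
    return (req.width <= enc.max_width
            and req.height <= enc.max_height
            and req.frame_rate <= enc.max_frame_rate
            and req.width * req.height * req.frame_rate <= enc.max_pps
            and req.bandwidth <= enc.max_bandwidth)


def fits_group(requests, max_group_bandwidth):
    """True if the combined bandwidth stays within the group limit."""
    return sum(r.bandwidth for r in requests) <= max_group_bandwidth


# Illustrative values only.
enc = VideoEncoding(1920, 1088, 60, 124416000, 4000000)
hd = VideoRequest(1920, 1080, 60, 4000000)
sd = VideoRequest(1280, 720, 30, 2000000)
```

Here hd fits the encoding on its own, but two such requests would exceed a 6000000 bit/s group limit, so a Consumer might pair hd with sd instead.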

1871	11. Extensibility

1873	   One important characteristic of the Framework is its
1874	   extensibility.  The standard for interoperability and handling
1875	   multiple streams must be future-proof. The framework itself is
1876	   inherently extensible through expanding the data model types.  For
1877	   example:

1879	   o  Adding more types of media, such as telemetry, can be done by
1880	      defining additional types of Captures in addition to audio and
1881	      video.

1883	   o  Adding new functionalities, such as 3-D video Captures, say, may
1884	      require additional attributes describing the Captures.

1886	   The infrastructure is designed to be extended rather than
1887	   requiring new infrastructure elements.  Extension comes through
1888	   adding to defined types.

1890	12. Examples - Using the Framework (Informative)

1892	   This section gives some examples, first from the point of view of
1893	   the Provider, then the Consumer, then some multipoint scenarios.

1895	12.1. Provider Behavior

1897	   This section shows some examples in more detail of how a Provider
1898	   can use the framework to represent a typical case for telepresence
1899	   rooms.  First an endpoint is illustrated, then an MCU case is
1900	   shown.

1902	12.1.1. Three screen Endpoint Provider

1904	   Consider an Endpoint with the following description:

1906	   3 cameras, 3 displays, a 6 person table

1908	   o  Each camera can provide one Capture for each 1/3 section of the
1909	      table

1911	   o  A single Capture representing the active speaker can be provided
1912	      (voice activity based camera selection to a given encoder input
1913	      port implemented locally in the Endpoint)

1915	   o  A single Capture representing the active speaker with the other
1916	      2 Captures shown picture in picture within the stream can be
1917	      provided (again, implemented inside the endpoint)

1919	   o  A Capture showing a zoomed out view of all 6 seats in the room
1920	      can be provided

1922	   The video and audio Captures for this Endpoint can be described as
1923	   follows.

1925	   Video Captures:

1927	   o  VC0- (the left camera stream), encoding group=EG0, view=table

1929	   o  VC1- (the center camera stream), encoding group=EG1, view=table

1931	   o  VC2- (the right camera stream), encoding group=EG2, view=table

1933	   o  MCC3- (the loudest panel stream), encoding group=EG1,
1934	      view=table, MaxCaptures=1, policy=SoundLevel

1936	   o  MCC4- (the loudest panel stream with PiPs), encoding group=EG1,
1937	      view=room, MaxCaptures=3, policy=SoundLevel

1939	   o  VC5- (the zoomed out view of all people in the room), encoding
1940	      group=EG1, view=room

1942	   o  VC6- (presentation stream), encoding group=EG1, presentation

1944	   The following diagram is a top view of the room with 3 cameras, 3
1945	   displays, and 6 seats.  Each camera captures 2 people.  The six
1946	   seats are not all in a straight line.

1948	      ,-. d
1949	     (   )`--.__        +---+
1950	      `-' /     `--.__  |   |
1951	    ,-.  |            `-.._ |_-+Camera 2 (VC2)
1952	   (   ).'     <--(AC1)-+-''`+-+
1953	    `-' |_...---''      |   |
1954	    ,-.c+-..__          +---+
1955	   (   )|     ``--..__  |   |
1956	    `-' |             ``+-..|_-+Camera 1 (VC1)
1957	    ,-. |      <--(AC2)..--'|+-+                          ^
1958	   (   )|     __..--'   |   |                             |
1959	    `-'b|..--'          +---+                             |X
1960	    ,-. |``---..___     |   |                             |
1961	   (   )\          ```--..._|_-+Camera 0 (VC0)            |
1962	    `-'  \     <--(AC0) ..-''`-+                          |
1963	     ,-. \      __.--'' |   |                  <----------+
1964	    (   ) |..-''        +---+                     Y
1965	     `-' a                          (0,0,0) origin is under Camera 1

1967	                    Figure 5:   Room Layout Top View

1969	   The two points labeled b and c are intended to be at the midpoint
1970	   between the seating positions, and where the fields of view of the
1971	   cameras intersect.

1973	   The plane of interest for VC0 is a vertical plane that intersects
1974	   points 'a' and 'b'.

1976	   The plane of interest for VC1 intersects points 'b' and 'c'. The
1977	   plane of interest for VC2 intersects points 'c' and 'd'.

1979	   This example uses an area scale of millimeters.

1981	   Areas of capture:

1983	       bottom left    bottom right  top left         top right
1984	   VC0 (-2011,2850,0) (-673,3000,0) (-2011,2850,757) (-673,3000,757)
1985	   VC1 ( -673,3000,0) ( 673,3000,0) ( -673,3000,757) ( 673,3000,757)
1986	   VC2 (  673,3000,0) (2011,2850,0) (  673,3000,757) (2011,2850,757)
1987	   MCC3(-2011,2850,0) (2011,2850,0) (-2011,2850,757) (2011,2850,757)
1988	   MCC4(-2011,2850,0) (2011,2850,0) (-2011,2850,757) (2011,2850,757)
1989	   VC5 (-2011,2850,0) (2011,2850,0) (-2011,2850,757) (2011,2850,757)
1990	   VC6 none

1992	   Points of capture:
1993	   VC0 (-1678,0,800)
1994	   VC1 (0,0,800)
1995	   VC2 (1678,0,800)
1996	   MCC3 none
1997	   MCC4 none
1998	   VC5 (0,0,800)
1999	   VC6 none

2001	   In this example, the right edge of the VC0 area lines up with the
2002	   left edge of the VC1 area.  It doesn't have to be this way.  There
2003	   could be a gap or an overlap.  One additional thing to note for
2004	   this example is the distance from a to b is equal to the distance
2005	   from b to c and the distance from c to d.  All these distances are
2006	   1346 mm. This is the planar width of each area of capture for VC0,
2007	   VC1, and VC2.
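
The 1346 mm figure can be checked from the coordinates in the areas of capture table.  The short Python sketch below takes the (x, y) positions of points a, b, c and d from the bottom-left and bottom-right corners listed for VC0, VC1 and VC2; the variable and function names are illustrative.

```python
import math

# (x, y) positions in millimeters, taken from the bottom corners
# of the areas of capture for VC0, VC1 and VC2.
a = (-2011, 2850)
b = (-673, 3000)
c = (673, 3000)
d = (2011, 2850)


def planar_width(p, q):
    """Distance between two points in the horizontal plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


widths = [planar_width(a, b), planar_width(b, c), planar_width(c, d)]
rounded = [round(w) for w in widths]
```

The b-to-c width is exactly 1346 mm; the two outer widths come to slightly more than 1346 mm and round to the same value.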

2009	   Note that the text in parentheses (e.g. "the left camera stream")
2010	   is explanatory text for this example only; it is not part of the
2011	   model and is not conveyed with the media captures and
2012	   attributes.  Also, MCC4 doesn't say anything about how a capture
2013	   is composed, so the media consumer can't tell based on this
2014	   capture that MCC4 is composed of a "loudest panel with
2015	   PiPs".

2017	   Audio Captures:

2019	   Three ceiling microphones are located between the cameras and the
2020	   table, at the same height as the cameras.  The microphones point
2021	   down at an angle toward the seating positions.

2023	   o  AC0 (left), encoding group=EG3

2025	   o  AC1 (right), encoding group=EG3

2027	   o  AC2 (center), encoding group=EG3

2029	   o  AC3 being a simple pre-mixed audio stream from the room (mono),
2030	      encoding group=EG3

2032	   o  AC4 audio stream associated with the presentation video (mono),
2033	      encoding group=EG3, presentation

2035	       Point of capture:      Point on Line of Capture:

2037	   AC0 (-1342,2000,800)       (-1342,2925,379)
2038	   AC1 ( 1342,2000,800)       ( 1342,2925,379)
2039	   AC2 (    0,2000,800)       (    0,3000,379)
2040	   AC3 (    0,2000,800)       (    0,3000,379)
2041	   AC4 none

2043	   The physical simultaneity information is:

2045	      Simultaneous transmission set #1 {VC0, VC1, VC2, MCC3, MCC4,
2046	                                        VC6}

2048	      Simultaneous transmission set #2 {VC0, VC2, VC5, VC6}

2050	   This constraint indicates it is not possible to use all the VCs at
2051	   the same time.  VC5 cannot be used at the same time as VC1 or MCC3
2052	   or MCC4.  Also, using every member in the set simultaneously may
2053	   not make sense - for example MCC3(loudest) and MCC4 (loudest with
2054	   PIP).  In addition, there are encoding constraints that make
2055	   choosing all of the VCs in a set impossible.  VC1, MCC3, MCC4,
2056	   VC5, VC6 all use EG1 and EG1 has only 3 ENCs.  This constraint
2057	   shows up in the encoding groups, not in the simultaneous
2058	   transmission sets.
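
The interaction between the simultaneous transmission sets and the encoding group limit can be sketched as two separate checks.  The set contents and the Capture to Encoding Group mapping below are copied from this example; the function names and the data layout are assumptions of the sketch.

```python
from collections import Counter

# Simultaneous transmission sets for the video Captures of this example.
SIMULTANEOUS_SETS = [
    {"VC0", "VC1", "VC2", "MCC3", "MCC4", "VC6"},
    {"VC0", "VC2", "VC5", "VC6"},
]

# Encoding Group of each video Capture, and encodings per group.
CAPTURE_GROUP = {
    "VC0": "EG0", "VC1": "EG1", "VC2": "EG2", "MCC3": "EG1",
    "MCC4": "EG1", "VC5": "EG1", "VC6": "EG1",
}
ENCODINGS_PER_GROUP = 3


def can_send_together(captures):
    """True if all Captures appear together in at least one set."""
    wanted = set(captures)
    return any(wanted <= s for s in SIMULTANEOUS_SETS)


def within_encoding_counts(captures):
    """True if no Encoding Group is asked for more encodings than it has."""
    per_group = Counter(CAPTURE_GROUP[c] for c in captures)
    return all(n <= ENCODINGS_PER_GROUP for n in per_group.values())
```

All of set #1 passes the simultaneity check but fails the encoding count check, because VC1, MCC3, MCC4 and VC6 would need four encodings from EG1.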

2060	   In this example there are no restrictions on which Audio Captures
2061	   can be sent simultaneously.

2063	   Encoding Groups:

2065	   This example has three encoding groups associated with the video
2066	   captures.  Each group can have 3 encodings, but with each
2067	   potential encoding having a progressively lower specification.  In
2068	   this example, 1080p60 transmission is possible (as ENC0 has a
2069	   maxPps value compatible with that).  Significantly, as up to 3
2070	   encodings are available per group, it is possible to transmit some
2071	   video Captures simultaneously that are not in the same view in the
2072	   Capture Scene, for example VC1 and MCC3 at the same time.  The
2073	   information below about Encodings is a summary of what would be
2074	   conveyed in SDP, not directly in the CLUE Advertisement.

2076	   encodeGroupID=EG0, maxGroupBandwidth=6000000
2077	       encodeID=ENC0, maxWidth=1920, maxHeight=1088, maxFrameRate=60,
2078	                      maxPps=124416000, maxBandwidth=4000000
2079	       encodeID=ENC1, maxWidth=1280, maxHeight=720, maxFrameRate=30,
2080	                      maxPps=27648000, maxBandwidth=4000000
2081	       encodeID=ENC2, maxWidth=960, maxHeight=544, maxFrameRate=30,
2082	                      maxPps=15552000, maxBandwidth=4000000
2083	   encodeGroupID=EG1, maxGroupBandwidth=6000000
2084	       encodeID=ENC3, maxWidth=1920, maxHeight=1088, maxFrameRate=60,
2085	                      maxPps=124416000, maxBandwidth=4000000
2086	       encodeID=ENC4, maxWidth=1280, maxHeight=720, maxFrameRate=30,
2087	                      maxPps=27648000, maxBandwidth=4000000
2088	       encodeID=ENC5, maxWidth=960, maxHeight=544, maxFrameRate=30,
2089	                      maxPps=15552000, maxBandwidth=4000000
2090	   encodeGroupID=EG2, maxGroupBandwidth=6000000
2091	       encodeID=ENC6, maxWidth=1920, maxHeight=1088, maxFrameRate=60,
2092	                      maxPps=124416000, maxBandwidth=4000000
2093	       encodeID=ENC7, maxWidth=1280, maxHeight=720, maxFrameRate=30,
2094	                      maxPps=27648000, maxBandwidth=4000000
2095	       encodeID=ENC8, maxWidth=960, maxHeight=544, maxFrameRate=30,
2096	                      maxPps=15552000, maxBandwidth=4000000

2098	              Figure 6:   Example Encoding Groups for Video
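
The maxPps values in Figure 6 can be related to familiar video formats with simple arithmetic.  The sketch below assumes that maxPps is a pixels-per-second limit computed as width x height x frame rate; the helper name is illustrative.

```python
def pixels_per_second(width, height, frame_rate):
    """Pixel rate of a video format: width x height x frames per second."""
    return width * height * frame_rate


pps_1080p60 = pixels_per_second(1920, 1080, 60)   # matches ENC0/ENC3/ENC6
pps_720p30 = pixels_per_second(1280, 720, 30)     # matches ENC1/ENC4/ENC7
pps_540p30 = pixels_per_second(960, 540, 30)      # matches ENC2/ENC5/ENC8
pps_1088_rows = pixels_per_second(1920, 1088, 60)  # full maxHeight rows
```

Note that the maxPps values correspond to 1080 and 540 rows, not to the maxHeight values of 1088 and 544; 1920x1088 at 60 frames per second would slightly exceed the 124416000 limit.  One possible reading, offered here only as an assumption, is that maxHeight is rounded up to a multiple of 16 rows.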

2100	   For audio, there are five potential encodings available, so all
2101	   five Audio Captures can be encoded at the same time.

2103	   encodeGroupID=EG3, maxGroupBandwidth=320000
2104	       encodeID=ENC9, maxBandwidth=64000
2105	       encodeID=ENC10, maxBandwidth=64000
2106	       encodeID=ENC11, maxBandwidth=64000
2107	       encodeID=ENC12, maxBandwidth=64000
2108	       encodeID=ENC13, maxBandwidth=64000

2110	              Figure 7:   Example Encoding Group for Audio

2112	   Capture Scenes:

2114	   The following table represents the Capture Scenes for this
2115	   Provider. Recall that a Capture Scene is composed of alternative
2116	   Capture Scene Views covering the same spatial region.  Capture
2117	   Scene #1 is for the main people captures, and Capture Scene #2 is
2118	   for presentation.

2120	   Each row in the table is a separate Capture Scene View.

2122	                           +------------------+
2123	                           | Capture Scene #1 |
2124	                           +------------------+
2125	                           | VC0, VC1, VC2    |
2126	                           | MCC3             |
2127	                           | MCC4             |
2128	                           | VC5              |
2129	                           | AC0, AC1, AC2    |
2130	                           | AC3              |
2131	                           +------------------+

2133	                           +------------------+
2134	                           | Capture Scene #2 |
2135	                           +------------------+
2136	                           | VC6              |
2137	                           | AC4              |
2138	                           +------------------+

2140	                Table 7: Example Capture Scene Views

2142	   Different Capture Scenes are distinct from each other, and are
2143	   non-overlapping. A Consumer can choose a view from each Capture
2144	   Scene.  In this case the three Captures VC0, VC1, and VC2 are one
2145	   way of representing the video from the Endpoint.  These three
2146	   Captures should appear adjacent to each other.
2147	   Alternatively, another way of representing the Capture Scene is
2148	   with the capture MCC3, which automatically shows the person who is
2149	   talking.  Similarly for the MCC4 and VC5 alternatives.

2151	   As in the video case, the different views of audio in Capture
2152	   Scene #1 represent the "same thing", in that one way to receive
2153	   the audio is with the 3 Audio Captures (AC0, AC1, AC2), and
2154	   another way is with the mixed AC3.  The Media Consumer can choose
2155	   an audio CSV it is capable of receiving.

2157	   The spatial ordering is conveyed by the Media Capture attributes
2158	   Area of Capture, Point of Capture and Point on Line of Capture.

2160	   A Media Consumer would likely want to choose a Capture Scene View
2161	   to receive based in part on how many streams it can simultaneously
2162	   receive.  A consumer that can receive three video streams would
2163	   probably prefer to receive the first view of Capture Scene #1
2164	   (VC0, VC1, VC2) and not receive the other views.  A consumer that
2165	   can receive only one video stream would probably choose one of the
2166	   other views.

2168	   If the consumer can receive a presentation stream too, it would
2169	   also choose to receive the only view from Capture Scene #2 (VC6).
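
The choice described above can be sketched as picking the largest video view that fits the number of streams the consumer can receive.  The list below holds the video views of Capture Scene #1 from Table 7; the function name and the tie-breaking rule (first listed view wins) are assumptions of this sketch, since the framework leaves the choice to the consumer.

```python
# Video Capture Scene Views of Capture Scene #1 (from Table 7).
VIDEO_VIEWS_SCENE_1 = [
    ["VC0", "VC1", "VC2"],
    ["MCC3"],
    ["MCC4"],
    ["VC5"],
]


def pick_view(views, max_streams):
    """Pick the view with the most Captures that fits max_streams.

    Returns None if no view fits.  Ties go to the first view listed,
    which is an arbitrary choice made for this sketch.
    """
    candidates = [v for v in views if len(v) <= max_streams]
    if not candidates:
        return None
    return max(candidates, key=len)
```

A consumer able to receive three video streams would get the VC0, VC1, VC2 view, while a single-stream consumer would get one of the single-Capture views; a real consumer would likely apply local or user preference instead of list order.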

2171	12.1.2. Encoding Group Example

2173	   This is an example of an Encoding Group to illustrate how it can
2174	   express dependencies between Encodings.  The information below
2175	   about Encodings is a summary of what would be conveyed in SDP, not
2176	   directly in the CLUE Advertisement.

2178	   encodeGroupID=EG0 maxGroupBandwidth=6000000
2179	       encodeID=VIDENC0, maxWidth=1920, maxHeight=1088,
2180	         maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000
2181	       encodeID=VIDENC1, maxWidth=1920, maxHeight=1088,
2182	         maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000
2183	       encodeID=AUDENC0, maxBandwidth=96000
2184	       encodeID=AUDENC1, maxBandwidth=96000
2185	       encodeID=AUDENC2, maxBandwidth=96000

2187	   Here, the Encoding Group is EG0.  Although the Encoding Group is
2188	   capable of transmitting up to 6Mbit/s, no individual video
2189	   Encoding can exceed 4Mbit/s.

2191	   This encoding group also allows up to 3 audio encodings,
2192	   AUDENC<0-2>.  Audio and video encodings need not reside within
2193	   the same encoding group, but if they do then the group's
2194	   maxGroupBandwidth value limits the sum of all audio and video
2195	   encodings configured by the consumer.  A system that does not wish
2196	   or need to combine bandwidth limitations in this way should
2197	   instead use separate encoding groups for audio and video in order
2198	   for the bandwidth limitations on audio and video to not interact.

2200	   Audio and video can be expressed in separate encoding groups, as
2201	   in this illustration.

2203	   encodeGroupID=EG0 maxGroupBandwidth=6000000
2204	       encodeID=VIDENC0, maxWidth=1920, maxHeight=1088,
2205	         maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000
2206	       encodeID=VIDENC1, maxWidth=1920, maxHeight=1088,
2207	         maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000
2208	   encodeGroupID=EG1 maxGroupBandwidth=500000
2209	       encodeID=AUDENC0, maxBandwidth=96000
2210	       encodeID=AUDENC1, maxBandwidth=96000
2211	       encodeID=AUDENC2, maxBandwidth=96000

2213	12.1.3. The MCU Case

2215	   This section shows how an MCU might express its Capture Scenes,
2216	   intending to offer different choices for consumers that can handle
2217	   different numbers of streams. Each MCC is for video. A single
2218	   Audio Capture is provided for all single and multi-screen
2219	   configurations that can be associated (e.g. lip-synced) with any
2220	   combination of Video Captures (the MCCs) at the consumer.

2222	        +-----------------------+---------------------------------+
2223	        | Capture Scene #1      |                                 |
2224	        +-----------------------+---------------------------------+
2225	        | MCC0                  | for a single screen consumer    |
2226	        | MCC1, MCC2            | for a two screen consumer       |
2227	        | MCC3, MCC4, MCC5      | for a three screen consumer     |
2228	        | MCC6, MCC7, MCC8, MCC9| for a four screen consumer      |
2229	        | AC0                   | AC representing all participants|
2230	        | CSV(MCC0)             |                                 |
2231	        | CSV(MCC1,MCC2)        |                                 |
2232	        | CSV(MCC3,MCC4,MCC5)   |                                 |
2233	        | CSV(MCC6,MCC7,        |                                 |
2234	        |     MCC8,MCC9)        |                                 |
2235	        | CSV(AC0)              |                                 |
2236	        +-----------------------+---------------------------------+
2237	                Table 8: MCU main Capture Scenes

2239	   If / when a presentation stream becomes active within the
2240	   conference the MCU might re-advertise the available media as:

2242	        +------------------+--------------------------------------+
2243	        | Capture Scene #2 | note                                 |
2244	        +------------------+--------------------------------------+
2245	        | VC10             | video capture for presentation       |
2246	        | AC1              | presentation audio to accompany VC10 |
2247	        | CSV(VC10)        |                                      |
2248	        | CSV(AC1)         |                                      |
2249	        +------------------+--------------------------------------+

2251	                Table 9: MCU presentation Capture Scene

2253	12.2. Media Consumer Behavior

2255	   This section gives an example of how a Media Consumer might behave
2256	   when deciding how to request streams from the three screen
2257	   endpoint described in the previous section.

2259	   The receive side of a call needs to balance its requirements,
2260	   based on number of screens and speakers, its decoding capabilities
2261	   and available bandwidth, and the provider's capabilities in order
2262	   to optimally configure the provider's streams.  Typically it would
2263	   want to receive and decode media from each Capture Scene
2264	   advertised by the Provider.

2266	   A sane, basic algorithm might be for the consumer to go through
2267	   each Capture Scene View in turn and find the collection of Video
2268	   Captures that best matches the number of screens it has (this
2269	   might include consideration of screens dedicated to presentation
2270	   video display rather than "people" video) and then decide between
2271	   alternative views in the video Capture Scenes based either on
2272	   hard-coded preferences or user choice.  Once this choice has been
2273	   made, the consumer would then decide how to configure the
2274	   provider's encoding groups in order to make best use of the
2275	   available network bandwidth and its own decoding capabilities.

2277	12.2.1. One screen Media Consumer

2279	   MCC3, MCC4 and VC5 are all different views by themselves, not
2280	   grouped together in a single view, so the receiving device should
2281	   choose one of those.  The choice would come down to whether to
2282	   see the greatest number of participants simultaneously at roughly
2283	   equal precedence (VC5), a switched view of just the loudest
2284	   region (MCC3) or a switched view with PiPs (MCC4).  An endpoint
2285	   device with a small amount of knowledge of these differences
2286	   could offer a dynamic choice of these options, in-call, to the
2287	   user.

2289	12.2.2. Two screen Media Consumer configuring the example

2291	   Mixing systems with an even number of screens, "2n", and those
2292	   with "2n+1" cameras (and vice versa) is always likely to be the
2293	   problematic case.  In this instance, the behavior is likely to be
2294	   determined by whether a "2 screen" system is really a "2 decoder"
2295	   system, i.e., whether only one received stream can be displayed
2296	   per screen or whether more than 2 streams can be received and
2297	   spread across the available screen area.  Three possible behaviors
2298	   for the 2 screen system, when it learns that the far end is
2299	   "ideally" expressed via 3 capture streams, are:

2301	   1. Fall back to receiving just a single stream (MCC3, MCC4 or VC5
2302	      as per the 1 screen consumer case above) and either leave one
2303	      screen blank or use it for presentation if / when a
2304	      presentation becomes active.

2306	   2. Receive 3 streams (VC0, VC1 and VC2) and display across 2
2307	      screens, either with each capture being scaled to 2/3 of a
2308	      screen and the center capture being split across 2 screens or,
2309	      as would be necessary if there were large bezels on the
2310	      screens, with each stream being scaled to 1/2 the screen width
2311	      and height and there being a 4th "blank" panel.  This 4th panel
2312	      could potentially be used for any presentation that became
2313	      active during the call.

2315	   3. Receive 3 streams, decode all 3, and use control information
2316	      indicating which was the most active to switch between showing
2317	      the left and center streams (one per screen) and the center and
2318	      right streams.

2320	   For an endpoint capable of all 3 methods of working described
2321	   above, again it might be appropriate to offer the user the choice
2322	   of display mode.

2324	12.2.3. Three screen Media Consumer configuring the example

2326	   This is the most straightforward case - the Media Consumer would
2327	   look to identify a set of streams to receive that best matched its
2328	   available screens and so the VC0 plus VC1 plus VC2 should match
2329	   optimally.  The spatial ordering would give sufficient information
2330	   for the correct Video Capture to be shown on the correct screen.
2331	   The consumer would need either to divide a single encoding
2332	   group's capability by 3 to determine what resolution and frame
2333	   rate to configure the provider with, or to configure the individual
2334	   Video Captures' Encoding Groups with what makes most sense (taking
2335	   into account the receive side decode capabilities, overall call
2336	   bandwidth, the resolution of the screens plus any user preferences
2337	   such as motion vs. sharpness).
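
   The "divide by 3" approach can be sketched as follows.  This is an
   illustration only: the GroupLimits fields, the pixel-rate budget
   and the candidate format list are hypothetical stand-ins for
   whatever limits the Provider actually advertises for its Encoding
   Group, and a real Consumer would also weigh its decode capability,
   screen resolution and user preferences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupLimits:
    """Hypothetical summary of one encoding group's advertised limits."""
    max_bandwidth_bps: int
    max_pixels_per_second: int


def per_capture_budget(group, capture_count):
    """Split the group's capability evenly between the captures."""
    return GroupLimits(
        group.max_bandwidth_bps // capture_count,
        group.max_pixels_per_second // capture_count,
    )


def best_format(budget, candidates):
    """Return the first (width, height, fps) that fits the pixel budget."""
    for width, height, fps in candidates:
        if width * height * fps <= budget.max_pixels_per_second:
            return (width, height, fps)
    return None


# Assumed group capability: one 1080p60 stream's worth of pixels.
group = GroupLimits(max_bandwidth_bps=12_000_000,
                    max_pixels_per_second=1920 * 1080 * 60)
budget = per_capture_budget(group, 3)
chosen = best_format(budget, [(1920, 1080, 60), (1920, 1080, 30),
                              (1280, 720, 30), (640, 360, 30)])
print(budget.max_bandwidth_bps, chosen)
```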

2339	12.3. Multipoint Conference utilizing Multiple Content Captures

2341	   The use of MCCs allows the MCU to construct outgoing Advertisements
2342	   describing complex media switching and composition scenarios.  The
2343	   following sections provide several examples.

2345	   Note: In the examples the identities of the CLUE elements (e.g.
2346	   Captures, Capture Scene) in the incoming Advertisements overlap.
2347	   This is because there is no co-ordination between the endpoints.
2348	   The MCU is responsible for making these unique in the outgoing
2349	   advertisement.
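
   One possible way for an MCU to make the identities unique is to
   renumber them sequentially while remembering where each one came
   from.  The sketch below is an assumption about how an
   implementation might do this, not a procedure defined by CLUE; the
   function name and the tuple layout are invented for the example.
   The input mirrors the three single-scene Advertisements used in
   the next section.

```python
def renumber_captures(advertisements):
    """Assign conference-wide unique Capture Scene and capture identities.

    advertisements: list of (endpoint, scene_description, local_ids).
    Returns (scenes, mapping) where mapping takes (endpoint, local_id)
    to the identity used in the MCU's outgoing Advertisement.
    """
    mapping = {}
    scenes = []
    next_number = 1
    for scene_number, (endpoint, description, local_ids) in enumerate(
            advertisements, start=1):
        new_ids = []
        for local_id in local_ids:
            new_id = f"VC{next_number}"
            next_number += 1
            mapping[(endpoint, local_id)] = new_id
            new_ids.append(new_id)
        scenes.append((scene_number, description, new_ids))
    return scenes, mapping


scenes, mapping = renumber_captures([
    ("A", "AustralianConfRoom", ["VC1"]),
    ("B", "ChinaConfRoom", ["VC1", "VC2"]),
    ("C", "USAConfRoom", ["VC1"]),
])
for scene in scenes:
    print(scene)
```

   Keeping the mapping lets the MCU translate a Consumer's later
   selection back to the endpoint and capture it originally referred
   to.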

2351	12.3.1. Single Media Captures and MCC in the same Advertisement

2353	   Four endpoints are involved in a Conference where CLUE is used. An
2354	   MCU acts as a middlebox between the endpoints with a CLUE channel
2355	   between each endpoint and the MCU. The MCU receives the following
2356	   Advertisements.

2358	        +-----------------------+---------------------------------+
2359	        | Capture Scene #1      | Description=AustralianConfRoom  |
2360	        +-----------------------|---------------------------------+
2361	        | VC1                   | Description=Audience            |
2362	        |                       | EncodeGroupID=1                 |
2363	        | CSV(VC1)              |                                 |
2364	        +---------------------------------------------------------+

2366	            Table 10: Advertisement received from Endpoint A

2368	        +-----------------------+---------------------------------+
2369	        | Capture Scene #1      | Description=ChinaConfRoom       |
2370	        +-----------------------|---------------------------------+
2371	        | VC1                   | Description=Speaker             |
2372	        |                       | EncodeGroupID=1                 |
2373	        | VC2                   | Description=Audience            |
2374	        |                       | EncodeGroupID=1                 |
2375	        | CSV(VC1, VC2)         |                                 |
2376	        +---------------------------------------------------------+

2378	            Table 11: Advertisement received from Endpoint B

2380	        +-----------------------+---------------------------------+
2381	        | Capture Scene #1      | Description=USAConfRoom         |
2382	        +-----------------------|---------------------------------+
2383	        | VC1                   | Description=Audience            |
2384	        |                       | EncodeGroupID=1                 |
2385	        | CSV(VC1)              |                                 |
2386	        +---------------------------------------------------------+

2388	            Table 12: Advertisement received from Endpoint C

2390	   Note: Endpoint B above indicates that it sends two streams.

2392	   If the MCU wanted to provide a Multiple Content Capture containing
2393	   a round robin switched view of the audience from the 3 endpoints
2394	   and the speaker, it could construct the following advertisement:

2396	   Advertisement sent to Endpoint F
2397	        +=======================+=================================+
2398	        | Capture Scene #1      | Description=AustralianConfRoom  |
2399	        +-----------------------|---------------------------------+
2400	        | VC1                   | Description=Audience            |
2401	        | CSV(VC1)              |                                 |
2402	        +=======================+=================================+
2403	        | Capture Scene #2      | Description=ChinaConfRoom       |
2404	        +-----------------------|---------------------------------+
2405	        | VC2                   | Description=Speaker             |
2406	        | VC3                   | Description=Audience            |
2407	        | CSV(VC2, VC3)         |                                 |
2408	        +=======================+=================================+
2409	        | Capture Scene #3      | Description=USAConfRoom         |
2410	        +-----------------------|---------------------------------+
2411	        | VC4                   | Description=Audience            |
2412	        | CSV(VC4)              |                                 |
2413	        +=======================+=================================+
2414	        | Capture Scene #4      |                                 |
2415	        +-----------------------|---------------------------------+
2416	        | MCC1(VC1,VC2,VC3,VC4) | Policy=RoundRobin:1             |
2417	        |                       | MaxCaptures=1                   |
2418	        |                       | EncodingGroup=1                 |
2419	        | CSV(MCC1)             |                                 |
2420	        +=======================+=================================+

2422	         Table 13: Advertisement sent to Endpoint F - One Encoding

2424	   Alternatively, if the MCU wanted to provide the speaker as one
2425	   media stream and the audiences as another, it could assign an
2426	   encoding group to VC2 in Capture Scene #2 and provide a CSV in
2427	   Capture Scene #4 as per the example below.

2429	   Advertisement sent to Endpoint F
2430	        +=======================+=================================+
2431	        | Capture Scene #1      | Description=AustralianConfRoom  |
2432	        +-----------------------|---------------------------------+
2433	        | VC1                   | Description=Audience            |
2434	        | CSV(VC1)              |                                 |
2435	        +=======================+=================================+
2436	        | Capture Scene #2      | Description=ChinaConfRoom       |
2437	        +-----------------------|---------------------------------+
2438	        | VC2                   | Description=Speaker             |
2439	        |                       | EncodingGroup=1                 |
2440	        | VC3                   | Description=Audience            |
2441	        | CSV(VC2, VC3)         |                                 |
2442	        +=======================+=================================+
2443	        | Capture Scene #3      | Description=USAConfRoom         |
2444	        +-----------------------|---------------------------------+
2445	        | VC4                   | Description=Audience            |
2446	        | CSV(VC4)              |                                 |
2447	        +=======================+=================================+
2448	        | Capture Scene #4      |                                 |
2449	        +-----------------------|---------------------------------+
2450	        | MCC1(VC1,VC3,VC4)     | Policy=RoundRobin:1             |
2451	        |                       | MaxCaptures=1                   |
2452	        |                       | EncodingGroup=1                 |
2453	        |                       | AllowSubset=True                |
2454	        | MCC2(VC2)             | MaxCaptures=1                   |
2455	        |                       | EncodingGroup=1                 |
2456	        | CSV2(MCC1,MCC2)       |                                 |
2457	        +=======================+=================================+

2459	        Table 14: Advertisement sent to Endpoint F - Two Encodings

2461	   Therefore a Consumer could choose whether or not to have a separate
2462	   speaker related stream and could choose which endpoints to see.  If
2463	   it wanted the second stream but not the Australian conference room,
2464	   it could indicate the following captures in the Configure message:

2466	        +-----------------------+---------------------------------+
2467	        | MCC1(VC3,VC4)         | Encoding                        |
2468	        | VC2                   | Encoding                        |
2469	        +-----------------------|---------------------------------+
2470	                      Table 15: MCU case: Consumer Response
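
   A Provider receiving such a Configure message might sanity-check
   it against what it advertised.  The following sketch assumes a
   simplified in-memory view of the "Two Encodings" Advertisement
   above; the dictionary layout, the function name and the error
   strings are hypothetical and are not part of the CLUE protocol.

```python
# Simplified view of Capture Scene #4 in the Two Encodings Advertisement.
ADVERTISED_MCCS = {
    "MCC1": {"captures": {"VC1", "VC3", "VC4"}, "allow_subset": True},
    "MCC2": {"captures": {"VC2"}, "allow_subset": False},
}
# Individual captures that were advertised with an encoding group.
ENCODABLE_CAPTURES = {"VC2"}


def check_configure(requests):
    """Return a list of problems found in the Consumer's selection.

    requests: list of (capture_id, chosen_subset_or_None).
    """
    problems = []
    for capture_id, subset in requests:
        mcc = ADVERTISED_MCCS.get(capture_id)
        if mcc is None:
            if capture_id not in ENCODABLE_CAPTURES:
                problems.append(f"{capture_id}: no encoding group advertised")
            elif subset is not None:
                problems.append(f"{capture_id}: not an MCC, subset not applicable")
            continue
        if subset is None or subset == mcc["captures"]:
            continue  # the whole MCC was requested
        if not subset or not subset <= mcc["captures"]:
            problems.append(f"{capture_id}: subset must be drawn from the MCC")
        elif not mcc["allow_subset"]:
            problems.append(f"{capture_id}: AllowSubset was not advertised")
    return problems


# The Consumer response shown above: MCC1 without VC1, plus VC2.
print(check_configure([("MCC1", {"VC3", "VC4"}), ("VC2", None)]))
```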

2472	12.3.2. Several MCCs in the same Advertisement

2474	   Multiple MCCs can be used where multiple streams are used to carry
2475	   media from multiple endpoints.  For example:

2477	   A conference has three endpoints D, E and F. Each endpoint has
2478	   three video captures covering the left, middle and right regions of
2479	   each conference room.  The MCU receives the following
2480	   advertisements from D and E.

2482	        +-----------------------+---------------------------------+
2483	        | Capture Scene #1      | Description=AustralianConfRoom  |
2484	        +-----------------------|---------------------------------+
2485	        | VC1                   | CaptureArea=Left                |
2486	        |                       | EncodingGroup=1                 |
2487	        | VC2                   | CaptureArea=Centre              |
2488	        |                       | EncodingGroup=1                 |
2489	        | VC3                   | CaptureArea=Right               |
2490	        |                       | EncodingGroup=1                 |
2491	        | CSV(VC1,VC2,VC3)      |                                 |
2492	        +---------------------------------------------------------+

2494	            Table 16: Advertisement received from Endpoint D

2496	        +-----------------------+---------------------------------+
2497	        | Capture Scene #1      | Description=ChinaConfRoom       |
2498	        +-----------------------|---------------------------------+
2499	        | VC1                   | CaptureArea=Left                |
2500	        |                       | EncodingGroup=1                 |
2501	        | VC2                   | CaptureArea=Centre              |
2502	        |                       | EncodingGroup=1                 |
2503	        | VC3                   | CaptureArea=Right               |
2504	        |                       | EncodingGroup=1                 |
2505	        | CSV(VC1,VC2,VC3)      |                                 |
2506	        +---------------------------------------------------------+

2508	            Table 17: Advertisement received from Endpoint E

2510	   The MCU wants to offer Endpoint F three Capture Encodings.  Each
2511	   Capture Encoding would contain all the Captures from either
2512	   Endpoint D or Endpoint E, depending on the active speaker.
2513	   The MCU sends the following Advertisement:

2515	        +=======================+=================================+
2516	        | Capture Scene #1      | Description=AustralianConfRoom  |
2517	        +-----------------------|---------------------------------+
2518	        | VC1                   |                                 |
2519	        | VC2                   |                                 |
2520	        | VC3                   |                                 |
2521	        | CSV(VC1,VC2,VC3)      |                                 |
2522	        +=======================+=================================+
2523	        | Capture Scene #2      | Description=ChinaConfRoom       |
2524	        +-----------------------|---------------------------------+
2525	        | VC4                   |                                 |
2526	        | VC5                   |                                 |
2527	        | VC6                   |                                 |
2528	        | CSV(VC4,VC5,VC6)      |                                 |
2529	        +=======================+=================================+
2530	        | Capture Scene #3      |                                 |
2531	        +-----------------------|---------------------------------+
2532	        | MCC1(VC1,VC4)         | CaptureArea=Left                |
2533	        |                       | MaxCaptures=1                   |
2534	        |                       | SynchronisationID=1             |
2535	        |                       | EncodingGroup=1                 |
2536	        | MCC2(VC2,VC5)         | CaptureArea=Centre              |
2537	        |                       | MaxCaptures=1                   |
2538	        |                       | SynchronisationID=1             |
2539	        |                       | EncodingGroup=1                 |
2540	        | MCC3(VC3,VC6)         | CaptureArea=Right               |
2541	        |                       | MaxCaptures=1                   |
2542	        |                       | SynchronisationID=1             |
2543	        |                       | EncodingGroup=1                 |
2544	        | CSV(MCC1,MCC2,MCC3)   |                                 |
2545	        +=======================+=================================+

2547	            Table 18: Advertisement sent to Endpoint F
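
   The effect of the shared SynchronisationID can be sketched as
   follows: when the active speaker moves to another endpoint, every
   MCC with the same SynchronisationID switches to the capture that
   comes from that endpoint.  The data structures and the function
   name are assumptions made for illustration; how the MCU detects
   the active speaker is outside the scope of the sketch.

```python
# Which endpoint each renumbered capture originally came from.
SOURCE_OF = {"VC1": "D", "VC2": "D", "VC3": "D",
             "VC4": "E", "VC5": "E", "VC6": "E"}

# Capture Scene #3 of the Advertisement sent to Endpoint F.
MCCS = {
    "MCC1": {"captures": ("VC1", "VC4"), "area": "Left", "sync_id": 1},
    "MCC2": {"captures": ("VC2", "VC5"), "area": "Centre", "sync_id": 1},
    "MCC3": {"captures": ("VC3", "VC6"), "area": "Right", "sync_id": 1},
}


def switch(active_endpoint, sync_id):
    """Pick, for every MCC in the sync group, the active endpoint's capture."""
    selection = {}
    for mcc_id, mcc in MCCS.items():
        if mcc["sync_id"] != sync_id:
            continue
        for capture in mcc["captures"]:
            if SOURCE_OF[capture] == active_endpoint:
                selection[mcc_id] = capture
    return selection


print(switch("D", 1))
print(switch("E", 1))
```

   Because all three MCCs switch together, Endpoint F always sees a
   left, centre and right view of the same room.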

2549	12.3.3. Heterogeneous conference with switching and composition

2551	   Consider a conference between endpoints with the following
2552	   characteristics:

2554	      Endpoint A - 4 screens, 3 cameras

2556	      Endpoint B - 3 screens, 3 cameras

2558	      Endpoint C - 3 screens, 3 cameras

2559	      Endpoint D - 3 screens, 3 cameras

2561	      Endpoint E - 1 screen, 1 camera

2563	      Endpoint F - 2 screens, 1 camera

2565	      Endpoint G - 1 screen, 1 camera

2567	   This example focuses on what the user in one of the 3-camera multi-
2568	   screen endpoints sees.  Call this person User A, at Endpoint A.
2569	   There are 4 large display screens at Endpoint A. Whenever somebody
2570	   at another site is speaking, all the video captures from that
2571	   endpoint are shown on the large screens.  If the talker is at a 3-
2572	   camera site, then the video from those 3 cameras fills 3 of the
2573	   screens.  If the talker is at a single-camera site, then video from
2574	   that camera fills one of the screens, while the other screens show
2575	   video from other single-camera endpoints.

2577	   User A hears audio from the 4 loudest talkers.

2579	   User A can also see video from other endpoints, in addition to the
2580	   current talker, although much smaller in size.  Endpoint A has 4
2581	   screens, so one of those screens shows up to 9 other Media Captures
2582	   in a tiled fashion.  When video from a 3 camera endpoint appears in
2583	   the tiled area, video from all 3 cameras appears together across
2584	   the screen with correct spatial relationship among those 3 images.

2586	      +---+---+---+ +-------------+ +-------------+ +-------------+
2587	      |   |   |   | |             | |             | |             |
2588	      +---+---+---+ |             | |             | |             |
2589	      |   |   |   | |             | |             | |             |
2590	      +---+---+---+ |             | |             | |             |
2591	      |   |   |   | |             | |             | |             |
2592	      +---+---+---+ +-------------+ +-------------+ +-------------+
2593	                Figure 8:   Endpoint A - 4 Screen Display

2595	   User B at Endpoint B sees a similar arrangement, except there are
2596	   only 3 screens, so the 9 other Media Captures are spread out across
2597	   the bottom of the 3 displays, in a picture-in-picture (PIP) format.
2598	   When video from a 3 camera endpoint appears in the PIP area, video
2599	   from all 3 cameras appears together across a single screen with
2600	   correct spatial relationship.

2602	              +-------------+ +-------------+ +-------------+
2603	              |             | |             | |             |
2604	              |             | |             | |             |
2605	              |             | |             | |             |
2606	              | +-+ +-+ +-+ | | +-+ +-+ +-+ | | +-+ +-+ +-+ |
2607	              | +-+ +-+ +-+ | | +-+ +-+ +-+ | | +-+ +-+ +-+ |
2608	              +-------------+ +-------------+ +-------------+
2609	           Figure 9:   Endpoint B - 3 Screen Display with PiPs

2611	   When somebody at a different endpoint becomes the current talker,
2612	   then User A and User B both see the video from the new talker
2613	   appear on their large screen area, while the previous talker takes
2614	   one of the smaller tiled or PIP areas.  The person who is the
2615	   current talker doesn't see themselves; they see the previous talker
2616	   in their large screen area.

2618	   One of the points of this example is that endpoints A and B each
2619	   want to receive 3 capture encodings for their large display areas,
2620	   and 9 encodings for their smaller areas.  A and B are able to
2621	   each send the same Configure message to the MCU, and each receive
2622	   the same conceptual Media Captures from the MCU.  The differences
2623	   are in how they are rendered and are purely a local matter at A and
2624	   B.

2626	   The Advertisements for such a scenario are described below.

2628	        +-----------------------+---------------------------------+
2629	        | Capture Scene #1      | Description=Endpoint x          |
2630	        +-----------------------|---------------------------------+
2631	        | VC1                   | EncodingGroup=1                 |
2632	        | VC2                   | EncodingGroup=1                 |
2633	        | VC3                   | EncodingGroup=1                 |
2634	        | AC1                   | EncodingGroup=2                 |
2635	        | CSV1(VC1, VC2, VC3)   |                                 |
2636	        | CSV2(AC1)             |                                 |
2637	        +---------------------------------------------------------+

2639	   Table 19: Advertisement received at the MCU from Endpoints A to D

2640	        +-----------------------+---------------------------------+
2641	        | Capture Scene #1      | Description=Endpoint y          |
2642	        +-----------------------|---------------------------------+
2643	        | VC1                   | EncodingGroup=1                 |
2644	        | AC1                   | EncodingGroup=2                 |
2645	        | CSV1(VC1)             |                                 |
2646	        | CSV2(AC1)             |                                 |
2647	        +---------------------------------------------------------+

2649	   Table 20: Advertisement received at the MCU from Endpoints E to G

2651	   Rather than considering what is displayed, CLUE concentrates more
2652	   on what the MCU sends. The MCU doesn't know anything about the
2653	   number of screens an endpoint has.

2655	   As Endpoints A to D each advertise that three Captures make up a
2656	   Capture Scene, the MCU offers these in a "site" switching mode.
2657	   That is, there are three Multiple Content Captures (and
2658	   Capture Encodings) each switching between Endpoints. The MCU
2659	   switches in the applicable media into the stream based on voice
2660	   activity. Endpoint A will not see a capture from itself.

2662	   Using the MCC concept the MCU would send the following
2663	   Advertisement to endpoint A:

2665	        +=======================+=================================+
2666	        | Capture Scene #1      | Description=Endpoint B          |
2667	        +-----------------------|---------------------------------+
2668	        | VC4                   | CaptureArea=Left                |
2669	        | VC5                   | CaptureArea=Center              |
2670	        | VC6                   | CaptureArea=Right               |
2671	        | AC1                   |                                 |
2672	        | CSV(VC4,VC5,VC6)      |                                 |
2673	        | CSV(AC1)              |                                 |
2674	        +=======================+=================================+
2675	        | Capture Scene #2      | Description=Endpoint C          |
2676	        +-----------------------|---------------------------------+
2677	        | VC7                   | CaptureArea=Left                |
2678	        | VC8                   | CaptureArea=Center              |
2679	        | VC9                   | CaptureArea=Right               |
2680	        | AC2                   |                                 |
2681	        | CSV(VC7,VC8,VC9)      |                                 |
2682	        | CSV(AC2)              |                                 |
2683	        +=======================+=================================+
2684	        | Capture Scene #3      | Description=Endpoint D          |
2685	        +-----------------------|---------------------------------+
2686	        | VC10                  | CaptureArea=Left                |
2687	        | VC11                  | CaptureArea=Center              |
2688	        | VC12                  | CaptureArea=Right               |
2689	        | AC3                   |                                 |
2690	        | CSV(VC10,VC11,VC12)   |                                 |
2691	        | CSV(AC3)              |                                 |
2692	        +=======================+=================================+
2693	        | Capture Scene #4      | Description=Endpoint E          |
2694	        +-----------------------|---------------------------------+
2695	        | VC13                  |                                 |
2696	        | AC4                   |                                 |
2697	        | CSV(VC13)             |                                 |
2698	        | CSV(AC4)              |                                 |
2699	        +=======================+=================================+
2700	        | Capture Scene #5      | Description=Endpoint F          |
2701	        +-----------------------|---------------------------------+
2702	        | VC14                  |                                 |
2703	        | AC5                   |                                 |
2704	        | CSV(VC14)             |                                 |
2705	        | CSV(AC5)              |                                 |
2706	        +=======================+=================================+
2707	        | Capture Scene #6      | Description=Endpoint G          |
2708	        +-----------------------|---------------------------------+
2709	        | VC15                  |                                 |
2710	        | AC6                   |                                 |
2711	        | CSV(VC15)             |                                 |
2712	        | CSV(AC6)              |                                 |
2713	        +=======================+=================================+

2715	         Table 21: Advertisement sent to endpoint A - Source Part

2717	   The above part of the Advertisement presents information about the
2718	   sources to the MCC. The information is effectively the same as the
2719	   received Advertisements except that there are no Capture Encodings
2720	   associated with them and the identities have been re-numbered.

2722	   In addition to the source Capture information the MCU advertises
2723	   "site" switching of Endpoints B to G in three streams.

2725	        +=======================+=================================+
2726	        | Capture Scene #7      | Description=Output3streammix    |
2727	        +-----------------------|---------------------------------+
2728	        | MCC1(VC4,VC7,VC10,    | CaptureArea=Left                |
2729	        |      VC13)            | MaxCaptures=1                   |
2730	        |                       | SynchronisationID=1             |
2731	        |                       | Policy=SoundLevel:0             |
2732	        |                       | EncodingGroup=1                 |
2733	        |                       |                                 |
2734	        | MCC2(VC5,VC8,VC11,    | CaptureArea=Center              |
2735	        |      VC14)            | MaxCaptures=1                   |
2736	        |                       | SynchronisationID=1             |
2737	        |                       | Policy=SoundLevel:0             |
2738	        |                       | EncodingGroup=1                 |
2739	        |                       |                                 |
2740	        | MCC3(VC6,VC9,VC12,    | CaptureArea=Right               |
2741	        |      VC15)            | MaxCaptures=1                   |
2742	        |                       | SynchronisationID=1             |
2743	        |                       | Policy=SoundLevel:0             |
2744	        |                       | EncodingGroup=1                 |
2745	        |                       |                                 |
2746	        | MCC4() (for audio)    | CaptureArea=whole scene         |
2747	        |                       | MaxCaptures=1                   |
2748	        |                       | Policy=SoundLevel:0             |
2749	        |                       | EncodingGroup=2                 |
2750	        |                       |                                 |
2751	        | MCC5() (for audio)    | CaptureArea=whole scene         |
2752	        |                       | MaxCaptures=1                   |
2753	        |                       | Policy=SoundLevel:1             |
2754	        |                       | EncodingGroup=2                 |
2755	        |                       |                                 |
2756	        | MCC6() (for audio)    | CaptureArea=whole scene         |
2757	        |                       | MaxCaptures=1                   |
2758	        |                       | Policy=SoundLevel:2             |
2759	        |                       | EncodingGroup=2                 |
2760	        |                       |                                 |
2761	        | MCC7() (for audio)    | CaptureArea=whole scene         |
2762	        |                       | MaxCaptures=1                   |
2763	        |                       | Policy=SoundLevel:3             |
2764	        |                       | EncodingGroup=2                 |
2765	        |                       |                                 |
2766	        | CSV(MCC1,MCC2,MCC3)   |                                 |
2767	        | CSV(MCC4,MCC5,MCC6,   |                                 |
2768	        |     MCC7)             |                                 |
2769	        +=======================+=================================+

2771	       Table 22: Advertisement sent to endpoint A - switching part

2773	   The above part describes the switched 3 main streams that relate to
2774	   site switching. MaxCaptures=1 indicates that only one Capture from
2775	   the MCC is sent at a particular time. SynchronisationID=1 indicates
2776	   that the source sending is synchronised. The provider can choose to
2777	   group together VC13, VC14, and VC15 for the purpose of switching
2778	   according to the SynchronisationID.  Therefore when the provider
2779	   switches one of them into an MCC, it can also switch the others
2780	   even though they are not part of the same Capture Scene.

2782	   All the audio for the conference is included in this Scene #7.
2783	   There isn't necessarily a one to one relation between any audio
2784	   capture and video capture in this scene.  Typically a change in
2785	   loudest talker will cause the MCU to switch the audio streams more
2786	   quickly than switching video streams.

2788	   The MCU can also supply nine media streams showing the active and
2789	   previous eight speakers. It includes the following in the
2790	   Advertisement:

2792	        +=======================+=================================+
2793	        | Capture Scene #8      | Description=Output9stream       |
2794	        +-----------------------|---------------------------------+
2795	        | MCC8(VC4,VC5,VC6,VC7, | MaxCaptures=1                   |
2796	        |   VC8,VC9,VC10,VC11,  | Policy=SoundLevel:0             |
2797	        |   VC12,VC13,VC14,VC15)| EncodingGroup=1                 |
2798	        |                       |                                 |
2799	        | MCC9(VC4,VC5,VC6,VC7, | MaxCaptures=1                   |
2800	        |   VC8,VC9,VC10,VC11,  | Policy=SoundLevel:1             |
2801	        |   VC12,VC13,VC14,VC15)| EncodingGroup=1                 |
2802	        |                       |                                 |
2803	        |          to           |               to                |
2804	        |                       |                                 |
2805	        | MCC16(VC4,VC5,VC6,VC7,| MaxCaptures=1                   |
2806	        |   VC8,VC9,VC10,VC11,  | Policy=SoundLevel:8             |
2807	        |   VC12,VC13,VC14,VC15)| EncodingGroup=1                 |
2808	        |                       |                                 |
2809	        | CSV(MCC8,MCC9,MCC10,  |                                 |
2810	        |     MCC11,MCC12,MCC13,|                                 |
2811	        |     MCC14,MCC15,MCC16)|                                 |
2812	        +=======================+=================================+

2814	       Table 23: Advertisement sent to endpoint A - 9 switched part

2816	   The above part indicates that there are 9 capture encodings. Each
2817	   of the Capture Encodings may contain any captures from any source
2818	   site with a maximum of one Capture at a time. Which Capture is
2819	   present is determined by the policy.  The MCCs in this scene do not
2820	   have any spatial attributes.
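
   One way to picture the SoundLevel:0 to SoundLevel:8 policies is as
   indexes into a most-recent-speaker history: index 0 is the current
   speaker, index 1 the previous one, and so on.  The sketch below is
   a simplification under that reading; it treats each capture as a
   single speaker source, and the helper names are hypothetical.  Only
   the first four of the nine MCCs are listed to keep it short.

```python
def update_history(history, new_active):
    """Move the new active source to the front of the speaker history."""
    return [new_active] + [c for c in history if c != new_active]


def assign(history, mcc_index):
    """Map each MCC to the source at its SoundLevel index, if any."""
    return {
        mcc: history[index] if index < len(history) else None
        for mcc, index in mcc_index.items()
    }


# SoundLevel index advertised for each MCC (MCC12..MCC16 omitted).
MCC_INDEX = {"MCC8": 0, "MCC9": 1, "MCC10": 2, "MCC11": 3}

history = []
for speaker in ("VC4", "VC13", "VC7", "VC13"):
    history = update_history(history, speaker)

current = assign(history, MCC_INDEX)
print(history)
print(current)
```

   An MCC whose index exceeds the number of sources that have spoken
   so far has nothing to carry yet, which the sketch represents as
   None.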

2822	   Note: The Provider alternatively could provide each of the MCCs
2823	   above in its own Capture Scene.

2825	   If the MCU wanted to provide a composed Capture Encoding containing
2826	   all of the 9 captures it could advertise in addition:

2828	        +=======================+=================================+
2829	        | Capture Scene #9      | Description=NineTiles           |
2830	        +-----------------------|---------------------------------+
2831	        | MCC17(MCC8,MCC9,MCC10,| MaxCaptures=9                   |
2832	        |     MCC11,MCC12,MCC13,| EncodingGroup=1                 |
2833	        |     MCC14,MCC15,MCC16)|                                 |
2834	        |                       |                                 |
2835	        | CSV(MCC17)            |                                 |
2836	        +=======================+=================================+

2838	      Table 24: Advertisement sent to endpoint A - 9 composed part

2840	   As MaxCaptures is 9, the Capture Encoding contains information
2841	   from 9 sources at a time.

2843	   The Advertisement to Endpoint B is identical to the above, except
2844	   that the captures from Endpoint A would be added and the captures
2845	   from Endpoint B would be removed. Whether the Captures are rendered
2846	   on a four screen display or a three screen display is up to the
2847	   Consumer to determine.  The Consumer wants to place video captures
2848	   from the same original source endpoint together, in the correct
2849	   spatial order, but the MCCs do not have spatial attributes.  So the
2850	   Consumer needs to associate incoming media packets with the
2851	   original individual captures in the advertisement (such as VC4,
2852	   VC5, and VC6) in order to know the spatial information it needs for
2853	   correct placement on the screens.  The Provider can use the RTCP
2854	   CaptureId SDES item and associated RTP header extension, as
2855	   described in [I-D.ietf-clue-rtp-mapping], to convey this
2856	   information to the Consumer.
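
   Once the Consumer knows which original capture each switched MCC
   currently carries, it can recover the spatial information from the
   source part of the Advertisement.  The sketch below assumes that
   this association has already been extracted from the media
   stream; it does not model RTP or RTCP, and the table and function
   names are hypothetical.  Only a few of the advertised captures are
   listed.

```python
# Subset of the source part of the Advertisement: capture -> (site, area).
ADVERTISED = {
    "VC4": ("Endpoint B", "Left"),
    "VC5": ("Endpoint B", "Center"),
    "VC6": ("Endpoint B", "Right"),
    "VC13": ("Endpoint E", None),
}
AREA_ORDER = {"Left": 0, "Center": 1, "Right": 2}


def arrange(current):
    """Group the received captures by site, in left-to-right order.

    current: {mcc_id: capture_id}, the capture each MCC carries right now.
    """
    by_site = {}
    for capture in current.values():
        site, area = ADVERTISED[capture]
        by_site.setdefault(site, []).append((AREA_ORDER.get(area, 0), capture))
    return {site: [capture for _, capture in sorted(items)]
            for site, items in by_site.items()}


placement = arrange({"MCC8": "VC6", "MCC9": "VC13",
                     "MCC10": "VC4", "MCC11": "VC5"})
print(placement)
```

   The captures from Endpoint B arrive on unrelated MCCs, yet the
   Consumer can still place them together in the correct spatial
   order.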

2858	12.3.4. Heterogeneous conference with voice activated switching

2860	   This example illustrates how multipoint "voice activated switching"
2861	   behavior can be realized, with an endpoint making its own decision
2862	   about which of its outgoing video streams is considered the "active
2863	   talker" from that endpoint.  An MCU can then decide which is the
2864	   active talker across the whole conference.

2866	   Consider a conference between endpoints with the following
2867	   characteristics:

2869	      Endpoint A - 3 screens, 3 cameras

2871	      Endpoint B - 3 screens, 3 cameras

2873	      Endpoint C - 1 screen, 1 camera

2875	   This example focuses on what the user at endpoint C sees.  The
2876	   user would like to see the video capture of the current talker,
2877	   without composing it with any other video capture.  In this
2878	   example endpoint C is capable of receiving only a single video
2879	   stream.  The following tables describe advertisements from A and B
2880	   to the MCU, and from the MCU to C, that can be used to accomplish
2881	   this.

2883	        +-----------------------+---------------------------------+
2884	        | Capture Scene #1      | Description=Endpoint x          |
2885	        +-----------------------+---------------------------------+
2886	        | VC1                   | CaptureArea=Left                |
2887	        |                       | EncodingGroup=1                 |
2888	        | VC2                   | CaptureArea=Center              |
2889	        |                       | EncodingGroup=1                 |
2890	        | VC3                   | CaptureArea=Right               |
2891	        |                       | EncodingGroup=1                 |
2892	        | MCC1(VC1,VC2,VC3)     | MaxCaptures=1                   |
2893	        |                       | CaptureArea=whole scene         |
2894	        |                       | Policy=SoundLevel:0             |
2895	        |                       | EncodingGroup=1                 |
2896	        | AC1                   | CaptureArea=whole scene         |
2897	        |                       | EncodingGroup=2                 |
2898	        | CSV1(VC1, VC2, VC3)   |                                 |
2899	        | CSV2(MCC1)            |                                 |
2900	        | CSV3(AC1)             |                                 |
2901	        +---------------------------------------------------------+

2903	   Table 25: Advertisement received at the MCU from Endpoints A and B

2905	   Endpoints A and B are advertising each individual video capture,
2906	   and also a switched capture MCC1 which switches between the other
2907	   three based on who is the active talker.  These endpoints do not
2908	   advertise distinct audio captures associated with each individual
2909	   video capture, so it would be impossible for the MCU (as a media
2910	   consumer) to make its own determination of which video capture is
2911	   the active talker based just on information in the audio streams.

2913	        +-----------------------+---------------------------------+
2914	        | Capture Scene #1      | Description=conference          |
2915	        +-----------------------+---------------------------------+
2916	        | MCC1()                | CaptureArea=Left                |
2917	        |                       | MaxCaptures=1                   |
2918	        |                       | SynchronisationID=1             |
2919	        |                       | Policy=SoundLevel:0             |
2920	        |                       | EncodingGroup=1                 |
2921	        |                       |                                 |
2922	        | MCC2()                | CaptureArea=Center              |
2923	        |                       | MaxCaptures=1                   |
2924	        |                       | SynchronisationID=1             |
2925	        |                       | Policy=SoundLevel:0             |
2926	        |                       | EncodingGroup=1                 |
2927	        |                       |                                 |
2928	        | MCC3()                | CaptureArea=Right               |
2929	        |                       | MaxCaptures=1                   |
2930	        |                       | SynchronisationID=1             |
2931	        |                       | Policy=SoundLevel:0             |
2932	        |                       | EncodingGroup=1                 |
2933	        |                       |                                 |
2934	        | MCC4()                | CaptureArea=whole scene         |
2935	        |                       | MaxCaptures=1                   |
2936	        |                       | Policy=SoundLevel:0             |
2937	        |                       | EncodingGroup=1                 |
2938	        |                       |                                 |
2939	        | MCC5() (for audio)    | CaptureArea=whole scene         |
2940	        |                       | MaxCaptures=1                   |
2941	        |                       | Policy=SoundLevel:0             |
2942	        |                       | EncodingGroup=2                 |
2943	        |                       |                                 |
2944	        | MCC6() (for audio)    | CaptureArea=whole scene         |
2945	        |                       | MaxCaptures=1                   |
2946	        |                       | Policy=SoundLevel:1             |
2947	        |                       | EncodingGroup=2                 |
2948	        | CSV1(MCC1,MCC2,MCC3)  |                                 |
2949	        | CSV2(MCC4)            |                                 |
2950	        | CSV3(MCC5,MCC6)       |                                 |
2951	        +---------------------------------------------------------+
2952	            Table 26: Advertisement sent from the MCU to C

2954	   The MCU advertises one scene, with four video MCCs.  Three of them
2955	   in CSV1 give a left, center, right view of the conference, with
2956	   "site switching". MCC4 provides a single video capture
2957	   representing a view of the whole conference.  The MCU intends for
2958	   MCC4 to be switched between all the other original source
2959	   captures.  In this example advertisement the MCU is not giving all
2960	   the information about all the other endpoints' scenes and which of
2961	   those captures is included in the MCCs.  The MCU could include all
2962	   that information if it wants to give the consumers more
2963	   information, but it is not necessary for this example scenario.

2965	   The Provider advertises MCC5 and MCC6 for audio.  Both are
2966	   switched captures, with different SoundLevel policies indicating
2967	   they are the top two dominant talkers.  The Provider advertises
2968	   CSV3 with both MCCs, suggesting the Consumer should use both if it
2969	   can.
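
   The video views in Table 26 can be modeled informally as shown in
   the Python sketch below, which picks a Capture Scene View that fits
   the number of video streams a Consumer can receive.  The names
   VIDEO_VIEWS and choose_video_view, and the rule of preferring the
   largest view that fits, are assumptions made for this sketch; this
   document leaves the choice to the Consumer.

```python
# Video Capture Scene Views from Table 26, keyed by view name.
VIDEO_VIEWS = {
    "CSV1": ["MCC1", "MCC2", "MCC3"],
    "CSV2": ["MCC4"],
}


def choose_video_view(views, max_streams):
    """Return the name of the view with the most captures that still
    fits within max_streams, or None if no view fits.

    The "largest view that fits" rule is an assumed policy, not one
    mandated by the framework.
    """
    fitting = {
        name: caps for name, caps in views.items() if len(caps) <= max_streams
    }
    if not fitting:
        return None
    return max(fitting, key=lambda name: len(fitting[name]))


if __name__ == "__main__":
    # A single-stream Consumer such as endpoint C ends up with CSV2 (MCC4).
    print(choose_video_view(VIDEO_VIEWS, 1))
```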

2971	   Endpoint C, in its configure message to the MCU, requests to
2972	   receive MCC4 for video, and MCC5 and MCC6 for audio.  In order for
2973	   the MCU to get the information it needs to construct MCC4, it has
2974	   to send configure messages to A and B asking to receive MCC1 from
2975	   each of them, along with their AC1 audio.  Now the MCU can use
2976	   audio energy information from the two incoming audio streams from
2977	   A and B to determine which of those alternatives is the current
2978	   talker.  Based on that, the MCU uses either MCC1 from A or MCC1
2979	   from B as the source of MCC4 to send to C.
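
   The selection step described above can be sketched informally in
   Python as follows.  The function name select_mcc4_source, the
   representation of audio energy as one number per endpoint, and the
   optional hysteresis margin are all assumptions made for this
   sketch; this document does not specify how an MCU measures audio
   energy or how it avoids rapid switching.

```python
def select_mcc4_source(audio_energy, current=None, margin=0.0):
    """Pick the endpoint whose MCC1 should feed MCC4.

    audio_energy: dict mapping an endpoint name to a measured audio
        level taken from that endpoint's AC1 stream (assumed input).
    current: endpoint currently selected, or None if none is selected.
    margin: assumed hysteresis; a different endpoint must exceed the
        current one by more than this amount before a switch occurs.
    """
    if not audio_energy:
        # No measurements available: keep whatever was selected before.
        return current
    loudest = max(audio_energy, key=audio_energy.get)
    if (current in audio_energy
            and audio_energy[loudest] - audio_energy[current] <= margin):
        return current
    return loudest


if __name__ == "__main__":
    # Endpoint B is louder, so MCC1 from B becomes the source of MCC4.
    print(select_mcc4_source({"A": 0.2, "B": 0.7}))
```

   The margin parameter illustrates one possible way to keep the
   switched capture stable when two talkers have similar levels.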

2981	13. Acknowledgements

2983	   Allyn Romanow and Brian Baldino were authors of early versions.
2984	   Mark Gorzynski also contributed much to the initial approach.
2985	   Many others also contributed, including Christian Groves, Jonathan
2986	   Lennox, Paul Kyzivat, Rob Hansen, Roni Even, Christer Holmberg,
2987	   Stephen Botzko, Mary Barnes, John Leslie, and Paul Coverdale.

2989	14. IANA Considerations

2991	   None.

2993	15. Security Considerations

2995	   There are several potential attacks related to telepresence, and
2996	   specifically the protocols used by CLUE, in the case of
2997	   conferencing sessions, due to the natural involvement of multiple
2998	   endpoints and the many, often user-invoked, capabilities provided
2999	   by the systems.

3001	   An MCU involved in a CLUE session can experience many of the same
3002	   attacks as a conferencing system, such as one enabled by
3003	   the XCON framework [RFC 6503]. Examples of attacks include the
3004	   following: an endpoint attempting to listen to sessions in which
3005	   it is not authorized to participate, an endpoint attempting to
3006	   disconnect or mute other users, and theft of service by an
3007	   endpoint attempting to create telepresence sessions it is not
3008	   allowed to create. Thus, it is RECOMMENDED that an MCU
3009	   implementing the protocols necessary to support CLUE follow the
3010	   security recommendations specified in the conference control
3011	   protocol documents.  In the case of CLUE, SIP is the conferencing
3012	   protocol; thus, the security considerations in [RFC4579] MUST be
3013	   followed.

3015	   One primary security concern surrounding the CLUE framework
3016	   introduced in this document involves securing the actual
3017	   protocols and the associated authorization mechanisms.  These
3018	   concerns apply to endpoint-to-endpoint sessions, as well as
3019	   sessions involving multiple endpoints and MCUs. Figure 2 in
3020	   section 5 provides a basic flow of information exchange for CLUE
3021	   and the protocols involved.

3023	   As described in section 5, CLUE uses SIP/SDP to establish the
3024	   session prior to exchanging any CLUE-specific information. Thus
3025	   the security mechanisms recommended for SIP [RFC3261], including
3026	   user authentication and authorization, SHOULD be followed. In
3027	   addition, the media is based on RTP and thus existing RTP security
3028	   mechanisms SHOULD be supported, and DTLS/SRTP MUST be supported.
3029	   Media security is also discussed in [I-D.ietf-clue-signaling] and
3030	   [I-D.ietf-clue-rtp-mapping].

3032	   A separate data channel is established to transport the CLUE
3033	   protocol messages. The contents of the CLUE protocol messages are
3034	   based on information introduced in this document.  The CLUE data
3035	   model [I-D.ietf-clue-data-model-schema] defines through an XML
3036	   schema the syntax to be used. Some of the information which could
3037	   possibly introduce privacy concerns is the xCard information as
3038	   described in section 7.1.1.11.  In addition, the (text)
3039	   description field in the Media Capture attribute (section 7.1.1.7)
3040	   could possibly reveal sensitive information or specific
3041	   identities. The same would be true for the descriptions in the
3042	   Capture Scene (section 7.3.1) and Capture Scene View (section
3043	   7.3.2) attributes.  One other important consideration for the
3044	   information in the xCard as well as the description field in the
3045	   Media Capture and Capture Scene View attributes is that while the
3046	   endpoints involved in the session have been authenticated, there
3047	   is no assurance that the information in the xCard or description
3048	   fields is authentic.  Thus, this information MUST NOT be used to
3049	   make any authorization decisions.

3051	   While other information in the CLUE protocol messages does not
3052	   reveal specific identities, it can reveal characteristics and
3053	   capabilities of the endpoints.  That information could possibly
3054	   uniquely identify specific endpoints.  It might also be possible
3055	   for an attacker to manipulate the information and disrupt the CLUE
3056	   sessions.  It would also be possible to mount a DoS attack on the
3057	   CLUE endpoints if a malicious agent has access to the data
3058	   channel.  Thus, it MUST be possible for the endpoints to establish
3059	   a channel which is secure against both message recovery and
3060	   message modification. Further details on this are provided in the
3061	   CLUE data channel solution document.

3063	   There are also security issues associated with the authorization
3064	   to perform actions at the CLUE endpoints to invoke specific
3065	   capabilities (e.g., re-arranging screens, sharing content, etc.).
3066	   However, the policies and security associated with these actions
3067	   are outside the scope of this document and the overall CLUE
3068	   solution.

3070	16. Changes Since Last Version

3072	   NOTE TO THE RFC-Editor: Please remove this section prior to
3073	   publication as an RFC.

3075	   Changes from 20 to 21:

3077	     1. Clarify CLUE can be useful for multi-stream non-telepresence
3078	        cases.
3079	     2. Remove unnecessary ambiguous sentence about optional use of
3080	        CLUE protocol.
3081	     3. Clarify meaning if Area of Capture is not specified.
3082	     4. Remove use of "conference" where it didn't fit according to
3083	        the definition.  Use "CLUE session" or "meeting" instead.

3085	     5. Embedded Text Attribute: Remove restriction it is for video
3086	        only.
3087	     6. Minor cleanup in section 12 examples.
3088	     7. Minor editorial corrections suggested by Christian Groves.

3090	   Changes from 19 to 20:

3092	     1. Define term "CLUE" in introduction.
3093	     2. Add MCC attribute Allow Subset Choice.
3094	     3. Remove phrase about reducing SDP size, replace with
3095	        potentially saving consumer resources.
3096	     4. Change example of a CLUE exchange that does not require SDP
3097	        exchange.
3098	     5. Language attribute uses RFC5646.
3099	     6. Change Member person type to Attendee.  Add Observer type.
3100	     7. Clarify DTLS/SRTP MUST be supported.
3101	     8. Change SHOULD NOT to MUST NOT regarding using xCard or
3102	        description information for authorization decisions.
3103	     9. Clarify definition of Global View.
3104	     10. Refer to signaling doc regarding interoperating with a
3105	        device that does not support CLUE.
3106	     11. Various minor editorial changes from working group last call
3107	        feedback.
3108	     12. Capitalize defined terms.

3110	   Changes from 18 to 19:

3112	     1. Remove the Max Capture Encodings media capture attribute.
3113	     2. Refer to RTP mapping document in the MCC example section.
3114	     3. Update references to current versions of drafts in progress.

3116	   Changes from 17 to 18:

3118	     1. Add separate definition of Global View List.
3119	     2. Add diagram for Global View List structure.
3120	     3. Tweak definitions of Media Consumer and Provider.

3122	   Changes from 16 to 17:

3124	     1. Ticket #59 - rename Capture Scene Entry (CSE) to Capture
3125	        Scene View (CSV)

3127	     2. Ticket #60 - rename Global CSE List to Global View List

3129	     3. Ticket #61 - Proposal for describing the coordinate system.
3130	        Describe it better, without conflicts if cameras point in
3131	        different directions.

3133	     4. Minor clarifications and improved wording for Synchronisation
3134	        Identity, MCC, Simultaneous Transmission Set.

3136	     5. Add definitions for CLUE-capable device and CLUE-enabled
3137	        call, taken from the signaling draft.

3139	     6. Update definitions of Capture Device, Media Consumer, Media
3140	        Provider, Endpoint, MCU, MCC.

3142	     7. Replace "middle box" with "MCU".

3144	     8. Explicitly state there can also be Media Captures that are
3145	        not included in a Capture Scene View.

3147	     9. Explicitly state "A single Encoding Group MAY refer to
3148	        encodings for different media types."

3150	     10. In example 12.1.1 add axes and audio captures to the
3151	        diagram, and describe placement of microphones.

3153	     11. Add references to data model and signaling drafts.

3155	     12. Split references into Normative and Informative sections.
3156	        Add heading number for references section.

3158	   Changes from 15 to 16:

3160	     1. Remove Audio Channel Format attribute

3162	     2. Add Audio Capture Sensitivity Pattern attribute

3164	     3. Clarify audio spatial information regarding point of capture
3165	        and point on line of capture.  Area of capture does not apply
3166	        to audio.

3168	     4. Update section 12 example for new treatment of audio spatial
3169	        information.

3171	     5. Clean up wording of some definitions, and various places in
3172	        sections 5 and 10.

3174	     6. Remove individual encoding parameter paragraph from section
3175	        9.

3177	     7. Update Advertisement diagram.

3179	     8. Update Acknowledgements.

3181	     9. References to use cases and requirements now refer to RFCs.

3183	     10. Minor editorial changes.

3185	   Changes from 14 to 15:

3187	     1. Add "=" and "<=" qualifiers to MaxCaptures attribute, and
3188	        clarify the meaning regarding switched and composed MCC.

3190	     2. Add section 7.3.3 Global Capture Scene Entry List, and a few
3191	        other sentences elsewhere that refer to global CSE sets.

3193	     3. Clarify: The Provider MUST be capable of encoding and sending
3194	        all Captures (*that have an encoding group*) in a single
3195	        Capture Scene Entry simultaneously.

3197	     4. Add voice activated switching example in section 12.

3199	     5. Change name of attributes Participant Info/Type to Person
3200	        Info/Type.

3202	     6. Clarify the Person Info/Type attributes have the same meaning
3203	        regardless of whether or not the capture has a Presentation
3204	        attribute.

3206	     7. Update example section 12.1 to be consistent with the rest of
3207	        the document, regarding MCC and capture attributes.

3209	     8. State explicitly each CSE has a unique ID.

3211	   Changes from 13 to 14:

3213	     1. Fill in section for Security Considerations.

3215	     2. Replace Role placeholder with Participant Information,
3216	        Participant Type, and Scene Information attributes.

3218	     3. Spatial information implies nothing about how constituent
3219	        media captures are combined into a composed MCC.

3221	     4. Clean up MCC example in Section 12.3.3.  Clarify behavior of
3222	        tiled and PIP display windows.  Add audio.  Add new open
3223	        issue about associating incoming packets to original source
3224	        capture.

3226	     5. Remove editor's note and associated statement about RTP
3227	        multiplexing at end of section 5.

3229	     6. Remove editor's note and associated paragraph about
3230	        overloading media channel with both CLUE and non-CLUE usage,
3231	        in section 5.

3233	     7. In section 10, clarify intent of media encodings conforming
3234	        to SDP, even with multiple CLUE message exchanges.  Remove
3235	        associated editor's note.

3237	   Changes from 12 to 13:

3239	     1. Added the MCC concept including updates to existing sections
3240	        to incorporate the MCC concept. New MCC attributes:
3241	        MaxCaptures, SynchronisationID and Policy.

3243	     2. Removed the "composed" and "switched" Capture attributes due
3244	        to overlap with the MCC concept.

3246	     3. Removed the "Scene-switch-policy" CSE attribute, replaced by
3247	        MCC and SynchronisationID.

3249	     4. Editorial enhancements including numbering of the Capture
3250	        attribute sections, tables, figures etc.

3252	   Changes from 11 to 12:

3254	     1. Ticket #44. Remove note questioning about requiring a
3255	        Consumer to send a Configure after receiving Advertisement.

3257	     2. Ticket #43. Remove ability for consumer to choose value of
3258	        attribute for scene-switch-policy.

3260	     3. Ticket #36. Remove computational complexity parameter,
3261	        MaxGroupPps, from Encoding Groups.

3263	     4. Reword the Abstract and parts of sections 1 and 4 (now 5)
3264	        based on Mary's suggestions as discussed on the list.  Move
3265	        part of the Introduction into a new section Overview &
3266	        Motivation.

3268	     5. Add diagram of an Advertisement, in the Overview of the
3269	        Framework/Model section.

3271	     6. Change Intended Status to Standards Track.

3273	     7. Clean up RFC2119 keyword language.

3275	   Changes from 10 to 11:

3277	     1. Add description attribute to Media Capture and Capture Scene
3278	        Entry.

3280	     2. Remove contradiction and change the note about open issue
3281	        regarding always responding to Advertisement with a Configure
3282	        message.

3284	     3. Update example section, to cleanup formatting and make the
3285	        media capture attributes and encoding parameters consistent
3286	        with the rest of the document.

3288	   Changes from 09 to 10:

3290	     1. Several minor clarifications such as about SDP usage, Media
3291	        Captures, Configure message.

3293	     2. Simultaneous Set can be expressed in terms of Capture Scene
3294	        and Capture Scene Entry.

3296	     3. Removed Area of Scene attribute.

3298	     4. Add attributes from draft-groves-clue-capture-attr-01.

3300	     5. Move some of the Media Capture attribute descriptions back
3301	        into this document, but try to leave detailed syntax to the
3302	        data model.  Remove the OUTSOURCE sections, which are already
3303	        incorporated into the data model document.

3305	   Changes from 08 to 09:

3307	     1. Use "document" instead of "memo".

3309	     2. Add basic call flow sequence diagram to introduction.

3311	     3. Add definitions for Advertisement and Configure messages.

3313	     4. Add definitions for Capture and Provider.

3315	     5. Update definition of Capture Scene.

3317	     6. Update definition of Individual Encoding.

3319	     7. Shorten definition of Media Capture and add key points in the
3320	        Media Captures section.

3322	     8. Reword a bit about capture scenes in overview.

3324	     9. Reword about labeling Media Captures.

3326	     10. Remove the Consumer Capability message.

3328	     11. New example section heading for media provider behavior

3330	     12. Clarifications in the Capture Scene section.

3332	     13. Clarifications in the Simultaneous Transmission Set section.

3334	     14. Capitalize defined terms.

3336	     15. Move call flow example from introduction to overview section

3338	     16. General editorial cleanup

3340	     17. Add some editors' notes requesting input on issues
3341	     18. Summarize some sections, and propose details be outsourced
3342	        to other documents.

3344	   Changes from 06 to 07:

3346	     1. Ticket #9.  Rename Axis of Capture Point attribute to Point
3347	        on Line of Capture.  Clarify the description of this
3348	        attribute.

3350	     2. Ticket #17.  Add "capture encoding" definition.  Use this new
3351	        term throughout document as appropriate, replacing some usage
3352	        of the terms "stream" and "encoding".

3354	     3. Ticket #18.  Add Max Capture Encodings media capture
3355	        attribute.

3357	     4. Add clarification that different capture scene entries are
3358	        not necessarily mutually exclusive.

3360	   Changes from 05 to 06:

3362	   1. Capture scene description attribute is a list of text strings,
3363	      each in a different language, rather than just a single string.

3365	   2. Add new Axis of Capture Point attribute.

3367	   3. Remove appendices A.1 through A.6.

3369	   4. Clarify that the provider must use the same coordinate system
3370	      with same scale and origin for all coordinates within the same
3371	      capture scene.

3373	   Changes from 04 to 05:

3375	   1. Clarify limitations of "composed" attribute.

3377	   2. Add new section "capture scene entry attributes" and add the
3378	      attribute "scene-switch-policy".

3380	   3. Add capture scene description attribute and description
3381	      language attribute.

3383	   4. Editorial changes to examples section for consistency with the
3384	      rest of the document.

3386	   Changes from 03 to 04:

3388	   1. Remove sentence from overview - "This constitutes a significant
3389	      change ..."

3391	   2. Clarify a consumer can choose a subset of captures from a
3392	      capture scene entry or a simultaneous set (in section "capture
3393	      scene" and "consumer's choice...").

3395	   3. Reword first paragraph of Media Capture Attributes section.

3397	   4. Clarify a stereo audio capture is different from two mono audio
3398	      captures (description of audio channel format attribute).

3400	   5. Clarify what it means when coordinate information is not
3401	      specified for area of capture, point of capture, area of scene.

3403	   6. Change the term "producer" to "provider" to be consistent (it
3404	      was just in two places).

3406	   7. Change name of "purpose" attribute to "content" and refer to
3407	      RFC4796 for values.

3409	   8. Clarify simultaneous sets are part of a provider advertisement,
3410	      and apply across all capture scenes in the advertisement.

3412	   9. Remove sentence about lip-sync between all media captures in a
3413	      capture scene.

3415	   10.   Combine the concepts of "capture scene" and "capture set"
3416	      into a single concept, using the term "capture scene" to
3417	      replace the previous term "capture set", and eliminating the
3418	      original separate capture scene concept.

3420	17. Normative References

3422	   [I-D.ietf-clue-datachannel]
3423	              Holmberg, C., "CLUE Protocol Data Channel", draft-
3424	              ietf-clue-datachannel-05 (work in progress), November
3425	              2014.

3427	   [I-D.ietf-clue-data-model-schema]
3428	              Presta, R., Romano, S P., "An XML Schema for the CLUE
3429	              data model", draft-ietf-clue-data-model-schema-07 (work
3430	              in progress), September 2014.

3432	   [I-D.ietf-clue-protocol]
3433	              Presta, R. and S. Romano, "CLUE protocol", draft-
3434	              ietf-clue-protocol-02 (work in progress), October 2014.

3436	   [I-D.ietf-clue-signaling]
3437	              Kyzivat, P., Xiao, L., Groves, C., Hansen, R., "CLUE
3438	              Signaling", draft-ietf-clue-signaling-04 (work in
3439	              progress), October 2014.

3441	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
3442	              Requirement Levels", BCP 14, RFC 2119, March 1997.

3444	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G.,
3445	              Johnston,
3446	              A., Peterson, J., Sparks, R., Handley, M., and E.
3447	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
3448	              June 2002.

3450	   [RFC3264]  Rosenberg, J., Schulzrinne, H., "An Offer/Answer Model
3451	              with the Session Description Protocol (SDP)", RFC 3264,
3452	              June 2002.

3454	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
3455	              Jacobson, "RTP: A Transport Protocol for Real-Time
3456	              Applications", STD 64, RFC 3550, July 2003.

3458	   [RFC4579]  Johnston, A., Levin, O., "SIP Call Control -
3459	              Conferencing for User Agents", RFC 4579, August 2006.

3461	18. Informative References

3463	   [I-D.ietf-clue-rtp-mapping]
3464	              Even, R., Lennox, J., "Mapping RTP streams to CLUE media
3465	              captures", draft-ietf-clue-rtp-mapping-03 (work in
3466	              progress), October 2014.

3468	   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
3469	              Session Initiation Protocol (SIP)", RFC 4353,
3470	              February 2006.

3472	   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC
3473	              5117, January 2008.

3475	   [RFC5646]  Phillips, A., Davis, M., "Tags for Identifying
3476	              Languages", RFC 5646, September 2009.

3478	   [RFC7205]  Romanow, A., Botzko, S., Duckworth, M., Even, R.,
3479	              "Use Cases for Telepresence Multistreams", RFC 7205,
3480	              April 2014.

3482	   [RFC7262]  Romanow, A., Botzko, S., Barnes, M., "Requirements
3483	              for Telepresence Multistreams", RFC 7262, June 2014.

3485	19. Authors' Addresses

3487	   Mark Duckworth (editor)
3488	   Polycom
3489	   Andover, MA  01810
3490	   USA

3492	   Email: mark.duckworth@polycom.com

3494	   Andrew Pepperell
3495	   Acano
3496	   Uxbridge, England
3497	   UK

3499	   Email: apeppere@gmail.com

3501	   Stephan Wenger
3502	   Vidyo, Inc.
3503	   433 Hackensack Ave.
3504	   Hackensack, N.J. 07601
3505	   USA

3507	   Email: stewe@stewe.org