CLUE Working Group                                             R. Presta
Internet-Draft                                               S P. Romano
Intended status: Informational                      University of Napoli
Expires: August 7, 2014                                 February 3, 2014

                 An XML Schema for the CLUE data model
                 draft-ietf-clue-data-model-schema-03

Abstract

This document provides an XML schema file for the definition of CLUE data model types.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 7, 2014.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Presta & Romano          Expires August 7, 2014                 [Page 1]
Internet-Draft  draft-ietf-clue-data-model-schema-03       February 2014

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  XML Schema
   4.
   5.
   6.
   7.
   8.
   9.
   10.
     10.1.
     10.2.
     10.3.
     10.4.
       10.4.1.
       10.4.2.
     10.5.
     10.6.
     10.7.
     10.8.
     10.9.
     10.10.
     10.11.
     10.12.
     10.13.
     10.14.
     10.15.
     10.16.
     10.17.
     10.18.
     10.19.  captureID attribute
   11.  Audio captures
     11.1.
   12.  Video captures
     12.1.
   13.  Text captures
   14.
     14.1.
     14.2.  sceneID attribute
     14.3.  scale attribute
   15.
     15.1.
     15.2.  sceneEntryID attribute
     15.3.  mediaType attribute
   16.
     16.1.
     16.2.
     16.3.  encodingID attribute
   17.  Audio encodings
   18.  Video encodings
   19.
     19.1.
     19.2.
     19.3.  encodingGroupID attribute
   20.
     20.1.
     20.2.
   21.
     21.1.
     21.2.
   22.
   23.  Sample XML file
   24.  MCC example
   25.  Diff with draft-ietf-clue-data-model-schema-01 version
   26.  Diff with draft-ietf-clue-data-model-schema-02 version
   27.  Informative References

1.  Introduction

This document provides an XML schema file for the definition of CLUE data model types. The schema is based on information contained in [I-D.ietf-clue-framework].
It encodes the information and constraints defined in that document in order to provide a formal representation of the concepts presented there. The schema definition is intended to be updated as the CLUE framework document changes. This document is a proposal for a coherent structure covering all the information associated with the description of a telepresence scenario.

2.  Terminology

This document uses the same terminology as [I-D.ietf-clue-framework]. The main terms used in the document are briefly recalled here.

Audio Capture: Media Capture for audio. Denoted as ACn in the example cases in this document.

Camera-Left and Right: For Media Captures, camera-left and camera-right are from the point of view of a person observing the rendered media. They are the opposite of Stage-Left and Stage-Right.

Capture: Same as Media Capture.

Capture Device: A device that converts audio and video input into an electrical signal, in most cases to be fed into a media encoder.

Capture Encoding: A specific encoding of a Media Capture, to be sent by a Media Provider to a Media Consumer via RTP.

Capture Scene: An abstraction grouping semantically-coupled Media Captures available at the Media Provider's side, representing a precise portion of the local scene that can be transmitted remotely. A Capture Scene MAY correspond to a part of the telepresence room or MAY focus only on the presentation media. A Capture Scene is characterized by a set of attributes and by a set of Capture Scene Entries.

Capture Scene Entry: A list of Media Captures of the same media type that constitute a possible representation of a Capture Scene. Media Captures belonging to the same Capture Scene Entry can be sent simultaneously by the Media Provider.

CLUE Participant: An entity able to use the CLUE protocol within a telepresence session. It can be an Endpoint or an MCU able to use the CLUE protocol.

Consumer: Same as Media Consumer.

Encoding or Individual Encoding: The representation of an encoding technology. In the CLUE data model, each encoding is described by a set of parameters representing the encoding constraints, for example the maximum bandwidth of the Media Provider that the encoding can consume.

Encoding Group: The representation of a group of encodings. Each group is described by a set of parameters representing the constraints to be applied to the group as a whole. An example is the maximum bandwidth that can be consumed when the contained encodings are used simultaneously.

Endpoint: The logical point of final termination through receiving, decoding and rendering, and/or initiation through capturing, encoding, and sending of media streams. An endpoint consists of one or more physical devices which source and sink media streams, and exactly one SIP Conferencing Framework Participant (which, in turn, includes exactly one SIP User Agent). Endpoints can be anything from multiscreen/multicamera room controllers to handheld devices.

MCU: Multipoint Control Unit (MCU) - a device that connects two or more endpoints together into one single multimedia conference. An MCU may include a Mixer.

Media: Any data that, after suitable encoding, can be conveyed over RTP, including audio, video or timed text.

Media Capture: A "Media Capture", or simply "Capture", is a source of Media of a single type (i.e., audio or video or text).

Media Stream: The term "Media Stream", or simply "Stream", is used as a synonym for Capture Encoding.

Media Provider: A CLUE participant (i.e., an Endpoint or an MCU) able to send Media Streams.

Media Consumer: A CLUE participant (i.e., an Endpoint or an MCU) able to receive Media Streams.

Scene: Same as Capture Scene.

Scene Entry: Same as Capture Scene Entry.

Stream: Same as Media Stream.

Multiple Content Capture: A Capture that can contain different Media Captures of the same media type. It is denoted as MCC in this document. In the Stream resulting from the MCC, the Streams coming from the encoding of the constituent Media Captures can appear simultaneously, if the MCC is the result of a mixing operation, or alternately over time, according to a certain switching policy.

Plane of Interest: The spatial plane containing the most relevant subject matter.

Provider: Same as Media Provider.

Render:

Simultaneous Transmission Set: A set of Media Captures that can be transmitted simultaneously from a Media Provider.

Single Media Capture: A Capture representing the Media coming from a single-source Capture Device.

Spatial Information: Data about the spatial position of a Capture Device that generates a Single Media Capture within the context of a Capture Scene representing a physical portion of a Telepresence Room.

Stream Characteristics: The union of the features used to describe a Stream in the CLUE environment and in the SIP-SDP environment.

Video Capture: A Media Capture for video.

3.  XML Schema

This section contains the proposed CLUE data model schema definition.

The element and attribute definitions are formal representations of the concepts needed to describe the capabilities of a media provider and the current streams it is transmitting within a telepresence session.

The main groups of information are:

   : the list of media captures available (Section 4)

   : the list of individual encodings (Section 5)

   : the list of encoding groups (Section 6)

   : the list of capture scenes (Section 7)

   : the list of simultaneous transmission sets (Section 8)

   : the list of instantiated capture encodings (Section 9)

All of the above refer to concepts introduced in [I-D.ietf-clue-framework] and detailed further in the remainder of this document.

The following sections describe the XML schema in more detail.

4.

represents the list of one or more media captures available on the media provider's side. Each media capture is represented by a element (Section 10).

5.

represents the list of individual encodings available on the media provider's side. Each individual encoding is represented by an element (Section 16).

6.

represents the list of the encoding groups organized on the media provider's side. Each encoding group is represented by a element (Section 19).

7.

represents the list of the capture scenes organized on the media provider's side.
Each capture scene is represented by a element (Section 14).

8.

contains the simultaneous sets indicated by the media provider. Each simultaneous set is represented by a element (Section 20).

9.

is a list of capture encodings. It can represent either the list of the desired capture encodings indicated by the media consumer or the list of instantiated captures on the provider's side. Each capture encoding is represented by a element (Section 21).

10.

According to the CLUE framework, a media capture is the fundamental representation of a media flow that is available on the provider's side. Media captures are characterized by a set of features that are independent of the specific type of medium, and by a set of features that are media-specific. The media capture type is designed as an abstract type, providing all the features that can be common to all media types. Media-specific captures, such as video captures, audio captures and others, are specializations of that media capture type, as in a typical generalization-specialization hierarchy.

The following is the XML Schema definition of the media capture type:

10.1.

is a mandatory field specifying the media type of the capture ("audio", "video", "text",...).

10.2.

is a mandatory field containing the identifier of the capture scene the media capture belongs to. Each media capture must be associated with one and only one capture scene. When a media capture is spatially definable, some spatial information is provided along with it in the form of point coordinates (see Section 10.4). Such coordinates refer to the coordinate space defined for the capture scene containing the capture.

10.3.

is a mandatory field containing the identifier of the encoding group the media capture is associated with.

10.4.

Media captures are divided into two categories: non spatially definable captures and spatially definable captures.

Non spatially definable captures are those that do not capture parts of the telepresence room. Examples of such captures are recordings, text captures, DVDs, recorded presentations, or external streams that are played in the telepresence room and transmitted to remote sites.

Spatially definable captures are those that capture part of the telepresence room. The captured part of the telepresence room is described by means of the element.

This is the definition of the spatial information type:

The contains the coordinates of the capture device that is taking the capture, as well as, optionally, the pointing direction (see Section 10.4.1). It is a mandatory field when the media capture is spatially definable, independently of the media type.

The is an optional field containing four points defining the captured area represented by the capture (see Section 10.4.2).

10.4.1.

The element is used to represent the position and the line of capture of a capture device. The XML Schema definition of the element type is the following:

The point type contains three spatial coordinates ("x","y","z") representing a point in the space associated with a certain capture scene.

The capture point type extends the point type, i.e., it is represented by three coordinates identifying the position of the capture device, but can add further information.
Such further information is conveyed by the , which is another point-type element representing the "point on line of capture", which gives the pointing direction of the capture device.

If the point of capture is not specified, the consumer should not assume anything about the spatial location of the capturing device.

The coordinates of the point on line of capture MUST NOT be identical to the capture point coordinates. If the point on line of capture is not specified, no assumptions are made about the axis of the capturing device.

10.4.2.

is an optional element that can be contained within the spatial information associated with a media capture. It represents the spatial area captured by the media capture.

The XML representation of that area is provided through a set of four point-type elements, , , , and , as can be seen from the following definition:

, , , and should be coplanar.

By comparing the capture areas of different media captures within the same capture scene, a consumer can determine the spatial relationships between them and render them correctly. If the area of capture is not specified, the Media Capture is not spatially related to any other media capture.

10.5.

When media captures are non spatially definable, they are marked with the boolean element set to "true".

10.6.

A media capture can be either a single media capture or a multiple content capture. A multiple content capture is made of different captures that can be arranged spatially (by a mixing operation), or temporally (by a switching operation), or that can result from a combination of both techniques. If a media capture is an MCC, then its XML data model representation must include the . It is the list of identifiers of the media captures that can be part of the content of the multiple content capture.

[containedCaptureIDs or contentIDs or contentCaptureIDs?]

10.7.

is an optional element for multiple content captures that contains a numeric identifier. Multiple content captures marked with the same identifier in the contain, at any given time, captures coming from the same room endpoint.

10.8.

is an optional boolean element that can be used only for multiple content captures. It indicates whether or not a multiple content capture is a mix (audio) or a composition (video) of streams.

This attribute is useful to a media consumer, for example, to avoid nesting a composed video capture into another composed capture or rendering.

[edt's note: proposal - discussion needed]

10.9.

is an optional boolean element that can be used only for multiple content captures. It indicates whether or not a multiple content capture switches over time.

[edt's note: proposal - discussion needed]

10.10.

is an optional element that can be used only for multiple content captures. It indicates the criteria applied to build the multiple content capture using the media captures referenced in . This element takes one of a list of predefined values ([todo]).

10.11.

is an optional element that can be used only for multiple content captures. It indicates the maximum number of media captures that can be represented in the multiple content capture at a time.

10.12.

is a mandatory boolean element that must be used for single-content captures. Its value is fixed and set to "true". This element indicates that the capture being described is not a multiple content capture.

[edt's note: proposal - discussion needed]

10.13.

is used to optionally provide human-readable textual information. It is used to describe media captures, capture scenes and capture scene entries.
A media capture can be described by using multiple elements, each one providing information in a different language. The element definition is the following:

As can be seen, is a string element with an attribute ("lang") indicating the language used in the textual description.

10.14.

is an optional unsigned integer field indicating the importance of a media capture from the media provider's perspective. It can be used on the receiver's side to automatically identify the most "important" contribution available from the media provider. The higher the importance, the lower the contained value. A "0" priority value means that the media capture is "not subject to priority".

10.15.

is an optional element containing the language used in the capture, if any.

10.16.

is an optional element indicating whether or not the capture device originating the capture may move during the telepresence session. That optional element can assume one of the three following values: static, dynamic or highly dynamic, defined as in [I-D.ietf-clue-framework].

10.17.

The optional contains an unsigned integer indicating the maximum number of capture encodings that can be simultaneously active for the media capture. If absent, this parameter defaults to 1. The minimum value for this attribute is 1. The number of simultaneous capture encodings is also limited by the restrictions of the encoding group the media capture refers to by means of the element.

10.18.

The optional element contains the value of the ID attribute of the media capture it refers to. The media capture marked with a element can be, for example, the translation of a main media capture into a different language.

10.19.  captureID attribute

The "captureID" attribute is a mandatory field containing the identifier of the media capture.

11.  Audio captures

Audio captures inherit all the features of a generic media capture and present further audio-specific characteristics. The XML Schema definition of the audio capture type is reported below:

Audio-specific information about the audio capture is contained in (Section 11.1).

11.1.

The optional element is a field with enumerated values ("mono" and "stereo") which describes the method of encoding used for audio. A value of "mono" means the audio capture has one channel. A value of "stereo" means the audio capture has two audio channels, left and right. A single stereo capture is different from two mono captures that have a left-right spatial relationship. A stereo capture maps to a single RTP stream, while each mono audio capture maps to a separate RTP stream.

The XML Schema definition of the element type is provided below:

12.  Video captures

Video captures, similarly to audio captures, extend the information of a generic media capture with video-specific features, such as (Section 12.1).

The XML Schema representation of the video capture type is provided in the following:

12.1.

The element is a boolean element indicating that there is text embedded in the video capture. The language used in such embedded textual description is reported in the "lang" attribute.

The XML Schema definition of the element is:

13.  Text captures

Text captures can also be described by extending the generic media capture information, similarly to audio captures and video captures.

The XML Schema representation of the text capture type currently lacks text-specific information, as can be seen from the definition below:

14.

A media provider organizes the available captures in capture scenes in order to help the receiver both in the rendering and in the selection of the group of captures. Capture scenes are made of capture scene entries, which are sets of media captures of the same media type. Each capture scene entry is an alternative way of completely representing a capture scene for a fixed media type.

The XML Schema representation of a element is the following:

The element can contain zero or more textual elements, defined as in Section 10.13. Besides , there is the element (Section 14.1), which is the list of the capture scene entries.

14.1.

The element is a mandatory field of a capture scene containing the list of scene entries. Each scene entry is represented by a element (Section 15).

14.2.  sceneID attribute

The sceneID attribute is a mandatory attribute containing the identifier of the capture scene.

14.3.  scale attribute

The scale attribute is a mandatory attribute that specifies the scale of the coordinates provided in the spatial information of the media captures belonging to the considered capture scene. The scale attribute can assume three different values:

   "millimeters" - the scale is in millimeters. Systems which know their physical dimensions (for example professionally installed telepresence room systems) should always provide those real-world measurements.

   "unknown" - the scale is not necessarily millimeters, but the scale is the same for every media capture in the capture scene. Systems which don't know specific physical dimensions but still know relative distances should select "unknown" in the scale attribute of the capture scene to be described.

   "noscale" - there is no common physical scale among the media captures of the capture scene. That means the scale could be different for each media capture.

15.

A element represents a capture scene entry, which contains a set of media captures of the same media type describing a capture scene.

A element is characterized as follows.

One or more optional elements provide human-readable information about what the scene entry contains. is defined as already seen in Section 10.13.

The remaining child elements are described in the following subsections.

15.1.

The is the list of the identifiers of the media captures included in the scene entry. It is an element of the captureIDListType type, which is defined as a sequence of each one containing the identifier of a media capture listed within the element:

15.2.  sceneEntryID attribute

The sceneEntryID attribute is a mandatory attribute containing the identifier of the capture scene entry represented by the element.

15.3.  mediaType attribute

The mediaType attribute contains the media type of the media captures included in the scene entry.

16.

The element represents an individual encoding, i.e., a way to encode a media capture. Individual encodings can be characterized by features that are independent of the specific type of medium, and by features that are media-specific. The individual encoding type is designed as an abstract type, providing all the features that can be common to all media types. Media-specific individual encodings, such as video encodings, audio encodings and others, are specializations of that type, as in a typical generalization-specialization hierarchy.

16.1.

is a mandatory field containing the name of the encoding (e.g., G711, H264, ...).

16.2.

represents the maximum bitrate the media provider can instantiate for that encoding.

16.3.  encodingID attribute

The encodingID attribute is a mandatory attribute containing the identifier of the individual encoding.

17.  Audio encodings

Audio encodings inherit all the features of a generic individual encoding and can present further audio-specific encoding characteristics. The XML Schema definition of the audio encoding type is reported below:

Up to now the only audio-specific information is the element containing the media type of the media captures that can be encoded with the considered individual encoding. In the case of audio encoding, that element is forced to "audio".

18.  Video encodings

Similarly to audio encodings, video encodings can extend the information of a generic individual encoding with video-specific encoding features.

The element contains the media type of the media captures that can be encoded with the considered individual encoding. In the case of video encoding, that element is forced to "video".

19.

The element represents an encoding group, which is a set of one or more individual encodings, and parameters that apply to the group as a whole. The definition of the element is the following:

In the following, the contained elements are further described.

19.1.

is an optional field containing the maximum bitrate supported for all the individual encodings included in the encoding group.

19.2.

is the list of the individual encodings grouped together. Each individual encoding is represented through its identifier contained within an element.

19.3.  encodingGroupID attribute

The encodingGroupID attribute contains the identifier of the encoding group.

20.

represents a simultaneous set, i.e.
a list of captures of the same type that can be transmitted at the same time by a media provider. There are different simultaneous transmission sets for each media type.

[edt note: need to be checked]

20.1.

contains the identifier of the media capture that belongs to the simultaneous set.

20.2.

contains the identifier of the scene entry containing a group of captures that can be sent simultaneously with the other captures of the simultaneous set.

21.

A results from the association of a media capture and an individual encoding, to form a capture stream as defined in [I-D.ietf-clue-framework]. For each capture encoding, a media consumer expresses its preferences about the capture parameters (such as the desired switching policy) and the encoding parameters (such as the bandwidth). A possible solution to model such an entity is provided in the following.

21.1.

contains the identifier of the media capture that has been encoded to form the capture encoding.

21.2.

contains the identifier of the applied individual encoding.

22.

The element has been left within the XML Schema for the sake of convenience when representing a prototype of an ADVERTISEMENT message (see the example section).

23.  Sample XML file

The following XML document represents a schema-compliant example of a CLUE telepresence scenario.
In the considered scenario, there are 5 video captures:

   VC0: the video from the left camera
   VC1: the video from the central camera
   VC2: the video from the right camera
   VC3: the overall view of the telepresence room taken from the central camera
   VC4: the video associated with the slide stream

There are 2 audio captures:

   AC0: the overall room audio taken from the central camera
   AC1: the audio associated with the slide stream presentation

The captures are organized into two capture scenes:

   CS1: this scene contains captures associated with the participants that are in the telepresence room.
   CS2: this scene contains captures associated with the slide presentation, which is a pre-registered presentation played within the context of the telepresence session.

Within the capture scene CS1, there are three scene entries available:

   CS1_SE1: this entry contains the participants' video captures taken from the three cameras (VC0, VC1, VC2).
   CS1_SE2: this entry contains the zoomed-out view of the overall telepresence room (VC3).
   CS1_SE3: this entry contains the overall telepresence room audio (AC0).

On the other hand, capture scene CS2 presents two scene entries:

   CS2_SE1: this entry contains the presentation audio stream (AC1).
   CS2_SE2: this entry contains the presentation video stream (VC4).

There are two encoding groups:

   EG0: This encoding group involves video encodings ENC0, ENC1, ENC2. All of them are identical video encodings that can consume up to 128 Kbps. The total amount of available bandwidth for EG0 is 384 Kbps. That means that at most three video capture encodings can be issued at a time.

   EG1: This encoding group involves audio encodings ENC3, ENC4. Both of them have a maximum bandwidth of 64 Kbps. The EG1 maximum bandwidth is 128 Kbps. That means that at most two audio captures can be issued at a time.
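The bandwidth arithmetic behind the two encoding groups above can be sketched as follows. This is only an illustrative sketch and not part of the schema: the names Encoding, EncodingGroup and max_simultaneous_encodings are hypothetical, 1 Kbps is assumed to be 1000 bps, and every encoding is assumed to run at its maximum bandwidth.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Encoding:
    """Hypothetical stand-in for an individual encoding."""
    encoding_id: str
    max_bandwidth_bps: int


@dataclass(frozen=True)
class EncodingGroup:
    """Hypothetical stand-in for an encoding group."""
    group_id: str
    max_group_bandwidth_bps: int
    encodings: Tuple[Encoding, ...]


def max_simultaneous_encodings(group: EncodingGroup) -> int:
    """Count how many encodings fit in the group budget.

    Assumption: each encoding consumes its full maximum bandwidth.
    Taking the cheapest encodings first maximizes the count.
    """
    budget = group.max_group_bandwidth_bps
    count = 0
    for bandwidth in sorted(e.max_bandwidth_bps for e in group.encodings):
        if bandwidth > budget:
            break
        budget -= bandwidth
        count += 1
    return count


# Values taken from the scenario description (K assumed to mean 1000).
EG0 = EncodingGroup(
    "EG0",
    384_000,
    tuple(Encoding(f"ENC{i}", 128_000) for i in range(3)),
)
EG1 = EncodingGroup(
    "EG1",
    128_000,
    (Encoding("ENC3", 64_000), Encoding("ENC4", 64_000)),
)
```

Under these assumptions, max_simultaneous_encodings(EG0) evaluates to 3 and max_simultaneous_encodings(EG1) to 2, which matches the limits stated in the scenario description.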
As to the simultaneous sets, only VC1 and VC3 cannot be transmitted simultaneously, since they are captured by the same device, i.e., the central camera (VC3 is a zoomed-out view while VC1 is a focused view of the front participants). The simultaneous sets would then be the following:

   SS1: VC0, VC1, VC2, VC4
   SS2: VC0, VC3, VC2, VC4
   SS3: AC0 and AC1

video CS1 EG0 1.0 1.0 1.0 1.0 0.0 1.0 true left-most participants video Italian static individual
video CS1 EG0 0.0 1.0 1.0 0.0 0.0 1.0 true central participants video Italian static individual
video CS1 EG0 -1.0 1.0 1.0 -1.0 0.0 1.0 true right-most participants video Italian static individual
video CS1 EG0 0.0 1.0 1.0 0.0 0.0 1.0 true overall room video Italian static room
video cs2 EG0 true true slides video Italian slides
audio CS1 EG1 0.0 1.0 1.0 0.0 0.0 1.0 true room audio Italian static room stereo
audio cs2 EG1 true true slide presentation audio Italian slides mono
vp8 128000 video
vp8 128000 video
vp8 128000 video
g711 64000 audio
g711 32000 audio
0 ENC0 ENC1 ENC2
0 ENC3 ENC4
main scene overall room audio ac0 overall room video vc3 participants video vc0 vc1 vc2
presentation scene ac1 vc4
vc0 vc3 vc2 vc4
vc0 vc1 vc2 vc4
ac0 ac1

24.
MCC example

In this example, the endpoint is equipped with three cameras capturing three individual captures (VC1, VC2, VC3). The central one is also able to capture a zoomed-out view of the room (VC4). Moreover, the MP is able to advertise a composed MCC (MCC5) made up of a large picture representing the current speaker (MCC0) and two picture-in-picture boxes representing the previous speakers (the previous one, MCC1, and the oldest one, MCC2). A possible description of that scenario could be the following:

video CS1 EG0 0.5 1.0 0.5 0.5 0.0 0.5 0.0 3.0 0.0 1.0 3.0 0.0 0.0 3.0 3.0 1.0 3.0 3.0 true left-most participants video Italian static individual
video CS1 EG0 0.0 1.0 1.0 0.0 0.0 1.0 true central participants video Italian static individual
video CS1 EG0 0.5 1.0 0.5 0.5 0.0 0.5 0.0 3.0 0.0 1.0 3.0 0.0 0.0 3.0 3.0 1.0 3.0 3.0 true right-most participants video Italian static individual
video CS1 EG0 0.5 1.0 0.5 0.5 0.0 0.5 0.0 3.0 0.0 1.0 3.0 0.0 0.0 3.0 3.0 1.0 3.0 3.0 true table view video Italian static table
video CS1 EG0 true VC1 vc2 vc3 false true SoundLevel:0 1 video of the current loudest speaker Italian static individual
video CS1 EG0 true VC1 vc2 vc3 false true SoundLevel:1 1 video of the previous loudest speaker Italian static individual
video CS1 EG0 true VC1 vc2 vc3 false true SoundLevel:2 1 video of the oldest loudest speaker Italian static individual
video CS1 EG0 true mcc0 mcc1 mcc2 true
true SoundLevel 1 big video of the current loudest speaker + PiPs of previous speakers Italian static individual
vp8 128000 video
vp8 128000 video
vp8 128000 video
384000 ENC0 ENC1 ENC2
main scene participants video VC1 vc2 vc3 overall room video vc4 multi-content captures mcc0 mcc1 mcc2 composed capture mcc5
VC1 vc2 vc3 CS1_SE3 CS1_SE4
VC1 vc4 vc3 CS1_SE3 CS1_SE4

25. Diff with draft-ietf-clue-data-model-schema-01 version

   Terminology has been added.

   Switching policies have been removed from the capture scene entry attributes list.

   Multiple content captures can now be modeled by using the schema. Related attributes (composed, switching, maxCaptures, captureIDs) have been added in the definition of the media capture type.

   H26X encoding type has been removed.

   maxGroupPps (max pixels per second for encoding groups) has been removed.

   Presentation attribute for media captures has been added.

   View attribute for media captures has been added.

26. Diff with draft-ietf-clue-data-model-schema-02 version

   captureParameters and encodingParameters have been removed from the captureEncodingType.

   The data model example has been updated and validated according to the new schema. Further descriptions of the represented scenario have been provided.

   A multiple content capture example has been added.

   Obsolete comments and references have been removed.

27. Informative References

[I-D.ietf-clue-framework] Duckworth, M., Pepperell, A., and S. Wenger, "Framework for Telepresence Multi-Streams", draft-ietf-clue-framework-13 (work in progress), December 2013.

[RFC4796] Hautakorpi, J. and G.
Camarillo, "The Session Description Protocol (SDP) Content Attribute", RFC 4796, February 2007.

Authors' Addresses

   Roberta Presta
   University of Napoli
   Via Claudio 21
   Napoli 80125
   Italy

   EMail: roberta.presta@unina.it

   Simon Pietro Romano
   University of Napoli
   Via Claudio 21
   Napoli 80125
   Italy

   EMail: spromano@unina.it