idnits 2.17.1 draft-ietf-clue-data-model-schema-10.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 51 instances of too long lines in the document, the longest one being 18 characters in excess of 72. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 176: '... Capture Scene MAY correspond to a r...' RFC 2119 keyword, line 956: '... MUST be included in spatially defin...' RFC 2119 keyword, line 991: '... line of capture MUST NOT be identical...' RFC 2119 keyword, line 993: '...of capture is provided, it MUST belong...' RFC 2119 keyword, line 1003: '...e media capture. MUST be...' (18 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 128 has weird spacing: '...ff with draft...' == Line 2821 has weird spacing: '...ff with draft...' == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: static SHOULD not change for the duration of the CLUE session, across multiple ADVERTISEMENT messages. -- The document date (June 29, 2015) is 3223 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: '0-9' is mentioned on line 1101, but not defined == Unused Reference: 'RFC4796' is defined on line 2882, but no explicit reference was found in the text == Outdated reference: A later version (-18) exists of draft-ietf-clue-datachannel-09 == Outdated reference: A later version (-25) exists of draft-ietf-clue-framework-22 == Outdated reference: A later version (-19) exists of draft-ietf-clue-protocol-04 -- Obsolete informational reference (is this intentional?): RFC 3023 (Obsoleted by RFC 7303) -- Obsolete informational reference (is this intentional?): RFC 5117 (Obsoleted by RFC 7667) Summary: 2 errors (**), 0 flaws (~~), 9 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 CLUE Working Group R. Presta 3 Internet-Draft S P. Romano 4 Intended status: Standards Track University of Napoli 5 Expires: December 31, 2015 June 29, 2015 7 An XML Schema for the CLUE data model 8 draft-ietf-clue-data-model-schema-10 10 Abstract 12 This document provides an XML schema file for the definition of CLUE 13 data model types. 15 Status of This Memo 17 This Internet-Draft is submitted in full conformance with the 18 provisions of BCP 78 and BCP 79. 20 Internet-Drafts are working documents of the Internet Engineering 21 Task Force (IETF). Note that other groups may also distribute 22 working documents as Internet-Drafts. The list of current Internet- 23 Drafts is at http://datatracker.ietf.org/drafts/current/. 25 Internet-Drafts are draft documents valid for a maximum of six months 26 and may be updated, replaced, or obsoleted by other documents at any 27 time. 
It is inappropriate to use Internet-Drafts as reference
28 material or to cite them other than as "work in progress."

30 This Internet-Draft will expire on December 31, 2015.

32 Copyright Notice

34 Copyright (c) 2015 IETF Trust and the persons identified as the
35 document authors. All rights reserved.

37 This document is subject to BCP 78 and the IETF Trust's Legal
38 Provisions Relating to IETF Documents
39 (http://trustee.ietf.org/license-info) in effect on the date of
40 publication of this document. Please review these documents
41 carefully, as they describe your rights and restrictions with respect
42 to this document. Code Components extracted from this document must
43 include Simplified BSD License text as described in Section 4.e of
44 the Trust Legal Provisions and are provided without warranty as
45 described in the Simplified BSD License.

47 Table of Contents
48 1. Introduction
49 2. Terminology
50 3. XML Schema
51 4.
52 5.
53 6.
54 7.
55 8.
56 9.
57 10.
58 10.1. captureID attribute
59 10.2. mediaType attribute
60 10.3.
61 10.4.
62 10.5.
63 10.5.1.
64 10.5.2.
65 10.6.
66 10.7.
67 10.8.
68 10.9.
69 10.10.
70 10.11.
71 10.12.
72 10.13.
73 10.14.
74 10.15.
75 10.16.
76 10.17.
77 10.18.
78 10.19.
79 10.19.1.
80 10.19.2.
81 10.20. Audio captures
82 10.20.1.
83 10.21. Video captures
84 10.22. Text captures
85 10.23. Other capture types
86 10.24.
87 10.24.1.
88 10.24.2.
89 10.24.3. sceneID attribute
90 10.24.4. scale attribute
91 10.25.
92 10.25.1.
93 10.25.2. sceneViewID attribute
94 10.26.
95 10.26.1.
96 10.26.2.
97 10.26.3. encodingGroupID attribute
98 10.27.
99 10.27.1. setID attribute
100 10.27.2. mediaType attribute
101 10.27.3.
102 10.27.4.
103 10.27.5.
104 10.28.
105 10.29.
106 10.29.1.
107 11.
108 11.1.
109 11.2.
110 11.3.
111 12.
112 13. XML Schema extensibility
113 13.1. Example of extension
114 14. Security considerations
115 15. IANA considerations
116 15.1. XML namespace registration
117 15.2. XML Schema registration
118 15.3. MIME Media Type Registration for
119 'application/clue_info+xml'
120 16. Sample XML file
121 17. MCC example
122 18. Diff with draft-ietf-clue-data-model-schema-09 version
123 19. Diff with draft-ietf-clue-data-model-schema-08 version
124 20. Diff with draft-ietf-clue-data-model-schema-07 version
125 21. Diff with draft-ietf-clue-data-model-schema-06 version
126 22. Diff with draft-ietf-clue-data-model-schema-04 version
127 23. Diff with draft-ietf-clue-data-model-schema-03 version
128 24. Diff with draft-ietf-clue-data-model-schema-02 version
129 25. Acknowledgments
130 26. Informative References

132 1. Introduction

134 This document provides an XML schema file for the definition of CLUE
135 data model types.

137 The schema is based on information contained in
138 [I-D.ietf-clue-framework]. It encodes information and constraints
139 defined in the aforementioned document in order to provide a formal
140 representation of the concepts therein presented.

142 The document aims at the definition of a coherent structure for
143 information associated with the description of a telepresence
144 scenario. Such information is used within the CLUE protocol messages
145 ([I-D.ietf-clue-protocol]) enabling the dialogue between a Media
146 Provider and a Media Consumer. CLUE protocol messages, indeed, are
147 XML messages allowing (i) a Media Provider to advertise its
148 telepresence capabilities in terms of media captures, capture scenes,
149 and other features envisioned in the CLUE framework, according to the
150 format herein defined and (ii) a Media Consumer to request the
151 desired telepresence options in the form of capture encodings,
152 represented as described in this document.

154 2. Terminology

156 This document refers to the same terminology used in
157 [I-D.ietf-clue-framework], except for the "CLUE Participant"
158 definition. We briefly recall herein some of the main terms used in
159 the document.

161 Audio Capture: Media Capture for audio. Denoted as ACn in the
162 examples in this document.

164 Capture: Same as Media Capture.

166 Capture Device: A device that converts physical input, such as
167 audio, video or text, into an electrical signal, in most cases to
168 be fed into a media encoder.

170 Capture Encoding: A specific encoding of a Media Capture, to be sent
171 by a Media Provider to a Media Consumer via RTP.

173 Capture Scene: A structure representing a spatial region captured by
174 one or more Capture Devices, each capturing media representing a
175 portion of the region.
The spatial region represented by a
176 Capture Scene MAY correspond to a real region in physical space,
177 such as a room. A Capture Scene includes attributes and one or
178 more Capture Scene Views, with each view including one or more
179 Media Captures.

181 Capture Scene View: A list of Media Captures of the same media type
182 that together form one way to represent the entire Capture Scene.

184 CLUE Participant: This term is not imported from the framework
185 terminology. A CLUE Participant identifies a generic entity
186 (either an Endpoint or an MCU) making use of the CLUE protocol.

188 Consumer: Short for Media Consumer.

190 Encoding or Individual Encoding: A set of parameters representing a
191 way to encode a Media Capture to become a Capture Encoding.

193 Encoding Group: A set of encoding parameters representing a total
194 media encoding capability to be sub-divided across potentially
195 multiple Individual Encodings.

197 Endpoint: A CLUE-capable device which is the logical point of final
198 termination through receiving, decoding and rendering, and/or
199 initiation through capturing, encoding, and sending of media
200 streams. An endpoint consists of one or more physical devices
201 which source and sink media streams, and exactly one [RFC4353]
202 Participant (which, in turn, includes exactly one SIP User Agent).
203 Endpoints can be anything from multiscreen/multicamera rooms to
204 handheld devices.

206 Media: Any data that, after suitable encoding, can be conveyed over
207 RTP, including audio, video or timed text.

209 Media Capture: A source of Media, such as from one or more Capture
210 Devices or constructed from other Media streams.

212 Media Consumer: A CLUE-capable device that intends to receive
213 Capture Encodings.

215 Media Provider: A CLUE-capable device that intends to send Capture
216 Encodings.

218 Multiple Content Capture: A Capture that mixes and/or switches other
219 Captures of a single type. (E.g. all audio or all video.)
220 Particular Media Captures may or may not be present in the
221 resultant Capture Encoding depending on time or space. Denoted as
222 MCCn in the example cases in this document.

224 Multipoint Control Unit (MCU): A CLUE-capable device that connects
225 two or more endpoints together into one single multimedia
226 conference [RFC5117]. An MCU includes an [RFC4353] like Mixer,
227 without the [RFC4353] requirement to send media to each
228 participant.

230 Plane of Interest: The spatial plane containing the most relevant
231 Subject matter.

233 Provider: Same as Media Provider.

235 Render: The process of reproducing the received Streams like, for
236 instance, displaying of the remote video on the Media Consumer's
237 screens, or playing of the remote audio through loudspeakers.

239 Scene: Same as Capture Scene.

241 Simultaneous Transmission Set: A set of Media Captures that can be
242 transmitted simultaneously from a Media Provider.

244 Single Media Capture: A capture which contains media from a single
245 source capture device, e.g. an audio capture from a single
246 microphone, a video capture from a single camera.

248 Spatial Relation: The arrangement in space of two objects, in
249 contrast to relation in time or other relationships.

251 Stream: A Capture Encoding sent from a Media Provider to a Media
252 Consumer via RTP [RFC3550].

254 Stream Characteristics: The union of the features used to describe a
255 Stream in the CLUE environment and in the SIP-SDP environment.

257 Video Capture: A Media Capture for video.

259 3. XML Schema

261 This section contains the CLUE data model schema definition.

263 The element and attribute definitions are formal representations of
264 the concepts needed to describe the capabilities of a Media Provider
265 and the streams that are requested by a Media Consumer given the
266 Media Provider's ADVERTISEMENT ([I-D.ietf-clue-protocol]).
268 The main groups of information are:

270 : the list of media captures available (Section 4)

272 : the list of encoding groups (Section 5)

274 : the list of capture scenes (Section 6)

276 : the list of simultaneous transmission sets
277 (Section 7)
278 : the list of global views sets (Section 8)

280 : metadata about the participants represented in the
281 telepresence session (Section 10.29).

283 : the list of instantiated capture encodings
284 (Section 9)

286 All of the above refers to concepts that have been introduced in
287 [I-D.ietf-clue-framework] and further detailed in this document.

The following sections describe the XML schema in more detail. As a
760 general remark, please notice that optional elements that don't
761 define what their absence means are intended to be associated with
762 undefined properties.

764 4.

766 represents the list of one or more media captures
767 available at the Media Provider's side. Each media capture is
768 represented by a element (Section 10).

770 5.

772 represents the list of the encoding groups organized
773 on the Media Provider's side. Each encoding group is represented by
774 an element (Section 10.26).

776 6.

778 represents the list of the capture scenes organized
779 on the Media Provider's side. Each capture scene is represented by a
780 element (Section 10.24).

782 7.

784 contains the simultaneous sets indicated by the
785 Media Provider. Each simultaneous set is represented by a
786 element (Section 10.27).

788 8.

790 contains a set of alternative representations of all
791 the scenes that are offered by a Media Provider to a Media Consumer.
792 Each alternative is named "global view" and it is represented by a
793 element (Section 10.28).

795 9.

797 is a list of capture encodings. It can represent
798 the list of the desired capture encodings indicated by the Media
799 Consumer or the list of instantiated captures on the provider's side.
800 Each capture encoding is represented by a element
801 (Section 11).
803 10.

805 A Media Capture is the fundamental representation of a media flow
806 that is available on the provider's side. Media captures are
807 characterized (i) by a set of features that are independent from the
808 specific type of medium, and (ii) by a set of features that are
809 media-specific. The features that are common to all media types
810 appear within the media capture type, which has been designed as an
811 abstract complex type. Media-specific captures, such as video
812 captures, audio captures and others, are specializations of that
813 abstract media capture type, as in a typical generalization-
814 specialization hierarchy.

816 The following is the XML Schema definition of the media capture type:

818 819 820 821 822 823 824 825 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 847 848 849 850 851 852 853 854 855 856

858 10.1. captureID attribute

860 The "captureID" attribute is a mandatory field containing the
861 identifier of the media capture. Such an identifier serves as the
862 way the capture is referenced from other data model elements (e.g.,
863 simultaneous sets, capture encodings, and others).

865 10.2. mediaType attribute

867 The "mediaType" attribute is a mandatory attribute specifying the
868 media type of the capture. Common values are "audio", "video",
869 "text". Other values can be provided. It is assumed that
870 implementations agree on the interpretation of those other values.

872 10.3.

874 is a mandatory field containing the value of the
875 identifier of the capture scene the media capture is defined in,
876 i.e., the value of the sceneID (Section 10.24.3) attribute of that
877 capture scene. Indeed, each media capture must be defined within one
878 and only one capture scene. When a media capture is spatially
879 definable, some spatial information is provided along with it in the
880 form of point coordinates (see Section 10.5).
Such coordinates refer
881 to the space of coordinates defined for the capture scene containing
882 the capture.

884 10.4.

886 is an optional field containing the identifier of the
887 encoding group the media capture is associated with, i.e., the value
888 of the encodingGroupID (Section 10.26.3) attribute of that encoding
889 group. Media captures that are not associated with any encoding
890 group cannot be instantiated as media streams.

892 10.5.

894 Media captures are divided into two categories: (i) non spatially
895 definable captures and (ii) spatially definable captures.

897 Captures are spatially definable when at least (i) it is possible to
898 provide the coordinates of the device position within the
899 telepresence room of origin (capture point) together with its
900 capturing direction specified by a second point (point on line of
901 capture), or (ii) it is possible to provide the represented area
902 within the telepresence room, by listing the coordinates of the four
903 co-planar points identifying the plane of interest (area of capture).
904 The coordinates of the above-mentioned points must be expressed
905 according to the coordinate space of the capture scene the media
906 capture belongs to.

908 Non spatially definable captures cannot be characterized within the
909 physical space of the telepresence room of origin. Captures of this
910 kind are for example those related to recordings, text captures,
911 DVDs, registered presentations, or external streams that are played
912 in the telepresence room and transmitted to remote sites.

914 Spatially definable captures represent a part of the telepresence
915 room. The captured part of the telepresence room is described by
916 means of the element. By comparing the
917 element of different media captures within the
918 same capture scene, a consumer can better determine the spatial
919 relationships between them and render them correctly.
Non spatially
920 definable captures do not embed such an element in their XML
921 description: they are instead characterized by having the
922 tag set to "true" (see Section 10.6).

924 The definition of the spatial information type is the following:

926 927 928 929 930 931 933 934 935

937 The contains the coordinates of the capture device
938 that is taking the capture (i.e., the capture point), as well as,
939 optionally, the pointing direction (i.e., the point on line of
940 capture) (see Section 10.5.1).

942 The is an optional field containing four points
943 defining the captured area covered by the capture (see
944 Section 10.5.2).

946 The scale of the point coordinates is specified in the scale
947 (Section 10.24.4) attribute of the capture scene the media capture
948 belongs to. Indeed, all the spatially definable media captures
949 referring to the same capture scene share the same coordinate system
950 and express their spatial information according to the same scale.

952 10.5.1.

954 The element is used to represent the position and
955 optionally the line of capture of a capture device.
956 MUST be included in spatially definable audio captures, while it is
957 optional for spatially definable video captures.

959 The XML Schema definition of the element type is the
960 following:

962 963 964 965 966 967 968 969 971 972 973 974 975 976 977 978

980 The point type contains three spatial coordinates (x,y,z)
981 representing a point in the space associated with a certain capture
982 scene.

984 The element includes a mandatory
985 element and an optional element, both of the
986 type "pointType". specifies the three coordinates
987 identifying the position of the capture device.
988 is another pointType element representing the "point on line of
989 capture", which gives the pointing direction of the capture device.

991 The coordinates of the point on line of capture MUST NOT be identical
992 to the capture point coordinates.
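The reason two identical points are ruled out can be seen in a minimal Python sketch: the pointing direction is the vector from the capture point to the point on line of capture, and that vector has no direction when the two coincide. The function name capture_direction and the tuple representation of a point are assumptions made for this illustration only; they are not part of the schema.

```python
import math
from typing import Optional, Tuple

# A point as three coordinates (x, y, z) in the capture scene's space.
# The tuple form is an assumption of this sketch, not the XML encoding.
Point = Tuple[float, float, float]


def capture_direction(capture_point: Point,
                      point_on_line: Optional[Point] = None) -> Optional[Point]:
    """Return the unit vector from the capture point towards the point on
    line of capture, or None when no point on line of capture is given.

    Raises ValueError when both points coincide, since no direction can
    be derived from them.
    """
    if point_on_line is None:
        return None
    dx, dy, dz = (b - a for a, b in zip(capture_point, point_on_line))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("point on line of capture equals the capture point")
    return (dx / length, dy / length, dz / length)
```

A receiver could run such a check on each spatially definable capture before using the direction for rendering decisions.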
For a spatially definable video
993 capture, if the point on line of capture is provided, it MUST belong
994 to the region between the point of capture and the capture area. For
995 a spatially definable audio capture, if the point on line of capture
996 is not provided, the sensitivity pattern should be considered
997 omnidirectional.

999 10.5.2.

1001 is an optional element that can be contained within the
1002 spatial information associated with a media capture. It represents
1003 the spatial area captured by the media capture. MUST be
1004 included in the spatial information of spatially definable video
1005 captures, while it MUST NOT be associated with audio captures.

1007 The XML representation of that area is provided through a set of four
1008 point-type elements, , , , and
1009 that MUST be co-planar. The four coplanar points are
1010 identified from the perspective of the capture device. The XML
1011 schema definition is the following:

1013 1014 1015 1016 1017 1018 1019 1020 1021

1023 10.6.

1025 When media captures are non spatially definable, they MUST be marked
1026 with the boolean element set to "true" and no
1027 MUST be provided. Indeed,
1028 and are mutually
1029 exclusive tags, according to the section within the XML
1030 Schema definition of the media capture type.

1032 10.7.

1034 A media capture can be (i) an individual media capture or (ii) a
1035 multiple content capture (MCC). A multiple content capture is made
1036 of different captures that can be arranged spatially (by a
1037 composition operation), or temporally (by a switching operation), or
1038 that can result from the orchestration of both techniques. If a
1039 media capture is an MCC, then it MAY show in its XML data model
1040 representation the element. It is composed of a list of
1041 media capture identifiers ("captureIDREF") and capture scene view
1042 identifiers ("sceneViewIDREF"), where the latter are used as
1043 shortcuts to refer to multiple capture identifiers.
The referenced
1044 captures are used to create the MCC according to a certain strategy.
1045 If the element does not appear in an MCC, or it has no child
1046 elements, then the MCC is assumed to be made of multiple sources but
1047 no information regarding those sources is provided.

1049 1050 1051 1052 1054 1056 1058 1059 1060

1062 10.8.

1064 is an optional element for multiple content
1065 captures that contains a numeric identifier. Multiple content
1066 captures marked with the same identifier in the
1067 contain at all times captures coming from the same sources. It is
1068 the Media Provider that determines what the source for the captures
1069 is. In this way, the Media Provider can choose how to group together
1070 single captures for the purpose of keeping them synchronized
1071 according to the element.

1073 10.9.

1075 is an optional boolean element for multiple
1076 content captures. It indicates whether or not the Provider allows
1077 the Consumer to choose a specific subset of the captures referenced
1078 by the MCC. If this attribute is true, and the MCC references other
1079 captures, then the Consumer MAY specify in a CONFIGURE message a
1080 specific subset of those captures to be included in the MCC, and the
1081 Provider MUST then include only that subset. If this attribute is
1082 false, or the MCC does not reference other captures, then the
1083 Consumer MUST NOT select a subset. If is not
1084 shown in the XML description of the MCC, its value is to be
1085 considered "false".

1087 10.10.

1089 is an optional element that can be used only for multiple
1090 content captures. It indicates the criteria applied to build the
1091 multiple content capture using the media captures referenced in
1092 . The value is in the form of a token
1093 that indicates the policy and an index representing an instance of
1094 the policy, separated by a ":" (e.g., SoundLevel:2, RoundRobin:0,
1095 etc.).
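As an illustration of the token-and-index form just described, the following Python sketch splits a policy value into its two parts. The function name parse_policy and the character set accepted for the token (letters and digits) are assumptions made for this example; the schema remains the authority on what is valid.

```python
import re
from typing import Tuple

# Assumed shape for this sketch: an alphanumeric policy token, a colon,
# and a non-negative decimal index (e.g. "SoundLevel:2").
_POLICY_RE = re.compile(r"([A-Za-z0-9]+):([0-9]+)")


def parse_policy(value: str) -> Tuple[str, int]:
    """Split a policy value into (token, index).

    Raises ValueError when the value is not of the token:index form.
    """
    match = _POLICY_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a token:index policy value: {value!r}")
    return match.group(1), int(match.group(2))
```

For example, parse_policy("SoundLevel:2") yields the token "SoundLevel" and the index 2, which a consumer could then interpret according to the policy definitions.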
The XML schema defining the type of the element is
1096 the following:

1098 1099 1100 1101 1102 1103

1105 At the time of writing, only two switching policies are defined in
1106 [I-D.ietf-clue-framework]:

1108 SoundLevel: the content of the MCC is determined by a sound level
1109 detection algorithm. The loudest (active) speaker (or a previous
1110 speaker, depending on the index value) is contained in the MCC.
1111 Index 0 represents the most current instance of the policy, i.e.,
1112 the currently active speaker, 1 represents the previous instance,
1113 i.e., the previous active speaker, and so on.

1115 RoundRobin: the content of the MCC is determined by a time-based
1116 algorithm.

1118 Other values for the element can be used. In this case, it
1119 is assumed that implementations agree on the meaning of those other
1120 values and/or those new switching policies are defined in later
1121 documents.

1123 10.11.

1125 is an optional element that can be used only for
1126 multiple content captures (MCC). It provides information about the
1127 number of media captures that can be represented in the multiple
1128 content capture at a time. If is not provided, all the
1129 media captures listed in the element can appear at a time
1130 in the capture encoding. The type definition is provided below.

1132 1133 1134 1135 1136 1137 1138 1140

1142 When the "exactNumber" attribute is set to "true", it means the
1143 element carries the exact number of the media captures
1144 appearing at a time. Otherwise, the number of the represented media
1145 captures MUST be considered "<=" the value.

1147 For instance, an audio MCC having the value set to 1
1148 means that a media stream from the MCC will only contain audio from a
1149 single one of its constituent captures at a time.
On the other hand,
1150 if the value is set to 4 and the exactNumber attribute
1151 is set to "true", it would mean that the media stream received from
1152 the MCC will always contain a mix of audio from exactly four of its
1153 constituent captures.

1155 10.12.

1157 is a boolean element that MUST be used for single-
1158 content captures. Its value is fixed and set to "true". Such an
1159 element indicates that the capture being described is not a
1160 multiple content capture. Indeed, and the
1161 aforementioned tags related to MCC attributes (from Section 10.7 to
1162 Section 10.11) are mutually exclusive, according to the
1163 section within the XML Schema definition of the media capture type.

1165 10.13.

1167 is used to provide human-readable textual information.
1168 This element is included in the XML definition of media captures,
1169 capture scenes and capture scene views with the aim of providing a human-
1170 readable description of, respectively, media captures, capture scenes
1171 and capture scene views. According to the data model definition of a
1172 media capture (Section 10), zero or more elements can
1173 be used, each providing information in a different language. The
1174 element definition is the following:

1176 1177 1178 1179 1180 1181 1182 1183 1184 1186

1188 As can be seen, is a string element with an attribute
1189 ("lang") indicating the language used in the textual description.

1191 10.14.

1193 is an optional unsigned integer field indicating the
1194 importance of a media capture according to the Media Provider's
1195 perspective. It can be used on the receiver's side to automatically
1196 identify the most relevant contribution from the Media Provider. The
1197 higher the importance, the lower the contained value. If no priority
1198 is assigned, no assumptions regarding relative importance of the
1199 media capture can be made.

1201 10.15.

1203 is an optional element containing the language used in the
1204 capture.
Zero or more elements can appear in the XML
1205 description of a media capture.

1207 10.16.

1209 is an optional element indicating whether or not the
1210 capture device originating the capture may move during the
1211 telepresence session. That optional element can assume one of the
1212 three following values:

1214 static: SHOULD NOT change for the duration of the CLUE session,
1215 across multiple ADVERTISEMENT messages.

1217 dynamic: MAY change in each new ADVERTISEMENT message. Can be
1218 assumed to remain unchanged until there is a new ADVERTISEMENT
1219 message.

1221 highly-dynamic: MAY change dynamically, even between consecutive
1222 ADVERTISEMENT messages. The spatial information provided in an
1223 ADVERTISEMENT message is simply a snapshot of the current values
1224 at the time when the message is sent.

1226 10.17.

1228 The optional element contains the value of the captureID
1229 attribute (Section 10.1) of the media capture to which the considered
1230 media capture refers. The media capture marked with a
1231 element can be for example the translation of the referred media
1232 capture in a different language.

1234 10.18.

1236 The element is an optional tag describing what is represented
1237 in the spatial area covered by a media capture. The current possible
1238 values are: "table", "lectern", "individual", and "audience", as
1239 listed in the enumerative view type in the following.

1241 1242 1243 1244 1245 1246 1247 1248 1249 1250

1252 10.19.

1254 The element is an optional tag used for media captures
1255 conveying information about presentations within the telepresence
1256 session. The current possible values are "slides" and "images", as
1257 listed in the enumerative presentation type in the following.

1259 1260 1261 1262 1263 1264 1265 1266

1268 10.19.1.

1270 The element is a boolean element indicating that there
1271 is text embedded in the media capture (e.g., in a video capture).
1272 The language used in such an embedded textual description is reported in 1273 the "lang" attribute. 1275 The XML Schema definition of the element is: 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1288 10.19.2. 1290 This optional element is used to indicate which telepresence session 1291 participants are represented within the media captures. For each 1292 participant, a element is provided. 1294 10.19.2.1. 1296 contains the identifier of the represented person, 1297 i.e., the value of the related personID attribute 1298 (Section 10.29.1.1). Metadata about the represented participant can 1299 be retrieved by accessing the list (Section 10.29). 1301 10.20. Audio captures 1303 Audio captures inherit all the features of a generic media capture 1304 and present further audio-specific characteristics. The XML Schema 1305 definition of the audio capture type is reported below: 1307 1308 1309 1310 1311 1312 1314 1316 1317 1318 1319 1320 1321 An example of audio-specific information that can be included is 1322 represented by the element (Section 10.20.1). 1324 10.20.1. 1326 The element is an optional field describing the 1327 characteristics of the nominal sensitivity pattern of the microphone 1328 capturing the audio signal. 1330 The XML Schema definition is provided below: 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1343 10.21. Video captures 1345 Video captures, similarly to audio captures, extend the information 1346 of a generic media capture with video-specific features. 1348 The XML Schema representation of the video capture type is provided 1349 in the following: 1351 1352 1353 1354 1355 1356 1358 1359 1360 1361 1362 1364 10.22. Text captures 1366 Text captures can also be described by extending the generic media 1367 capture information, similarly to audio captures and video captures.
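The extension relationship described for audio, video and text captures can be pictured as a small class hierarchy. The sketch below is only an illustration in Python, not part of the schema: the class and field names (MediaCapture, capture_scene_id, sensitivity_pattern, most_relevant) are hypothetical, the field set is a simplified subset, and the "omni" value is a placeholder. The priority rule follows Section 10.14: a lower value means higher importance, and a capture without a priority is not ranked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaCapture:
    """Hypothetical, simplified stand-in for the generic media capture."""
    capture_id: str
    media_type: str
    capture_scene_id: str            # identifier of the owning capture scene
    priority: Optional[int] = None   # lower value = higher importance


@dataclass
class AudioCapture(MediaCapture):
    # Audio-specific extension: nominal microphone sensitivity pattern.
    sensitivity_pattern: Optional[str] = None


@dataclass
class VideoCapture(MediaCapture):
    # The generic fields are inherited; no extra field is modeled here.
    pass


@dataclass
class TextCapture(MediaCapture):
    # Text captures add no text-specific information in this sketch.
    pass


def most_relevant(captures: List[MediaCapture]) -> Optional[MediaCapture]:
    """Return the capture with the lowest priority value.

    Captures without a priority are skipped, since nothing can be
    assumed about their relative importance.
    """
    ranked = [c for c in captures if c.priority is not None]
    return min(ranked, key=lambda c: c.priority) if ranked else None


captures = [
    AudioCapture("AC0", "audio", "CS1", priority=2,
                 sensitivity_pattern="omni"),  # placeholder value
    VideoCapture("VC0", "video", "CS1", priority=1),
    TextCapture("TC0", "text", "CS1"),         # no priority assigned
]
best = most_relevant(captures)
print(best.capture_id if best else "no ranked capture")
```

Every specialized capture is still a MediaCapture, so code that works on the generic fields, such as the ranking helper, applies to all capture types without change.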
1369 The XML Schema representation of the text capture type is currently 1370 lacking text-specific information, as it can be seen by looking at 1371 the definition below: 1373 1374 1375 1376 1377 1378 1380 1381 1382 1383 1384 1386 Text captures SHOULD be marked as non spatially definable (i.e., they 1387 should present in their XML description the 1388 (Section 10.6) element set to "true"). 1390 10.23. Other capture types 1392 Other media capture types can be described by using the CLUE data 1393 model. They can be represented by exploiting the "otherCaptureType" 1394 type. This media capture type is conceived to be filled in with 1395 elements defined within extensions of the current schema, i.e., with 1396 elements defined in other XML schemas (see Section 13 for an 1397 example). The otherCaptureType inherits all the features envisioned 1398 for the abstract mediaCaptureType. 1400 The XML Schema representation of the otherCaptureType is the 1401 following: 1403 1404 1405 1406 1407 1408 1410 1411 1412 1413 1414 1416 When defining new media capture types that are going to be described 1417 by means of the element, spatial properties of 1418 such new media capture types SHOULD be defined (e.g., whether or not 1419 they are spatially definable, whether or not they should be 1420 associated with an area of capture, etc.). 1422 10.24. 1424 A Media Provider organizes the available captures in capture scenes 1425 in order to help the receiver both in the rendering and in the 1426 selection of the group of captures. Capture scenes are made of media 1427 captures and capture scene views, that are sets of media captures of 1428 the same media type. Each capture scene view is an alternative to 1429 represent completely a capture scene for a fixed media type. 1431 The XML Schema representation of a element is the 1432 following: 1434 1435 1436 1437 1438 1439 1440 1442 1443 1444 1445 1446 1447 Each capture scene is identified by a "sceneID" attribute. 
The 1448 element can contain zero or more textual 1449 elements, defined as in Section 10.13. Besides , there 1450 is the optional element (Section 10.24.1), which 1451 contains structured information about the scene in the vCard format, 1452 and the optional element (Section 10.24.2), which is the 1453 list of the capture scene views. When no is provided, 1454 the capture scene is assumed to be made of all the media captures 1455 which contain the value of its sceneID attribute in their mandatory 1456 captureSceneIDREF attribute. 1458 10.24.1. 1460 The element contains optional information about 1461 the capture scene according to the vCard format. 1463 10.24.2. 1465 The element is a mandatory field of a capture scene 1466 containing the list of scene views. Each scene view is represented 1467 by a element (Section 10.25). 1469 1470 1471 1472 1473 1475 1476 1478 10.24.3. sceneID attribute 1480 The sceneID attribute is a mandatory attribute containing the 1481 identifier of the capture scene. 1483 10.24.4. scale attribute 1485 The scale attribute is a mandatory attribute that specifies the scale 1486 of the coordinates provided in the spatial information of the media 1487 captures belonging to the considered capture scene. The scale 1488 attribute can assume three different values: 1490 "mm" - the scale is in millimeters. Systems which know their 1491 physical dimensions (for example, professionally installed 1492 telepresence room systems) should always provide such real-world 1493 measurements. 1495 "unknown" - the scale is the same for every media capture in the 1496 capture scene but the unit of measure is undefined. Systems 1497 which are not aware of specific physical dimensions yet still know 1498 relative distances should select "unknown" in the scale attribute 1499 of the capture scene to be described. 1501 "noscale" - there is no common physical scale among the media 1502 captures of the capture scene.
That means the scale could be 1503 different for each media capture. 1505 1506 1507 1508 1509 1510 1511 1512 1514 10.25. 1516 A element represents a capture scene view, which contains 1517 a set of media captures of the same media type describing a capture 1518 scene. 1520 A element is characterized as follows. 1522 1523 1524 1525 1526 1527 1528 1529 1530 One or more optional elements provide human-readable 1531 information about what the scene view contains. is 1532 defined as already seen in Section 10.13. 1534 The remaining child elements are described in the following 1535 subsections. 1537 10.25.1. 1539 The is the list of the identifiers of the media 1540 captures included in the scene view. It is an element of the 1541 captureIDListType type, which is defined as a sequence of 1542 , each containing the identifier of a media capture 1543 listed within the element: 1545 1546 1547 1548 1550 1551 1553 10.25.2. sceneViewID attribute 1555 The sceneViewID attribute is a mandatory attribute containing the 1556 identifier of the capture scene view represented by the 1557 element. 1559 10.26. 1561 The element represents an encoding group, which is 1562 made by a set of one or more individual encodings and some parameters 1563 that apply to the group as a whole. Encoding groups contain 1564 references to individual encodings that can be applied to media 1565 captures. The definition of the element is the 1566 following: 1568 1569 1570 1571 1572 1573 1575 1576 1577 1578 1580 In the following, the contained elements are further described. 1582 10.26.1. 1584 is an optional field containing the maximum 1585 bitrate expressed in bits per second that can be shared by the 1586 individual encodings included in the encoding group. 1588 10.26.2. 1590 is the list of the individual encodings grouped 1591 together in the encoding group. Each individual encoding is 1592 represented through its identifier contained within an 1593 element. 1595 1596 1597 1598 1599 1600 1602 10.26.3. 
encodingGroupID attribute 1604 The encodingGroupID attribute contains the identifier of the encoding 1605 group. 1607 10.27. 1609 represents a simultaneous transmission set, i.e., a 1610 list of captures of the same media type that can be transmitted at 1611 the same time by a Media Provider. There are different simultaneous 1612 transmission sets for each media type. 1614 1615 1616 1617 1619 1621 1623 1625 1626 1627 1628 1629 1631 Besides the identifiers of the captures ( 1632 elements), the identifiers of capture scene views and of capture 1633 scenes can also be used as shortcuts ( and 1634 elements). As an example, let's consider the 1635 situation where there are two capture scene views (S1 and S7). S1 1636 contains captures AC11, AC12, AC13. S7 contains captures AC71, AC72. 1637 Provided that AC11, AC12, AC13, AC71, AC72 can be simultaneously sent 1638 to the Media Consumer, instead of having 5 1639 elements listed in the simultaneous set (i.e., one 1640 for AC11, one for AC12, and so on), there can be 1641 just two elements (one for S1 and one for S7). 1643 10.27.1. setID attribute 1645 The "setID" attribute is a mandatory field containing the identifier 1646 of the simultaneous set. 1648 10.27.2. mediaType attribute 1650 The "mediaType" attribute is an optional attribute containing the 1651 media type of the captures referenced by the simultaneous set. 1653 When only capture scene identifiers are listed within a simultaneous 1654 set, the media type attribute MUST appear in the XML description in 1655 order to determine which media captures can be simultaneously sent 1656 together. 1658 10.27.3. 1660 contains the identifier of the media capture that 1661 belongs to the simultaneous set. 1663 10.27.4. 1665 contains the identifier of the scene view containing 1666 a group of captures that are able to be sent simultaneously with the 1667 other captures of the simultaneous set. 1669 10.27.5.
1671 contains the identifier of the capture scene 1672 where all the included captures of a certain media type are able to 1673 be sent together with the other captures of the simultaneous set. 1675 10.28. 1677 is a set of captures of the same media type representing 1678 a summary of the complete Media Provider's offer. The content of a 1679 global view is expressed by leveraging only scene view identifiers, 1680 put within elements. Each global view is identified 1681 by a unique identifier within the "globalViewID" attribute. 1683 1684 1685 1686 1688 1690 1691 1692 1693 1695 10.29. 1697 Information about the participants that are represented in the media 1698 captures is conveyed via the element. As can be seen 1699 from the XML Schema depicted below, for each participant, a 1700 element is provided. 1702 1703 1704 1705 1707 1708 1710 10.29.1. 1712 includes all the metadata related to a person represented 1713 within one or more media captures. This element provides the vCard 1714 of the subject (via the element, see Section 10.29.1.2) 1715 and his or her conference role(s) (via one or more elements, 1716 see Section 10.29.1.3). Furthermore, it has a mandatory "personID" 1717 attribute (Section 10.29.1.1). 1719 1720 1721 1722 1724 1727 1729 1730 1731 1732 1734 10.29.1.1. personID attribute 1736 The "personID" attribute carries the identifier of a represented 1737 person. Such an identifier can be used to refer to the participant, 1738 as in the element in the media captures 1739 representation (Section 10.19.2). 1741 10.29.1.2. 1743 The element is the XML representation of all the fields 1744 composing a vCard as specified in the xCard RFC [RFC6351]. The 1745 vcardType is imported from the xCard XML Schema provided by 1747 [I-D.ietf-ecrit-additional-data]. As that schema specifies, the 1748 element within is mandatory. 1750 10.29.1.3. 1752 The value of the element determines the role of the 1753 represented participant within the telepresence session organization.
1754 It can be one of the following terms, which are defined in the 1755 framework document: "presenter", "timekeeper", "attendee", "minute 1756 taker", "translator", "chairman", "vice-chairman". 1758 A participant can play more than one conference role. In that case, 1759 more than one element will appear in his or her description. 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1775 11. 1777 A capture encoding results from the association of a media capture 1778 with an individual encoding, to form a capture stream as defined in 1779 [I-D.ietf-clue-framework]. Capture encodings are used within 1780 CONFIGURE messages from a Media Consumer to a Media Provider for 1781 representing the streams desired by the Media Consumer. For each 1782 desired stream, the Media Consumer needs to be able to specify: 1783 (i) the capture identifier of the desired capture that has been 1784 advertised by the Media Provider; (ii) the encoding identifier of the 1785 encoding to use, among those advertised by the Media Provider; (iii) 1786 optionally, in case of multi-content captures, the list of the 1787 capture identifiers of the desired captures. All the mentioned 1788 identifiers are intended to be included in the ADVERTISEMENT message 1789 that the CONFIGURE message refers to. The XML model of 1790 is provided in the following. 1792 1793 1794 1795 1796 1797 1798 1800 1801 1802 1803 1805 11.1. 1807 is the mandatory element containing the identifier of the 1808 media capture that has been encoded to form the capture encoding. 1810 11.2. 1812 is the mandatory element containing the identifier of 1813 the applied individual encoding. 1815 11.3. 1817 is an optional element to be used in case of 1818 configuration of MCC. It contains the list of capture identifiers 1819 and capture scene view identifiers the Media Consumer wants within 1820 the MCC. That element is structured as the element used to 1821 describe the content of an MCC.
The total number of media captures 1822 listed in the must be less than or equal to the 1823 value carried within the attribute of the MCC. 1825 12. 1827 The element includes all the information needed to 1828 represent the Media Provider's description of its telepresence 1829 capabilities according to the CLUE framework. Indeed, it is made up of: 1831 the list of the available media captures ( 1832 (Section 4)) 1833 the list of encoding groups ( (Section 5)) 1835 the list of capture scenes ( (Section 6)) 1837 the list of simultaneous transmission sets ( 1838 (Section 7)) 1840 the list of global views sets ( (Section 8)) 1842 metadata about the participants represented in the telepresence 1843 session ( (Section 10.29)). 1845 It has been conceived only for data model testing purposes and, 1846 though it resembles the body of an ADVERTISEMENT message, it is not 1847 actually used in the CLUE protocol message definitions. The 1848 telepresence capabilities descriptions compliant with this data model 1849 specification that can be found in Section 16 and Section 17 are 1850 provided by using the element. 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1863 1864 1865 1866 1868 13. XML Schema extensibility 1870 The telepresence data model defined in this document is meant to be 1871 extensible. Extensions are accomplished by defining elements or 1872 attributes qualified by namespaces other than 1873 "urn:ietf:params:xml:ns:clue-info" and 1874 "urn:ietf:params:xml:ns:vcard-4.0" for use wherever the schema allows 1875 such extensions (i.e., where the XML Schema definition specifies 1876 "anyAttribute" or "anyElement"). Elements or attributes from unknown 1877 namespaces MUST be ignored. 1879 13.1. Example of extension 1881 When extending the CLUE data model, a new schema with a new namespace 1882 associated with it needs to be specified. 1884 In the following, an example of extension is provided.
The extension 1885 defines a new audio capture attribute ("newAudioFeature") and an 1886 attribute for characterizing the captures belonging to an 1887 "otherCaptureType" defined by the user. An XML document compliant 1888 with the extension is also included. The XML file validates 1889 against the current CLUE data model schema. 1891 1892 1903 1904 1907 1908 1911 1912 1913 1915 1917 1918 1922 1923 1927 CS1 1928 true 1929 true 1930 EG1 1931 newAudioFeatureValue 1932 1933 1937 CS1 1938 true 1939 EG1 1940 OtherValue 1941 1942 1943 1944 1945 1946 300000 1947 1948 ENC4 1949 ENC5 1950 1951 1952 1953 1954 1955 1956 1958 14. Security considerations 1960 This document defines an XML Schema data model for telepresence 1961 scenarios. The modeled information is identified in the CLUE 1962 framework as necessary in order to enable a full-optional media 1963 stream negotiation and rendering. Indeed, the XML elements herein 1964 defined are used within CLUE protocol messages to describe both the 1965 media streams representing the Media Provider's telepresence offer 1966 and the desired selection requested by the Media Consumer. Security 1967 concerns described in [I-D.ietf-clue-framework], Section 15, apply to 1968 this document. 1970 Data model information carried within CLUE messages SHOULD be 1971 accessed only by authenticated endpoints. Indeed, some information 1972 published by the Media Provider might reveal sensitive data about who 1973 and what is represented in the transmitted streams. The vCard 1974 included in the elements (Section 10.29.1) mandatorily 1975 contains the identity of the represented person. Optionally, vCards 1976 can also carry the person's contact addresses, together with his/her 1977 photo and other personal data. Similar privacy-critical information 1978 can be conveyed by means of elements 1979 (Section 10.24.1) describing the capture scenes.
The 1980 elements (Section 10.13) can also specify details about the content 1981 of media captures, capture scenes and scene views that should be 1982 protected. 1984 Integrity attacks on the data model information encapsulated in CLUE 1985 messages can invalidate the success of the telepresence session's 1986 setup by misleading the Media Consumer's and Media Provider's 1987 interpretation of the offered and desired media streams. 1989 The assurance of the authenticated access and of the integrity of the 1990 data model information is up to the involved transport mechanisms, 1991 namely the CLUE protocol [I-D.ietf-clue-protocol] and the CLUE data 1992 channel [I-D.ietf-clue-datachannel]. 1994 15. IANA considerations 1996 This document registers a new XML namespace, a new XML schema and the 1997 MIME type for the schema. 1999 15.1. XML namespace registration 2001 URI: urn:ietf:params:xml:ns:clue-info 2003 Registrant Contact: IETF CLUE Working Group (clue@ietf.org), Roberta 2004 Presta (roberta.presta@unina.it) 2006 XML: 2008 BEGIN 2010 2011 2013 2014 2015 2017 CLUE Data Model Namespace 2018 2019 2020

Namespace for CLUE Data Model

2021

urn:ietf:params:xml:ns:clue-info

2022

See RFC XXXX. 2023 2025

2026 2027 2029 END 2031 15.2. XML Schema registration 2033 This section registers an XML schema per the guidelines in [RFC3688]. 2035 URI: urn:ietf:params:xml:schema:clue-info 2037 Registrant Contact: CLUE working group (clue@ietf.org), Roberta 2038 Presta (roberta.presta@unina.it). 2040 Schema: The XML for this schema can be found as the entirety of 2041 Section 3 of this document. 2043 15.3. MIME Media Type Registration for 'application/clue_info+xml' 2045 This section registers the "application/clue_info+xml" MIME type. 2047 To: ietf-types@iana.org 2049 Subject: Registration of MIME media type application/clue_info+xml 2051 MIME media type name: application 2053 MIME subtype name: clue_info+xml 2054 Required parameters: (none) 2056 Optional parameters: charset 2057 Same as the charset parameter of "application/xml" as specified in 2058 [RFC3023], Section 3.2. 2060 Encoding considerations: Same as the encoding considerations of 2061 "application/xml" as specified in [RFC3023], Section 3.2. 2063 Security considerations: This content type is designed to carry data 2064 related to telepresence information. Some of the data could be 2065 considered private. This media type does not provide any protection 2066 and thus other mechanisms such as those described in Section 14 are 2067 required to protect the data. This media type does not contain 2068 executable content. 2070 Interoperability considerations: None. 2072 Published specification: RFC XXXX [[NOTE TO IANA/RFC-EDITOR: Please 2073 replace XXXX with the RFC number for this specification.]] 2075 Applications that use this media type: None. 2077 Additional Information: Magic Number(s): (none), 2078 File extension(s): .clue, 2079 Macintosh File Type Code(s): TEXT. 2081 Person & email address to contact for further information: Roberta 2082 Presta (roberta.presta@unina.it).
2084 Intended usage: LIMITED USE 2086 Author/Change controller: The IETF 2088 Other information: This media type is a specialization of 2089 application/xml [RFC3023], and many of the considerations described 2090 there also apply to application/clue_info+xml. 2092 16. Sample XML file 2094 The following XML document represents a schema-compliant example of a 2095 CLUE telepresence scenario. Taking inspiration from the examples 2096 described in the framework draft ([I-D.ietf-clue-framework]), the 2097 XML representation of an endpoint-style Media Provider's 2098 offer is provided. 2100 There are three cameras, where the central one is also capable of 2101 capturing a zoomed-out view of the overall telepresence room. 2103 Besides the three video captures coming from the cameras, the Media 2104 Provider makes available a further multi-content capture of the 2105 loudest segment of the room, obtained by switching the video source 2106 across the three cameras. For the sake of simplicity, only one audio 2107 capture is advertised for the audio of the whole room. 2109 The three cameras are placed in front of three participants (Alice, 2110 Bob and Ciccio), whose vCard and conference role details are also 2111 provided. 2113 Media captures are arranged into four capture scene views: 2115 1. (VC0, VC1, VC2) - left, center and right camera video captures 2117 2. (VC3) - video capture associated with loudest room segment 2119 3. (VC4) - video capture zoomed out view of all people in the room 2121 4. (AC0) - main audio 2123 There are two encoding groups: (i) EG0, for video encodings, and (ii) 2124 EG1, for audio encodings. 2126 As to the simultaneous sets, only VC1 and VC4 cannot be transmitted 2127 simultaneously since they are captured by the same device, i.e., the 2128 central camera (VC4 is a zoomed-out view while VC1 is a focused view 2129 of the front participant).
The simultaneous sets would then be the 2130 following: 2132 SS1 made by VC3 and all the captures in the first capture scene view 2133 (VC0,VC1,VC2); 2135 SS2 made by VC3, VC0, VC2, VC4 2137 2138 2140 2141 2143 CS1 2144 EG1 2145 2146 2147 2148 0.5 2149 1.0 2150 0.5 2151 2152 2153 0.5 2154 0.0 2155 0.5 2156 2157 2158 2159 true 2160 main audio from the room 2161 1 2162 it 2163 static 2164 room 2165 2166 alice 2167 bob 2168 ciccio 2169 2170 2171 2173 CS1 2174 EG0 2175 2176 2177 2178 0.5 2179 1.0 2180 0.5 2181 2182 2183 0.5 2184 0.0 2185 0.5 2186 2187 2188 2189 true 2190 left camera video capture 2191 1 2192 it 2193 static 2194 individual 2195 2196 ciccio 2197 2198 2199 2201 CS1 2202 EG0 2203 2204 2205 2206 0.5 2207 1.0 2208 0.5 2209 2210 2211 0.5 2212 0.0 2213 0.5 2214 2215 2216 2217 true 2218 central camera video capture 2219 1 2220 it 2221 static 2222 individual 2223 2224 alice 2225 2226 2227 2229 CS1 2230 EG0 2231 2232 2233 2234 0.5 2235 1.0 2236 0.5 2237 2238 2239 0.5 2240 0.0 2241 0.5 2242 2243 2245 2246 true 2247 right camera video capture 2248 1 2249 it 2250 static 2251 individual 2252 2253 bob 2254 2255 2256 2258 CS1 2259 EG0 2260 true 2261 Soundlevel:0 2262 loudest room segment 2263 1 2264 it 2265 static 2266 individual 2267 2268 2270 CS1 2271 EG0 2272 2273 2274 2275 0.5 2276 1.0 2277 0.5 2278 2279 2280 0.5 2281 0.0 2282 0.5 2283 2284 2285 2286 true 2287 zoomed out view of all people in the 2288 room 2289 1 2290 it 2291 static 2292 room 2293 2294 alice 2295 bob 2296 ciccio 2297 2298 2299 2300 2301 2302 600000 2303 2304 ENC1 2305 ENC2 2306 ENC3 2307 2308 2309 2310 300000 2311 2312 ENC4 2313 ENC5 2314 2315 2316 2317 2318 2319 2320 2321 2322 VC0 2323 VC1 2324 VC2 2325 2326 2327 2328 2329 VC3 2330 2331 2332 2333 2334 VC4 2335 2336 2337 2338 2339 VC4 2340 2342 2343 2344 2345 2346 2347 2348 VC3 2349 SE1 2350 2351 2352 VC0 2353 VC2 2354 VC4 2355 VC3 2356 2357 2358 2359 2360 2361 2362 Bob 2363 2364 2365 minute taker 2366 2367 2368 2369 2370 Alice 2371 2372 2373 
presenter 2374 2375 2376 2377 2378 Ciccio 2379 2380 2381 chairman 2382 timekeeper 2383 2384 2385 2386 17. MCC example 2388 Enhancing the scenario presented in the previous example, the Media 2389 Provider is able to advertise a composed capture VC7 made by a big 2390 picture representing the current speaker (VC3) and two picture-in- 2391 picture boxes representing the previous speakers (the previous one 2392 -VC5- and the oldest one -VC6). The provider does not want to 2393 instantiate and send VC5 and VC6, so it does not associate any 2394 encoding group with them. Their XML representations are provided for 2395 enabling the description of VC7. 2397 A possible description for that scenario could be the following: 2399 2400 2402 2403 2405 CS1 2406 EG1 2407 2408 2409 2410 0.5 2411 1.0 2412 0.5 2413 2414 2415 0.5 2416 0.0 2417 0.5 2418 2419 2420 2421 true 2422 main audio from the room 2423 1 2424 it 2425 static 2426 room 2427 2428 alice 2429 bob 2430 ciccio 2431 2433 2434 2436 CS1 2437 EG0 2438 2439 2440 2441 0.5 2442 1.0 2443 0.5 2444 2445 2446 0.5 2447 0.0 2448 0.5 2449 2450 2451 2452 true 2453 left camera video capture 2454 1 2455 it 2456 static 2457 individual 2458 2459 ciccio 2460 2461 2462 2464 CS1 2465 EG0 2466 2467 2468 2469 0.5 2470 1.0 2471 0.5 2472 2473 2474 0.5 2475 0.0 2476 0.5 2477 2478 2479 2480 true 2481 central camera video capture 2482 1 2483 it 2484 static 2485 individual 2486 2487 alice 2488 2489 2490 2492 CS1 2493 EG0 2494 2495 2496 2497 0.5 2498 1.0 2499 0.5 2500 2501 2502 0.5 2503 0.0 2504 0.5 2505 2506 2507 2508 true 2509 right camera video capture 2510 1 2511 it 2512 static 2513 individual 2514 2515 bob 2516 2517 2518 2520 CS1 2521 EG0 2522 true 2523 2524 SE1 2525 2526 Soundlevel:0 2527 loudest room segment 2528 1 2529 it 2530 static 2531 individual 2532 2533 2535 CS1 2536 EG0 2537 2538 2539 2540 0.5 2541 1.0 2542 0.5 2543 2544 2545 0.5 2546 0.0 2547 0.5 2548 2549 2550 2551 true 2552 zoomed out view of all people in the room 2553 1 2554 it 
2555 static 2556 room 2557 2558 alice 2559 bob 2560 ciccio 2561 2562 2563 2565 CS1 2566 true 2567 2568 SE1 2569 2570 Soundlevel:1 2571 penultimate loudest room segment 2572 1 2573 it 2574 static 2575 individual 2576 2577 2579 CS1 2580 true 2581 2582 SE1 2583 2584 Soundlevel:2 2585 last but two loudest room segment 2586 1 2587 it 2588 static 2589 individual 2590 2591 2593 CS1 2594 true 2595 2596 VC3 2597 VC5 2598 VC6 2599 2600 big picture of the current speaker + 2601 pips about previous speakers 2602 1 2603 it 2604 static 2605 individual 2606 2607 2608 2609 2610 600000 2611 2612 ENC1 2613 ENC2 2614 ENC3 2615 2616 2617 2618 300000 2619 2620 ENC4 2621 ENC5 2622 2623 2624 2625 2626 2627 2628 2629 participants' individual 2630 videos 2631 2632 VC0 2633 VC1 2634 VC2 2635 2636 2637 2638 loudest segment of the 2639 room 2640 2641 VC3 2642 2643 2644 2645 loudest segment of the 2646 room + pips 2647 2648 VC7 2649 2650 2651 2652 room audio 2653 2654 AC0 2655 2656 2657 2658 room video 2659 2660 VC4 2661 2662 2663 2664 2665 2666 2667 2668 VC7 2669 SE1 2670 2671 2672 VC0 2673 VC2 2674 VC4 2675 VC7 2676 2677 2678 2679 2680 2681 2682 Bob 2683 2684 2685 minute taker 2686 2687 2688 2689 2690 Alice 2691 2692 2693 presenter 2694 2695 2696 2697 2698 Ciccio 2699 2700 2701 chairman 2702 timekeeper 2703 2704 2705 2707 18. Diff with draft-ietf-clue-data-model-schema-09 version 2709 o We have introduced a element containing a 2710 mandatory and an optional in 2711 the definition of as per Paul's review 2713 o A new type definition for switching policies (resembled by 2714 element) has been provided in order to have acceptable 2715 values in the form of "token:index". 2717 o Minor modifications suggested in WGLC reviews have been applied. 2719 19. Diff with draft-ietf-clue-data-model-schema-08 version 2721 o Typos correction 2723 20. 
Diff with draft-ietf-clue-data-model-schema-07 version 2725 o IANA Considerations: text added 2727 o maxCaptureEncodings removed 2729 o personTypeType values aligned with CLUE framework 2731 o allowSubsetChoice added for multiple content captures 2733 o embeddedText moved from videoCaptureType definition to 2734 mediaCaptureType definition 2736 o typos removed from section Terminology 2738 21. Diff with draft-ietf-clue-data-model-schema-06 version 2740 o Capture Scene Entry/Entries renamed as Capture Scene View/Views in 2741 the text, / renamed as / 2742 in the XML schema. 2744 o Global Scene Entry/Entries renamed as Global View/Views in the 2745 text, / renamed as 2746 / 2748 o Security section added. 2750 o Extensibility: a new type is introduced to describe other types of 2751 media capture (otherCaptureType), text and example added. 2753 o Spatial information section updated: capture point optional, text 2754 now is coherent with the framework one. 2756 o Audio capture description: added, 2757 removed, disallowed. 2759 o Simultaneous set definition: added to refer to 2760 capture scene identifiers as shortcuts and an optional mediaType 2761 attribute which is mandatory to use when only capture scene 2762 identifiers are listed. 2764 o Encoding groups: removed the constraint of the same media type. 2766 o Updated text about media captures without 2767 (optional in the XML schema). 2769 o "mediaType" attribute removed from homogeneous groups of captures 2770 (scene views and global views) 2772 o "mediaType" attribute removed from the global view textual 2773 description. 2775 o "millimeters" scale value changed to "mm" 2777 22.
Diff with draft-ietf-clue-data-model-schema-04 version 2779 globalCaptureEntries/Entry renamed as globalSceneEntries/Entry; 2781 sceneInformation added; 2783 Only capture scene entry identifiers listed within global scene 2784 entries (media capture identifiers removed); 2786 renamed as in the >clueInfo< template 2788 renamed as to synch with the framework 2789 terminology 2791 renamed as to synch with the 2792 framework terminology 2794 renamed as in the media capture 2795 type definition to remove ambiguity 2797 Examples have been updated with the new definitions of 2798 and of . 2800 23. Diff with draft-ietf-clue-data-model-schema-03 version 2802 encodings section has been removed 2804 global capture entries have been introduced 2806 capture scene entry identifiers are used as shortcuts in listing 2807 the content of MCC (similarly to simultaneous set and global 2808 capture entries) 2810 Examples have been updated. A new example with global capture 2811 entries has been added. 2813 has been made optional. 2815 has been renamed into 2817 Obsolete comments have been removed. 2819 participants information has been added. 2821 24. Diff with draft-ietf-clue-data-model-schema-02 version 2823 captureParameters and encodingParameters have been removed from 2824 the captureEncodingType 2826 data model example has been updated and validated according to the 2827 new schema. Further description of the represented scenario has 2828 been provided. 2830 A multiple content capture example has been added. 2832 Obsolete comments and references have been removed. 2834 25. Acknowledgments 2836 The authors thank all the CLUErs for their valuable feedback and 2837 support. 2839 26. Informative References 2841 [I-D.ietf-clue-datachannel] Holmberg, C., "CLUE Protocol data 2842 channel", 2843 draft-ietf-clue-datachannel-09 2844 (work in progress), March 2015. 2846 [I-D.ietf-clue-framework] Duckworth, M., Pepperell, A., and 2847 S.
Wenger, "Framework for 2848 Telepresence Multi-Streams", 2849 draft-ietf-clue-framework-22 (work 2850 in progress), April 2015. 2852 [I-D.ietf-clue-protocol] Presta, R. and S. Romano, "CLUE 2853 protocol", 2854 draft-ietf-clue-protocol-04 (work 2855 in progress), April 2015. 2857 [I-D.ietf-ecrit-additional-data] Gellens, R., Rosen, B., Tschofenig, 2858 H., Marshall, R., and J. 2859 Winterbottom, "Additional Data 2860 Related to an Emergency Call", 2861 (work in progress), March 2015. 2863 [RFC3023] Murata, M., St. Laurent, S., and D. 2864 Kohn, "XML Media Types", RFC 3023, 2865 January 2001. 2867 [RFC3550] Schulzrinne, H., Casner, S., 2868 Frederick, R., and V. Jacobson, 2869 "RTP: A Transport Protocol for 2870 Real-Time Applications", STD 64, 2871 RFC 3550, July 2003. 2873 [RFC3688] Mealling, M., "The IETF XML 2874 Registry", BCP 81, RFC 3688, 2875 January 2004. 2877 [RFC4353] Rosenberg, J., "A Framework for 2878 Conferencing with the Session 2879 Initiation Protocol (SIP)", 2880 RFC 4353, February 2006. 2882 [RFC4796] Hautakorpi, J. and G. Camarillo, 2883 "The Session Description Protocol 2884 (SDP) Content Attribute", RFC 4796, 2885 February 2007. 2887 [RFC5117] Westerlund, M. and S. Wenger, "RTP 2888 Topologies", RFC 5117, 2889 January 2008. 2891 [RFC6351] Perreault, S., "xCard: vCard XML 2892 Representation", RFC 6351, 2893 August 2011. 2895 Authors' Addresses 2897 Roberta Presta 2898 University of Napoli 2899 Via Claudio 21 2900 Napoli 80125 2901 Italy 2903 EMail: roberta.presta@unina.it 2904 Simon Pietro Romano 2905 University of Napoli 2906 Via Claudio 21 2907 Napoli 80125 2908 Italy 2910 EMail: spromano@unina.it