CLUE Working Group                                             R. Presta
Internet-Draft                                              S P. Romano
Intended status: Standards Track                   University of Napoli
Expires: April 21, 2016                                October 19, 2015

                 An XML Schema for the CLUE data model
                  draft-ietf-clue-data-model-schema-11

Abstract

   This document provides an XML schema file for the definition of CLUE
   data model types.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 21, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Definitions
   4.  XML Schema
   5.  <mediaCaptures>
   6.  <encodingGroups>
   7.  <captureScenes>
   8.  <simultaneousSets>
   9.  <globalViews>
   10. <captureEncodings>
   11. <mediaCapture>
     11.1.  captureID attribute
     11.2.  mediaType attribute
     11.3.  <captureSceneIDREF>
     11.4.  <encGroupIDREF>
     11.5.  <spatialInformation>
       11.5.1.  <captureOrigin>
       11.5.2.  <captureArea>
     11.6.  <nonSpatiallyDefinable>
     11.7.  <content>
     11.8.  <synchronizationID>
     11.9.  <allowSubsetChoice>
     11.10. <policy>
     11.11. <maxCaptures>
     11.12. <individual>
     11.13. <description>
     11.14. <priority>
     11.15. <lang>
     11.16. <mobility>
     11.17. <relatedTo>
     11.18. <view>
     11.19. <presentation>
       11.19.1.  <embeddedText>
       11.19.2.  <capturedPeople>
     11.20. Audio captures
       11.20.1.  <sensitivityPattern>
     11.21. Video captures
     11.22. Text captures
     11.23. Other capture types
     11.24. <captureScene>
       11.24.1.  <sceneInformation>
       11.24.2.  <sceneViews>
       11.24.3.  sceneID attribute
       11.24.4.  scale attribute
     11.25. <sceneView>
       11.25.1.  <mediaCaptureIDs>
       11.25.2.  sceneViewID attribute
     11.26. <encodingGroup>
       11.26.1.  <maxGroupBandwidth>
       11.26.2.  <encodingIDList>
       11.26.3.  encodingGroupID attribute
     11.27. <simultaneousSet>
       11.27.1.  setID attribute
       11.27.2.  mediaType attribute
       11.27.3.  <mediaCaptureIDREF>
       11.27.4.  <sceneViewIDREF>
       11.27.5.  <captureSceneIDREF>
     11.28. <globalView>
     11.29. <people>
       11.29.1.  <person>
   12. <captureEncoding>
     12.1.  <captureID>
     12.2.  <encodingID>
     12.3.  <configuredContent>
   13. <clueInfo>
   14. XML Schema extensibility
     14.1.  Example of extension
   15. Security considerations
   16. IANA considerations
     16.1.  XML namespace registration
     16.2.  XML Schema registration
     16.3.  MIME Media Type Registration for
            'application/clue_info+xml'
   17. Sample XML file
   18. MCC example
   19. Diff with draft-ietf-clue-data-model-schema-10 version
   20. Diff with draft-ietf-clue-data-model-schema-09 version
   21. Diff with draft-ietf-clue-data-model-schema-08 version
   22. Diff with draft-ietf-clue-data-model-schema-07 version
   23. Diff with draft-ietf-clue-data-model-schema-06 version
   24. Diff with draft-ietf-clue-data-model-schema-04 version
   25. Diff with draft-ietf-clue-data-model-schema-03 version
   26. Diff with draft-ietf-clue-data-model-schema-02 version
   27. Acknowledgments
   28. Informative References

1.  Introduction

   This document provides an XML schema file for the definition of CLUE
   data model types.

   The schema is based on information contained in
   [I-D.ietf-clue-framework].  It encodes information and constraints
   defined in the aforementioned document in order to provide a formal
   representation of the concepts presented therein.

   This document aims to define a coherent structure for the
   information associated with the description of a telepresence
   scenario.  Such information is used within the CLUE protocol messages
   ([I-D.ietf-clue-protocol]) enabling the dialogue between a Media
   Provider and a Media Consumer.
   CLUE protocol messages, indeed, are XML messages allowing (i) a
   Media Provider to advertise its telepresence capabilities in terms
   of media captures, capture scenes, and other features envisioned in
   the CLUE framework, according to the format herein defined and (ii)
   a Media Consumer to request the desired telepresence options in the
   form of capture encodings, represented as described in this
   document.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Definitions

   This document refers to the same definitions used in
   [I-D.ietf-clue-framework], except for the "CLUE Participant"
   definition.  We briefly recall herein some of the main terms used in
   the document.

   Audio Capture:  Media Capture for audio.  Denoted as ACn in the
      examples in this document.

   Capture:  Same as Media Capture.

   Capture Device:  A device that converts physical input, such as
      audio, video or text, into an electrical signal, in most cases to
      be fed into a media encoder.

   Capture Encoding:  A specific encoding of a Media Capture, to be sent
      by a Media Provider to a Media Consumer via RTP.

   Capture Scene:  A structure representing a spatial region captured by
      one or more Capture Devices, each capturing media representing a
      portion of the region.  The spatial region represented by a
      Capture Scene MAY correspond to a real region in physical space,
      such as a room.  A Capture Scene includes attributes and one or
      more Capture Scene Views, with each view including one or more
      Media Captures.

   Capture Scene View:  A list of Media Captures of the same media type
      that together form one way to represent the entire Capture Scene.

   CLUE Participant:  This term is not imported from the framework
      terminology.
      A CLUE Participant identifies a generic entity (either an
      Endpoint or an MCU) making use of the CLUE protocol.

   Consumer:  Short for Media Consumer.

   Encoding or Individual Encoding:  A set of parameters representing a
      way to encode a Media Capture to become a Capture Encoding.

   Encoding Group:  A set of encoding parameters representing a total
      media encoding capability to be sub-divided across potentially
      multiple Individual Encodings.

   Endpoint:  A CLUE-capable device which is the logical point of final
      termination through receiving, decoding and rendering, and/or
      initiation through capturing, encoding, and sending of media
      streams.  An endpoint consists of one or more physical devices
      which source and sink media streams, and exactly one [RFC4353]
      Participant (which, in turn, includes exactly one SIP User Agent).
      Endpoints can be anything from multiscreen/multicamera rooms to
      handheld devices.

   Media:  Any data that, after suitable encoding, can be conveyed over
      RTP, including audio, video or timed text.

   Media Capture:  A source of Media, such as from one or more Capture
      Devices or constructed from other Media streams.

   Media Consumer:  A CLUE-capable device that intends to receive
      Capture Encodings.

   Media Provider:  A CLUE-capable device that intends to send Capture
      Encodings.

   Multiple Content Capture:  A Capture that mixes and/or switches other
      Captures of a single type (e.g., all audio or all video).
      Particular Media Captures may or may not be present in the
      resultant Capture Encoding depending on time or space.  Denoted as
      MCCn in the example cases in this document.

   Multipoint Control Unit (MCU):  A CLUE-capable device that connects
      two or more endpoints together into one single multimedia
      conference [RFC5117].  An MCU includes an [RFC4353]-like Mixer,
      without the [RFC4353] requirement to send media to each
      participant.

   Plane of Interest:  The spatial plane containing the most relevant
      subject matter.

   Provider:  Same as Media Provider.

   Render:  The process of reproducing the received Streams, for
      instance by displaying the remote video on the Media Consumer's
      screens or by playing the remote audio through loudspeakers.

   Scene:  Same as Capture Scene.

   Simultaneous Transmission Set:  A set of Media Captures that can be
      transmitted simultaneously from a Media Provider.

   Single Media Capture:  A capture which contains media from a single
      source capture device, e.g., an audio capture from a single
      microphone or a video capture from a single camera.

   Spatial Relation:  The arrangement in space of two objects, in
      contrast to relation in time or other relationships.

   Stream:  A Capture Encoding sent from a Media Provider to a Media
      Consumer via RTP [RFC3550].

   Stream Characteristics:  The union of the features used to describe a
      Stream in the CLUE environment and in the SIP-SDP environment.

   Video Capture:  A Media Capture for video.

4.  XML Schema

   This section contains the CLUE data model schema definition.

   The element and attribute definitions are formal representations of
   the concepts needed to describe the capabilities of a Media Provider
   and the streams that are requested by a Media Consumer given the
   Media Provider's ADVERTISEMENT ([I-D.ietf-clue-protocol]).

   The main groups of information are:

      <mediaCaptures>: the list of media captures available (Section 5)

      <encodingGroups>: the list of encoding groups (Section 6)

      <captureScenes>: the list of capture scenes (Section 7)

      <simultaneousSets>: the list of simultaneous transmission sets
      (Section 8)

      <globalViews>: the list of global views (Section 9)

      <people>: metadata about the participants represented in the
      telepresence session (Section 11.29)

      <captureEncodings>: the list of instantiated capture encodings
      (Section 10)

   All of the above refer to concepts that have been introduced in
   [I-D.ietf-clue-framework] and are further detailed in this document.

   [XML Schema listing not reproduced]

   The following sections describe the XML schema in more detail.  As a
   general remark, when the absence of an optional element is not given
   an explicit meaning, the corresponding property is to be considered
   undefined.

5.  <mediaCaptures>

   <mediaCaptures> represents the list of one or more media captures
   available at the Media Provider's side.  Each media capture is
   represented by a <mediaCapture> element (Section 11).

6.  <encodingGroups>

   <encodingGroups> represents the list of the encoding groups organized
   on the Media Provider's side.  Each encoding group is represented by
   an <encodingGroup> element (Section 11.26).

7.  <captureScenes>

   <captureScenes> represents the list of the capture scenes organized
   on the Media Provider's side.  Each capture scene is represented by a
   <captureScene> element (Section 11.24).

8.  <simultaneousSets>

   <simultaneousSets> contains the simultaneous sets indicated by the
   Media Provider.  Each simultaneous set is represented by a
   <simultaneousSet> element (Section 11.27).

9.  <globalViews>

   <globalViews> contains a set of alternative representations of all
   the scenes that are offered by a Media Provider to a Media Consumer.
   Each alternative is named "global view" and is represented by a
   <globalView> element (Section 11.28).

10.  <captureEncodings>

   <captureEncodings> is a list of capture encodings.  It can represent
   the list of the desired capture encodings indicated by the Media
   Consumer or the list of instantiated captures on the provider's side.
   Each capture encoding is represented by a <captureEncoding> element
   (Section 12).

11.  <mediaCapture>

   A Media Capture is the fundamental representation of a media flow
   that is available on the provider's side.  Media captures are
   characterized (i) by a set of features that are independent of the
   specific type of medium, and (ii) by a set of features that are
   media-specific.  The features that are common to all media types
   appear within the media capture type, which has been designed as an
   abstract complex type.  Media-specific captures, such as video
   captures, audio captures and others, are specializations of that
   abstract media capture type, as in a typical
   generalization-specialization hierarchy.
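
   As a non-normative illustration of this hierarchy, an instance
   document could select the concrete specialization of the abstract
   media capture type through the standard xsi:type attribute.  The
   fragment below is only a sketch: the type names "audioCaptureType"
   and "videoCaptureType", the namespace URI, the child element names,
   and all identifier values are assumptions made for the example and
   should be checked against the schema itself.

```xml
<mediaCaptures xmlns="urn:ietf:params:xml:ns:clue-info"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- audio specialization of the abstract media capture type -->
  <mediaCapture xsi:type="audioCaptureType"
      captureID="AC0" mediaType="audio">
    <captureSceneIDREF>CS1</captureSceneIDREF>
    <nonSpatiallyDefinable>true</nonSpatiallyDefinable>
    <individual>true</individual>
  </mediaCapture>
  <!-- video specialization: same common features, different type -->
  <mediaCapture xsi:type="videoCaptureType"
      captureID="VC0" mediaType="video">
    <captureSceneIDREF>CS1</captureSceneIDREF>
    <nonSpatiallyDefinable>true</nonSpatiallyDefinable>
    <individual>true</individual>
  </mediaCapture>
</mediaCaptures>
```

   Both entries share the features of the generic media capture and
   differ only in the declared type, which is where media-specific
   features would be added.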

   The following is the XML Schema definition of the media capture type:

   [XML Schema listing not reproduced]

11.1.  captureID attribute

   The "captureID" attribute is a mandatory field containing the
   identifier of the media capture.  Such an identifier serves as the
   way the capture is referenced from other data model elements (e.g.,
   simultaneous sets, capture encodings, and others).

11.2.  mediaType attribute

   The "mediaType" attribute is a mandatory attribute specifying the
   media type of the capture.  Common values are "audio", "video", and
   "text".  Other values can be provided.  It is assumed that
   implementations agree on the interpretation of those other values.

11.3.  <captureSceneIDREF>

   <captureSceneIDREF> is a mandatory field containing the value of the
   identifier of the capture scene the media capture is defined in,
   i.e., the value of the sceneID (Section 11.24.3) attribute of that
   capture scene.  Indeed, each media capture must be defined within one
   and only one capture scene.  When a media capture is spatially
   definable, some spatial information is provided along with it in the
   form of point coordinates (see Section 11.5).  Such coordinates refer
   to the space of coordinates defined for the capture scene containing
   the capture.

11.4.  <encGroupIDREF>

   <encGroupIDREF> is an optional field containing the identifier of the
   encoding group the media capture is associated with, i.e., the value
   of the encodingGroupID (Section 11.26.3) attribute of that encoding
   group.  Media captures that are not associated with any encoding
   group cannot be instantiated as media streams.

11.5.  <spatialInformation>

   Media captures are divided into two categories: (i) non spatially
   definable captures and (ii) spatially definable captures.

   Captures are spatially definable when at least (i) it is possible to
   provide the coordinates of the device position within the
   telepresence room of origin (capture point) together with its
   capturing direction specified by a second point (point on line of
   capture), or (ii) it is possible to provide the represented area
   within the telepresence room, by listing the coordinates of the four
   co-planar points identifying the plane of interest (area of capture).
   The coordinates of the abovementioned points must be expressed
   according to the coordinate space of the capture scene the media
   capture belongs to.

   Non spatially definable captures cannot be characterized within the
   physical space of the telepresence room of origin.  Captures of this
   kind are, for example, those related to recordings, text captures,
   DVDs, registered presentations, or external streams that are played
   in the telepresence room and transmitted to remote sites.

   Spatially definable captures represent a part of the telepresence
   room.  The captured part of the telepresence room is described by
   means of the <spatialInformation> element.  By comparing the
   <spatialInformation> element of different media captures within the
   same capture scene, a consumer can better determine the spatial
   relationships between them and render them correctly.  Non spatially
   definable captures do not embed such an element in their XML
   description: they are instead characterized by having the
   <nonSpatiallyDefinable> tag set to "true" (see Section 11.6).

   The definition of the spatial information type is the following:

   [XML Schema listing not reproduced]

   The <captureOrigin> contains the coordinates of the capture device
   that is taking the capture (i.e., the capture point), as well as,
   optionally, the pointing direction (i.e., the point on line of
   capture) (see Section 11.5.1).

   The <captureArea> is an optional field containing four points
   defining the captured area covered by the capture (see
   Section 11.5.2).
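
   The two parts of the spatial information described above can be
   pictured with a small instance fragment for a video capture.  This is
   a hedged sketch rather than text taken from the schema: the element
   names, the namespace URI, and every coordinate value are assumptions
   chosen for illustration.  The four area points were picked to lie in
   a single plane (y = 3), in front of the capture point.

```xml
<spatialInformation xmlns="urn:ietf:params:xml:ns:clue-info">
  <captureOrigin>
    <!-- capture point: position of the capture device -->
    <capturePoint>
      <x>0.0</x>
      <y>0.0</y>
      <z>1.0</z>
    </capturePoint>
    <!-- point on line of capture: gives the pointing direction -->
    <lineOfCapturePoint>
      <x>0.0</x>
      <y>1.0</y>
      <z>1.0</z>
    </lineOfCapturePoint>
  </captureOrigin>
  <!-- area of capture: four co-planar points (all with y = 3.0) -->
  <captureArea>
    <bottomLeft><x>-1.0</x><y>3.0</y><z>0.0</z></bottomLeft>
    <bottomRight><x>1.0</x><y>3.0</y><z>0.0</z></bottomRight>
    <topLeft><x>-1.0</x><y>3.0</y><z>2.0</z></topLeft>
    <topRight><x>1.0</x><y>3.0</y><z>2.0</z></topRight>
  </captureArea>
</spatialInformation>
```

   In this sketch the device sits at the origin of the horizontal plane,
   one unit above the floor, and looks along the y axis towards a
   2 x 2 area located three units away.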

   The scale of the point coordinates is specified in the scale
   (Section 11.24.4) attribute of the capture scene the media capture
   belongs to.  Indeed, all the spatially definable media captures
   referring to the same capture scene share the same coordinate system
   and express their spatial information according to the same scale.

11.5.1.  <captureOrigin>

   The <captureOrigin> element is used to represent the position and
   optionally the line of capture of a capture device.  <captureOrigin>
   MUST be included in spatially definable audio captures, while it is
   optional for spatially definable video captures.

   The XML Schema definition of the <captureOrigin> element type is the
   following:

   [XML Schema listing not reproduced]

   The point type contains three spatial coordinates (x,y,z)
   representing a point in the space associated with a certain capture
   scene.

   The <captureOrigin> element includes a mandatory <capturePoint>
   element and an optional <lineOfCapturePoint> element, both of the
   type "pointType".  <capturePoint> specifies the three coordinates
   identifying the position of the capture device.  <lineOfCapturePoint>
   is another pointType element representing the "point on line of
   capture", which gives the pointing direction of the capture device.

   The coordinates of the point on line of capture MUST NOT be identical
   to the capture point coordinates.  For a spatially definable video
   capture, if the point on line of capture is provided, it MUST belong
   to the region between the point of capture and the capture area.  For
   a spatially definable audio capture, if the point on line of capture
   is not provided, the sensitivity pattern should be considered
   omnidirectional.

11.5.2.  <captureArea>

   <captureArea> is an optional element that can be contained within the
   spatial information associated with a media capture.  It represents
   the spatial area captured by the media capture.
   <captureArea> MUST be included in the spatial information of
   spatially definable video captures, while it MUST NOT be associated
   with audio captures.

   The XML representation of that area is provided through a set of four
   point-type elements, <bottomLeft>, <bottomRight>, <topLeft>, and
   <topRight>, that MUST be co-planar.  The four co-planar points are
   identified from the perspective of the capture device.  The XML
   schema definition is the following:

   [XML Schema listing not reproduced]

11.6.  <nonSpatiallyDefinable>

   When media captures are non spatially definable, they MUST be marked
   with the boolean <nonSpatiallyDefinable> element set to "true" and
   <spatialInformation> MUST NOT be provided.  Indeed,
   <nonSpatiallyDefinable> and <spatialInformation> are mutually
   exclusive tags, according to the <choice> section within the XML
   Schema definition of the media capture type.

11.7.  <content>

   A media capture can be (i) an individual media capture or (ii) a
   multiple content capture (MCC).  A multiple content capture is made
   of different captures that can be arranged spatially (by a
   composition operation), temporally (by a switching operation), or by
   a combination of both techniques.  If a media capture is an MCC, then
   it MAY show in its XML data model representation the <content>
   element.  It is composed of a list of media capture identifiers
   ("captureIDREF") and capture scene view identifiers
   ("sceneViewIDREF"), where the latter are used as shortcuts to refer
   to multiple capture identifiers.  The referenced captures are used to
   create the MCC according to a certain strategy.  If the <content>
   element does not appear in an MCC, or it has no child elements, then
   the MCC is assumed to be made of multiple sources but no information
   regarding those sources is provided.

   [XML Schema listing not reproduced]

11.8.  <synchronizationID>

   <synchronizationID> is an optional element for multiple content
   captures that contains a numeric identifier.
   Multiple content captures marked with the same identifier in the
   <synchronizationID> contain at all times captures coming from the
   same sources.  It is the Media Provider that determines what the
   source for the captures is.  In this way, the Media Provider can
   choose how to group together single captures for the purpose of
   keeping them synchronized according to the <synchronizationID>
   element.

11.9.  <allowSubsetChoice>

   <allowSubsetChoice> is an optional boolean element for multiple
   content captures.  It indicates whether or not the Provider allows
   the Consumer to choose a specific subset of the captures referenced
   by the MCC.  If this attribute is true, and the MCC references other
   captures, then the Consumer MAY specify in a CONFIGURE message a
   specific subset of those captures to be included in the MCC, and the
   Provider MUST then include only that subset.  If this attribute is
   false, or the MCC does not reference other captures, then the
   Consumer MUST NOT select a subset.  If <allowSubsetChoice> is not
   shown in the XML description of the MCC, its value is to be
   considered "false".

11.10.  <policy>

   <policy> is an optional element that can be used only for multiple
   content captures.  It indicates the criteria applied to build the
   multiple content capture using the media captures referenced in
   <content>.  The <policy> value is in the form of a token that
   indicates the policy and an index representing an instance of the
   policy, separated by a ":" (e.g., SoundLevel:2, RoundRobin:0, etc.).
   The XML schema defining the type of the <policy> element is the
   following:

   [XML Schema listing not reproduced]

   At the time of writing, only two switching policies are defined in
   [I-D.ietf-clue-framework]:

   SoundLevel:  the content of the MCC is determined by a sound level
      detection algorithm.  The loudest (active) speaker (or a previous
      speaker, depending on the index value) is contained in the MCC.
      Index 0 represents the most current instance of the policy, i.e.,
      the currently active speaker, 1 represents the previous instance,
      i.e., the previous active speaker, and so on.

   RoundRobin:  the content of the MCC is determined by a time-based
      algorithm.

   Other values for the <policy> element can be used.  In this case, it
   is assumed that implementations agree on the meaning of those other
   values and/or those new switching policies are defined in later
   documents.

11.11.  <maxCaptures>

   <maxCaptures> is an optional element that can be used only for
   multiple content captures (MCC).  It provides information about the
   number of media captures that can be represented in the multiple
   content capture at a time.  If <maxCaptures> is not provided, all the
   media captures listed in the <content> element can appear at a time
   in the capture encoding.  The type definition is provided below.

   [XML Schema listing not reproduced]

   When the "exactNumber" attribute is set to "true", it means the
   <maxCaptures> element carries the exact number of the media captures
   appearing at a time.  Otherwise, the number of the represented media
   captures MUST be considered "<=" the <maxCaptures> value.

   For instance, an audio MCC having the <maxCaptures> value set to 1
   means that a media stream from the MCC will only contain audio from a
   single one of its constituent captures at a time.  On the other hand,
   if the <maxCaptures> value is set to 4 and the exactNumber attribute
   is set to "true", it would mean that the media stream received from
   the MCC will always contain a mix of audio from exactly four of its
   constituent captures.

11.12.  <individual>

   <individual> is a boolean element that MUST be used for
   single-content captures.  Its value is fixed and set to "true".  Such
   an element indicates that the capture being described is not a
   multiple content capture.
   Indeed, <individual> and the aforementioned tags related to MCC
   attributes (from Section 11.7 to Section 11.11) are mutually
   exclusive, according to the <choice> section within the XML Schema
   definition of the media capture type.

11.13.  <description>

   <description> is used to provide human-readable textual information.
   This element is included in the XML definition of media captures,
   capture scenes and capture scene views with the aim of providing a
   human-readable description of, respectively, media captures, capture
   scenes and capture scene views.  According to the data model
   definition of a media capture (Section 11), zero or more
   <description> elements can be used, each providing information in a
   different language.  The <description> element definition is the
   following:

   [XML Schema listing not reproduced]

   As can be seen, <description> is a string element with an attribute
   ("lang") indicating the language used in the textual description.

11.14.  <priority>

   <priority> is an optional unsigned integer field indicating the
   importance of a media capture according to the Media Provider's
   perspective.  It can be used on the receiver's side to automatically
   identify the most relevant contribution from the Media Provider.  The
   higher the importance, the lower the contained value.  If no priority
   is assigned, no assumptions regarding the relative importance of the
   media capture can be made.

11.15.  <lang>

   <lang> is an optional element containing the language used in the
   capture.  Zero or more <lang> elements can appear in the XML
   description of a media capture.

11.16.  <mobility>

   <mobility> is an optional element indicating whether or not the
   capture device originating the capture may move during the
   telepresence session.  That optional element can assume one of the
   three following values:

   static:  SHOULD NOT change for the duration of the CLUE session,
      across multiple ADVERTISEMENT messages.

   dynamic:  MAY change in each new ADVERTISEMENT message.
      Can be assumed to remain unchanged until there is a new
      ADVERTISEMENT message.

   highly-dynamic  MAY change dynamically, even between consecutive
      ADVERTISEMENT messages.  The spatial information provided in an
      ADVERTISEMENT message is simply a snapshot of the current values
      at the time when the message is sent.

11.17.

   This optional element contains the value of the captureID attribute
   (Section 11.1) of the media capture to which the considered media
   capture refers.  A media capture marked with this element can be, for
   example, the translation of the referred media capture into a
   different language.

11.18.

   This element is an optional tag describing what is represented in the
   spatial area covered by a media capture.  The current possible values
   are "table", "lectern", "individual", and "audience", as listed in
   the enumerated view type below.

   [XML Schema fragment missing from the source text]

11.19.

   This element is an optional tag used for media captures conveying
   information about presentations within the telepresence session.  The
   current possible values are "slides" and "images", as listed in the
   enumerated presentation type below.

   [XML Schema fragment missing from the source text]

11.19.1.

   This boolean element indicates that there is text embedded in the
   media capture (e.g., in a video capture).  The language used in the
   embedded text is reported in the element's "lang" attribute.

   The XML Schema definition of the element is:

   [XML Schema fragment missing from the source text]

11.19.2.

   This optional element is used to indicate which telepresence session
   participants are represented within the media captures.  For each
   participant, a dedicated child element is provided
   (Section 11.19.2.1).

11.19.2.1.
   This element contains the identifier of the represented person, i.e.,
   the value of the related personID attribute (Section 11.29.1.1).
   Metadata about the represented participant can be retrieved by
   accessing the list of participants (Section 11.29).

11.20.  Audio captures

   Audio captures inherit all the features of a generic media capture
   and present further audio-specific characteristics.  The XML Schema
   definition of the audio capture type is reported below:

   [XML Schema fragment missing from the source text]

   An example of audio-specific information that can be included is the
   element described in Section 11.20.1.

11.20.1.

   This element is an optional field describing the characteristics of
   the nominal sensitivity pattern of the microphone capturing the audio
   signal.

   The XML Schema definition is provided below:

   [XML Schema fragment missing from the source text]

11.21.  Video captures

   Video captures, similarly to audio captures, extend the information
   of a generic media capture with video-specific features.

   The XML Schema representation of the video capture type is provided
   below:

   [XML Schema fragment missing from the source text]

11.22.  Text captures

   Text captures can also be described by extending the generic media
   capture information, similarly to audio captures and video captures.

   The XML Schema representation of the text capture type currently
   lacks text-specific information, as can be seen in the definition
   below:

   [XML Schema fragment missing from the source text]

   Text captures SHOULD be marked as non spatially definable (i.e.,
   their XML description should contain the element described in
   Section 11.6 set to "true").

11.23.  Other capture types

   Other media capture types can be described by using the CLUE data
   model.  They can be represented by means of the "otherCaptureType"
   type.
   This media capture type is conceived to be filled in with elements
   defined within extensions of the current schema, i.e., with elements
   defined in other XML schemas (see Section 14 for an example).  The
   otherCaptureType inherits all the features envisioned for the
   abstract mediaCaptureType.

   The XML Schema representation of the otherCaptureType is the
   following:

   [XML Schema fragment missing from the source text]

   When defining new media capture types that are going to be described
   by means of the otherCaptureType, the spatial properties of such new
   media capture types SHOULD be defined (e.g., whether or not they are
   spatially definable, whether or not they should be associated with an
   area of capture, etc.).

11.24.

   A Media Provider organizes the available captures in capture scenes
   in order to help the receiver both in the rendering and in the
   selection of the group of captures.  Capture scenes are made of media
   captures and capture scene views, which are sets of media captures of
   the same media type.  Each capture scene view is an alternative way
   to completely represent a capture scene for a fixed media type.

   The XML Schema representation of a capture scene element is the
   following:

   [XML Schema fragment missing from the source text]

   Each capture scene is identified by a "sceneID" attribute.  The
   capture scene element can contain zero or more textual description
   elements, defined as in Section 11.13.  Besides these, there is the
   optional scene information element (Section 11.24.1), which contains
   structured information about the scene in the vCard format, and the
   optional list of the capture scene views (Section 11.24.2).  When no
   such list is provided, the capture scene is assumed to be made of all
   the media captures which contain the value of its sceneID attribute
   in their mandatory captureSceneIDREF attribute.

11.24.1.
   This element contains optional information about the capture scene
   according to the vCard format.

11.24.2.

   This element is a mandatory field of a capture scene containing the
   list of scene views.  Each scene view is represented by a dedicated
   element (Section 11.25).

   [XML Schema fragment missing from the source text]

11.24.3.  sceneID attribute

   The sceneID attribute is a mandatory attribute containing the
   identifier of the capture scene.

11.24.4.  scale attribute

   The scale attribute is a mandatory attribute that specifies the scale
   of the coordinates provided in the spatial information of the media
   captures belonging to the considered capture scene.  The scale
   attribute can assume three different values:

   "mm" -  the scale is in millimeters.  Systems which know their
      physical dimensions (for example, professionally installed
      telepresence room systems) should always provide such real-world
      measurements.

   "unknown" -  the scale is the same for every media capture in the
      capture scene but the unit of measure is undefined.  Systems which
      are not aware of specific physical dimensions yet still know
      relative distances should select "unknown" in the scale attribute
      of the capture scene to be described.

   "noscale" -  there is no common physical scale among the media
      captures of the capture scene.  That means the scale could be
      different for each media capture.

   [XML Schema fragment missing from the source text]

11.25.

   This element represents a capture scene view, which contains a set of
   media captures of the same media type describing a capture scene.

   The element is characterized as follows.

   [XML Schema fragment missing from the source text]

   One or more optional description elements provide human-readable
   information about what the scene view contains.  The description
   element is defined as already seen in Section 11.13.

   The remaining child elements are described in the following
   subsections.
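
   The constraint that a capture scene view groups media captures of a
   single media type can be checked mechanically.  The following Python
   sketch is only an illustration under stated assumptions: the
   MediaCapture class, its media_type field and the function name are
   hypothetical and are not defined by the schema; capture identifiers
   are borrowed from the example of Section 17.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaCapture:
    """Hypothetical in-memory view of an advertised media capture."""
    capture_id: str
    media_type: str  # assumed labels such as "audio", "video", "text"


def scene_view_is_homogeneous(view_capture_ids, captures_by_id):
    """Return True if all captures referenced by the view share one media type.

    Raises KeyError if the view references an identifier that was not
    advertised, which is left to the caller to handle.
    """
    media_types = {captures_by_id[cid].media_type for cid in view_capture_ids}
    return len(media_types) <= 1


# Hypothetical advertised captures.
captures = {
    c.capture_id: c
    for c in [
        MediaCapture("VC0", "video"),
        MediaCapture("VC1", "video"),
        MediaCapture("AC0", "audio"),
    ]
}

video_view_ok = scene_view_is_homogeneous(["VC0", "VC1"], captures)
mixed_view_ok = scene_view_is_homogeneous(["VC0", "AC0"], captures)
```

   In this sketch a view mixing a video capture with an audio capture is
   reported as not homogeneous, which a receiver might treat as a
   malformed description.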
11.25.1.

   This element is the list of the identifiers of the media captures
   included in the scene view.  It is of the captureIDListType type,
   which is defined as a sequence of elements, each containing the
   identifier of a media capture included in the list of available media
   captures:

   [XML Schema fragment missing from the source text]

11.25.2.  sceneViewID attribute

   The sceneViewID attribute is a mandatory attribute containing the
   identifier of the capture scene view represented by the element.

11.26.

   This element represents an encoding group, which is made of a set of
   one or more individual encodings and some parameters that apply to
   the group as a whole.  Encoding groups contain references to
   individual encodings that can be applied to media captures.  The
   definition of the element is the following:

   [XML Schema fragment missing from the source text]

   In the following, the contained elements are further described.

11.26.1.

   This optional field contains the maximum bitrate, expressed in bits
   per second, that can be shared by the individual encodings included
   in the encoding group.

11.26.2.

   This element is the list of the individual encodings grouped together
   in the encoding group.  Each individual encoding is represented
   through its identifier, contained within a dedicated child element.

   [XML Schema fragment missing from the source text]

11.26.3.  encodingGroupID attribute

   The encodingGroupID attribute contains the identifier of the encoding
   group.

11.27.

   This element represents a simultaneous transmission set, i.e., a list
   of captures of the same media type that can be transmitted at the
   same time by a Media Provider.  There are different simultaneous
   transmission sets for each media type.
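
   How a Media Consumer might use simultaneous transmission sets can be
   sketched in Python.  The sketch below is an illustration, not a
   normative algorithm: the function names and the dictionary-based
   representation are assumptions, while the identifiers (SS1, SS2, SE1,
   VC0 to VC4) are taken from the scenario of Section 17, where scene
   view identifiers are used as shortcuts for the captures they contain.

```python
def expand_set(members, scene_views):
    """Expand scene view identifiers into the capture identifiers they contain."""
    expanded = set()
    for member in members:
        if member in scene_views:
            # The member is a scene view used as a shortcut.
            expanded.update(scene_views[member])
        else:
            # The member is assumed to be a plain capture identifier.
            expanded.add(member)
    return expanded


def can_send_together(wanted, simultaneous_sets, scene_views):
    """Return True if at least one simultaneous set covers all wanted captures."""
    wanted = set(wanted)
    return any(
        wanted <= expand_set(members, scene_views)
        for members in simultaneous_sets.values()
    )


# Scenario of Section 17: SE1 is the scene view holding VC0, VC1 and VC2.
scene_views = {"SE1": ["VC0", "VC1", "VC2"]}
simultaneous_sets = {
    "SS1": ["VC3", "SE1"],
    "SS2": ["VC3", "VC0", "VC2", "VC4"],
}

side_cameras_with_zoom = can_send_together(
    ["VC0", "VC4"], simultaneous_sets, scene_views
)
center_with_zoom = can_send_together(
    ["VC1", "VC4"], simultaneous_sets, scene_views
)
```

   With these inputs, VC0 and VC4 are found in a common set, whereas VC1
   and VC4 are not, matching the statement in Section 17 that the two
   captures of the central camera cannot be transmitted simultaneously.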
   [XML Schema fragment missing from the source text]

   Besides the identifiers of the captures, the identifiers of capture
   scene views and of capture scenes can also be used as shortcuts.  As
   an example, consider the situation where there are two capture scene
   views (S1 and S7).  S1 contains captures AC11, AC12, AC13.  S7
   contains captures AC71, AC72.  Provided that AC11, AC12, AC13, AC71,
   AC72 can be simultaneously sent to the Media Consumer, instead of
   listing five capture identifier elements in the simultaneous set
   (i.e., one for AC11, one for AC12, and so on), two scene view
   identifier elements suffice (one for S1 and one for S7).

11.27.1.  setID attribute

   The "setID" attribute is a mandatory field containing the identifier
   of the simultaneous set.

11.27.2.  mediaType attribute

   The "mediaType" attribute is an optional attribute containing the
   media type of the captures referenced by the simultaneous set.

   When only capture scene identifiers are listed within a simultaneous
   set, the mediaType attribute MUST appear in the XML description in
   order to determine which media captures can be sent simultaneously.

11.27.3.

   This element contains the identifier of a media capture that belongs
   to the simultaneous set.

11.27.4.

   This element contains the identifier of a scene view containing a
   group of captures that can be sent simultaneously with the other
   captures of the simultaneous set.

11.27.5.

   This element contains the identifier of a capture scene where all the
   included captures of a certain media type can be sent together with
   the other captures of the simultaneous set.

11.28.

   This element is a set of captures of the same media type representing
   a summary of the complete Media Provider's offer.  The content of a
   global view is expressed only through scene view identifiers.
   Each global view is identified by a unique identifier within the
   "globalViewID" attribute.

   [XML Schema fragment missing from the source text]

11.29.

   Information about the participants that are represented in the media
   captures is conveyed via this element.  As can be seen from the XML
   Schema below, one child element is provided for each participant
   (Section 11.29.1).

   [XML Schema fragment missing from the source text]

11.29.1.

   This element includes all the metadata related to a person
   represented within one or more media captures.  It provides the vCard
   of the subject (see Section 11.29.1.2) and his or her conference
   role(s) (via one or more elements, see Section 11.29.1.3).
   Furthermore, it has a mandatory "personID" attribute
   (Section 11.29.1.1).

   [XML Schema fragment missing from the source text]

11.29.1.1.  personID attribute

   The "personID" attribute carries the identifier of a represented
   person.  Such an identifier can be used to refer to the participant,
   as done in the media capture representation (Section 11.19.2).

11.29.1.2.

   This element is the XML representation of all the fields composing a
   vCard as specified in the xCard RFC [RFC6351].  The vcardType is
   imported from the xCard XML Schema provided by
   [I-D.ietf-ecrit-additional-data].  As that schema specifies, the
   vCard element identifying the represented person is mandatory.

11.29.1.3.

   The value of this element determines the role of the represented
   participant within the telepresence session organization.  It can be
   one of the following terms, which are defined in the framework
   document: "presenter", "timekeeper", "attendee", "minute taker",
   "translator", "chairman", "vice-chairman".

   A participant can play more than one conference role.  In that case,
   more than one such element will appear in his or her description.

   [XML Schema fragment missing from the source text]

12.
   A capture encoding results from the association of a media capture
   with an individual encoding, to form a capture stream as defined in
   [I-D.ietf-clue-framework].  Capture encodings are used within
   CONFIGURE messages from a Media Consumer to a Media Provider to
   represent the streams desired by the Media Consumer.  For each
   desired stream, the Media Consumer needs to be allowed to specify:
   (i) the capture identifier of the desired capture that has been
   advertised by the Media Provider; (ii) the encoding identifier of the
   encoding to use, among those advertised by the Media Provider; (iii)
   optionally, in case of multi-content captures, the list of the
   capture identifiers of the desired captures.  All the mentioned
   identifiers are intended to be included in the ADVERTISEMENT message
   that the CONFIGURE message refers to.  The XML model of a capture
   encoding is provided below.

   [XML Schema fragment missing from the source text]

12.1.

   This mandatory element contains the identifier of the media capture
   that has been encoded to form the capture encoding.

12.2.

   This mandatory element contains the identifier of the applied
   individual encoding.

12.3.

   This optional element is to be used when configuring an MCC.  It
   contains the list of capture identifiers and capture scene view
   identifiers the Media Consumer wants within the MCC.  It is
   structured like the element used to describe the content of an MCC.
   The total number of media captures listed in it must be lower than or
   equal to the maximum number of captures declared for the MCC
   (Section 11.11).

13.

   This element includes all the information needed to represent the
   Media Provider's description of its telepresence capabilities
   according to the CLUE framework.
   It is made of:

      the list of the available media captures (Section 5)

      the list of encoding groups (Section 6)

      the list of capture scenes (Section 7)

      the list of simultaneous transmission sets (Section 8)

      the list of global views (Section 9)

      metadata about the participants represented in the telepresence
      session (Section 11.29).

   It has been conceived only for data model testing purposes and,
   though it resembles the body of an ADVERTISEMENT message, it is not
   actually used in the CLUE protocol message definitions.  The
   telepresence capabilities descriptions compliant with this data model
   specification that can be found in Section 17 and Section 18 are
   provided by using this element.

   [XML Schema fragment missing from the source text]

14.  XML Schema extensibility

   The telepresence data model defined in this document is meant to be
   extensible.  Extensions are accomplished by defining elements or
   attributes qualified by namespaces other than
   "urn:ietf:params:xml:ns:clue-info" and
   "urn:ietf:params:xml:ns:vcard-4.0" for use wherever the schema allows
   such extensions (i.e., where the XML Schema definition specifies
   "anyAttribute" or "anyElement").  Elements or attributes from unknown
   namespaces MUST be ignored.

14.1.  Example of extension

   When extending the CLUE data model, a new schema with a new namespace
   associated with it needs to be specified.

   In the following, an example of extension is provided.  The extension
   defines a new audio capture attribute ("newAudioFeature") and an
   attribute for characterizing the captures belonging to an
   "otherCaptureType" defined by the user.  An XML document compliant
   with the extension is also included.  The XML file validates against
   the current CLUE data model schema.
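
   One possible way for a receiver to honour the rule that elements or
   attributes from unknown namespaces are ignored is to drop them before
   any further processing.  The Python sketch below shows this reading
   of the rule, using only the standard xml.etree.ElementTree module.
   The function names, the element names "exampleCapture" and
   "exampleChild", the "flag" attribute and the extension namespace
   "urn:example:hypothetical-extension" are hypothetical; only the two
   known namespace URIs, the captureID attribute and the
   "newAudioFeature" name come from this document.

```python
import xml.etree.ElementTree as ET

# Namespaces named in Section 14 as belonging to the data model.
KNOWN_NAMESPACES = {
    "urn:ietf:params:xml:ns:clue-info",
    "urn:ietf:params:xml:ns:vcard-4.0",
}


def namespace_of(tag):
    """Return the namespace URI of an ElementTree name such as '{uri}local'."""
    if isinstance(tag, str) and tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


def strip_unknown(element, known=KNOWN_NAMESPACES):
    """Remove, in place, children and attributes from unknown namespaces.

    Assumption of this sketch: unqualified attributes are kept, while
    child elements that carry no namespace at all are treated as unknown.
    """
    for name in list(element.attrib):
        ns = namespace_of(name)
        if ns and ns not in known:
            del element.attrib[name]
    for child in list(element):
        if namespace_of(child.tag) not in known:
            element.remove(child)
        else:
            strip_unknown(child, known)
    return element


# Hypothetical fragment mixing known and extension content.
doc = ET.fromstring(
    '<exampleCapture xmlns="urn:ietf:params:xml:ns:clue-info" '
    'xmlns:ex="urn:example:hypothetical-extension" '
    'captureID="AC0" ex:flag="x">'
    "<exampleChild>kept</exampleChild>"
    "<ex:newAudioFeature>newAudioFeatureValue</ex:newAudioFeature>"
    "</exampleCapture>"
)
strip_unknown(doc)
remaining_children = [child.tag for child in doc]
remaining_attributes = sorted(doc.attrib)
```

   A receiver that does understand a given extension would add its
   namespace to the set of known namespaces instead of discarding the
   corresponding elements.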
   [XML Schema of the extension and sample XML document; markup missing
   from the source text.  Surviving element values: CS1, true, true,
   EG1, newAudioFeatureValue (extended audio capture); CS1, true, EG1,
   OtherValue (other capture); 300000, ENC4, ENC5 (encoding group).]

15.  Security considerations

   This document defines an XML Schema data model for telepresence
   scenarios.  The modeled information is identified in the CLUE
   framework as necessary in order to enable a full-optional media
   stream negotiation and rendering.  Indeed, the XML elements herein
   defined are used within CLUE protocol messages to describe both the
   media streams representing the Media Provider's telepresence offer
   and the desired selection requested by the Media Consumer.  Security
   concerns described in [I-D.ietf-clue-framework], Section 15, apply to
   this document.

   Data model information carried within CLUE messages SHOULD be
   accessed only by authenticated endpoints.  Indeed, some information
   published by the Media Provider might reveal sensitive data about who
   and what is represented in the transmitted streams.  The vCard
   included in the elements describing a person (Section 11.29.1)
   mandatorily contains the identity of the represented person.
   Optionally, vCards can also carry the person's contact addresses,
   together with his/her photo and other personal data.  Similar
   privacy-critical information can be conveyed by means of the scene
   information elements (Section 11.24.1) describing the capture scenes.
   The description elements (Section 11.13) can also specify details
   about the content of media captures, capture scenes and scene views
   that should be protected.

   Integrity attacks on the data model information encapsulated in CLUE
   messages can compromise the telepresence session's setup by
   misleading the Media Consumer's and Media Provider's interpretation
   of the offered and desired media streams.
   The assurance of the authenticated access and of the integrity of the
   data model information is up to the involved transport mechanisms,
   namely the CLUE protocol [I-D.ietf-clue-protocol] and the CLUE data
   channel [I-D.ietf-clue-datachannel].

16.  IANA considerations

   This document registers a new XML namespace, a new XML schema and the
   MIME type for the schema.

16.1.  XML namespace registration

   URI:  urn:ietf:params:xml:ns:clue-info

   Registrant Contact:  IETF CLUE Working Group (clue@ietf.org), Roberta
      Presta (roberta.presta@unina.it).

   XML:

   BEGIN
      [XHTML markup missing from the source text; surviving text follows]
      CLUE Data Model Namespace
      Namespace for CLUE Data Model
      urn:ietf:params:xml:ns:clue-info
      See RFC XXXX.
   END

16.2.  XML Schema registration

   This section registers an XML schema per the guidelines in [RFC3688].

   URI:  urn:ietf:params:xml:schema:clue-info

   Registrant Contact:  CLUE working group (clue@ietf.org), Roberta
      Presta (roberta.presta@unina.it).

   Schema:  The XML for this schema can be found as the entirety of
      Section 4 of this document.

16.3.  MIME Media Type Registration for 'application/clue_info+xml'

   This section registers the "application/clue_info+xml" MIME type.

   To:  ietf-types@iana.org

   Subject:  Registration of MIME media type application/clue_info+xml

   MIME media type name:  application

   MIME subtype name:  clue_info+xml

   Required parameters:  (none)

   Optional parameters:  charset
      Same as the charset parameter of "application/xml" as specified in
      [RFC7303], Section 3.2.

   Encoding considerations:  Same as the encoding considerations of
      "application/xml" as specified in [RFC7303], Section 3.2.

   Security considerations:  This content type is designed to carry data
      related to telepresence information.  Some of the data could be
      considered private.  This media type does not provide any
      protection and thus other mechanisms such as those described in
      Section 15 are required to protect the data.  This media type does
      not contain executable content.

   Interoperability considerations:  None.

   Published specification:  RFC XXXX [[NOTE TO IANA/RFC-EDITOR: Please
      replace XXXX with the RFC number for this specification.]]

   Applications that use this media type:  None.

   Additional Information:  Magic Number(s): (none),
      File extension(s): .clue,
      Macintosh File Type Code(s): TEXT.

   Person & email address to contact for further information:  Roberta
      Presta (roberta.presta@unina.it).
   Intended usage:  LIMITED USE

   Author/Change controller:  The IETF

   Other information:  This media type is a specialization of
      application/xml [RFC7303], and many of the considerations
      described there also apply to application/clue_info+xml.

17.  Sample XML file

   The following XML document represents a schema-compliant example of a
   CLUE telepresence scenario.  Taking inspiration from the examples
   described in the framework draft ([I-D.ietf-clue-framework]), it
   provides the XML representation of an endpoint-style Media Provider's
   offer.

   There are three cameras, where the central one is also capable of
   capturing a zoomed-out view of the overall telepresence room.

   Besides the three video captures coming from the cameras, the Media
   Provider makes available a further multi-content capture of the
   loudest segment of the room, obtained by switching the video source
   across the three cameras.  For the sake of simplicity, only one audio
   capture is advertised for the audio of the whole room.

   The three cameras are placed in front of three participants (Alice,
   Bob and Ciccio), whose vCard and conference role details are also
   provided.

   Media captures are arranged into four capture scene views:

   1.  (VC0, VC1, VC2) - left, center and right camera video captures

   2.  (VC3) - video capture associated with loudest room segment

   3.  (VC4) - video capture zoomed-out view of all people in the room

   4.  (AC0) - main audio

   There are two encoding groups: (i) EG0, for video encodings, and (ii)
   EG1, for audio encodings.

   As to the simultaneous sets, only VC1 and VC4 cannot be transmitted
   simultaneously since they are captured by the same device, i.e., the
   central camera (VC4 is a zoomed-out view while VC1 is a focused view
   of the front participant).
   The simultaneous sets would then be the following:

   SS1  made by VC3 and all the captures in the first capture scene view
      (VC0, VC1, VC2);

   SS2  made by VC3, VC0, VC2, VC4.

   [Sample XML document; markup missing from the source text.  Surviving
   element values include the capture descriptions "main audio from the
   room", "left camera video capture" (ciccio), "central camera video
   capture" (alice), "right camera video capture" (bob), "loudest room
   segment" (policy "Soundlevel:0") and "zoomed out view of all people
   in the room"; two encoding groups with bandwidth 600000 (ENC1, ENC2,
   ENC3) and 300000 (ENC4, ENC5); and the participants Bob ("minute
   taker"), Alice ("presenter") and Ciccio ("chairman", "timekeeper").]

18.  MCC example

   Enhancing the scenario presented in the previous example, the Media
   Provider is able to advertise a composed capture VC7 made by a big
   picture representing the current speaker (VC3) and two
   picture-in-picture boxes representing the previous speakers (the
   previous one, VC5, and the oldest one, VC6).  The provider does not
   want to instantiate and send VC5 and VC6, so it does not associate
   any encoding group with them.  Their XML representations are provided
   to enable the description of VC7.

   A possible description for that scenario could be the following:

   [Sample XML document; markup missing from the source text.  Besides
   the values of the previous example, surviving element values include
   the capture descriptions "penultimate loudest room segment" (policy
   "Soundlevel:1"), "last but two loudest room segment" (policy
   "Soundlevel:2") and "big picture of the current speaker + pips about
   previous speakers" (content VC3, VC5, VC6); the scene view
   descriptions "participants' individual videos" (VC0, VC1, VC2),
   "loudest segment of the room" (VC3), "loudest segment of the room +
   pips" (VC7), "room audio" (AC0) and "room video" (VC4); and two
   simultaneous sets listing VC7, SE1 and VC0, VC2, VC4, VC7.]

19.  Diff with draft-ietf-clue-data-model-schema-10 version

   Minor modifications have been applied to address nits at page
   https://www.ietf.org/tools/idnits?url=https://www.ietf.org/archive/id/draft-ietf-clue-data-model-schema-10.txt.

20.  Diff with draft-ietf-clue-data-model-schema-09 version

   o  A new element containing one mandatory and one optional child
      element has been introduced, as per Paul's review.

   o  A new type definition for switching policies has been provided in
      order to have acceptable values in the form of "token:index".
   o  Minor modifications suggested in WGLC reviews have been applied.

21.  Diff with draft-ietf-clue-data-model-schema-08 version

   o  Typos corrected.

22.  Diff with draft-ietf-clue-data-model-schema-07 version

   o  IANA Considerations: text added

   o  maxCaptureEncodings removed

   o  personTypeType values aligned with CLUE framework

   o  allowSubsetChoice added for multiple content captures

   o  embeddedText moved from videoCaptureType definition to
      mediaCaptureType definition

   o  typos removed from section Terminology

23.  Diff with draft-ietf-clue-data-model-schema-06 version

   o  Capture Scene Entry/Entries renamed as Capture Scene View/Views in
      the text; the corresponding elements renamed accordingly in the
      XML schema.

   o  Global Scene Entry/Entries renamed as Global View/Views in the
      text; the corresponding elements renamed accordingly.

   o  Security section added.

   o  Extensibility: a new type is introduced to describe other types of
      media capture (otherCaptureType), text and example added.

   o  Spatial information section updated: capture point optional, text
      now is coherent with the framework one.

   o  Audio capture description: one element added, one removed, one
      disallowed.

   o  Simultaneous set definition: an element added to refer to capture
      scene identifiers as shortcuts, and an optional mediaType
      attribute which is mandatory to use when only capture scene
      identifiers are listed.

   o  Encoding groups: removed the constraint of the same media type.

   o  Updated text about media captures without an element that is
      optional in the XML schema.

   o  "mediaType" attribute removed from homogeneous groups of captures
      (scene views and global views)

   o  "mediaType" attribute removed from the global view textual
      description.

   o  "millimeters" scale value changed to "mm"

24.
     Diff with draft-ietf-clue-data-model-schema-04 version

      globalCaptureEntries/Entry renamed as globalSceneEntries/Entry;

      sceneInformation added;

      Only capture scene entry identifiers listed within global scene
      entries (media capture identifiers removed);

      An element renamed in the <clueInfo> template

      Two elements renamed to synch with the framework terminology

      An element renamed in the media capture type definition to remove
      ambiguity

      Examples have been updated with the new definitions.

25.  Diff with draft-ietf-clue-data-model-schema-03 version

      encodings section has been removed

      global capture entries have been introduced

      capture scene entry identifiers are used as shortcuts in listing
      the content of MCC (similarly to simultaneous set and global
      capture entries)

      Examples have been updated.  A new example with global capture
      entries has been added.

      An element has been made optional.

      An element has been renamed.

      Obsolete comments have been removed.

      participants information has been added.

26.  Diff with draft-ietf-clue-data-model-schema-02 version

      captureParameters and encodingParameters have been removed from
      the captureEncodingType

      data model example has been updated and validated according to the
      new schema.  Further description of the represented scenario has
      been provided.

      A multiple content capture example has been added.

      Obsolete comments and references have been removed.

27.  Acknowledgments

   The authors thank all the CLUErs for their precious feedback and
   support.

28.  Informative References

   [I-D.ietf-clue-datachannel]
              Holmberg, C., "CLUE Protocol data channel",
              draft-ietf-clue-datachannel-10 (work in progress),
              September 2015.

   [I-D.ietf-clue-framework]
              Duckworth, M., Pepperell, A., and S. Wenger, "Framework
              for Telepresence Multi-Streams",
              draft-ietf-clue-framework-23 (work in progress),
              September 2015.

   [I-D.ietf-clue-protocol]
              Presta, R. and S. Romano, "CLUE protocol",
              draft-ietf-clue-protocol-06 (work in progress),
              October 2015.

   [I-D.ietf-ecrit-additional-data]
              Gellens, R., Rosen, B., Tschofenig, H., Marshall, R., and
              J. Winterbottom, "Additional Data Related to an Emergency
              Call", draft-ietf-ecrit-additional-data-37 (work in
              progress), October 2015.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
              DOI 10.17487/RFC3688, January 2004.

   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
              Session Initiation Protocol (SIP)", RFC 4353,
              DOI 10.17487/RFC4353, February 2006.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              DOI 10.17487/RFC5117, January 2008.

   [RFC6351]  Perreault, S., "xCard: vCard XML Representation",
              RFC 6351, DOI 10.17487/RFC6351, August 2011.

   [RFC7303]  Thompson, H. and C. Lilley, "XML Media Types", RFC 7303,
              DOI 10.17487/RFC7303, July 2014.

Authors' Addresses

   Roberta Presta
   University of Napoli
   Via Claudio 21
   Napoli  80125
   Italy

   EMail: roberta.presta@unina.it


   Simon Pietro Romano
   University of Napoli
   Via Claudio 21
   Napoli  80125
   Italy

   EMail: spromano@unina.it