idnits 2.17.1

draft-presta-clue-data-model-schema-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 4 instances of too long lines in the document, the longest one
     being 7 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 769: '... line of capture MUST NOT be identical...'

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 2045 has weird spacing: '...ncoding to be...'

  -- The document date (March 8, 2013) is 4066 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'TBD' is mentioned on line 139, but not defined

  == Outdated reference: A later version (-25) exists of
     draft-ietf-clue-framework-09

     Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

2  CLUE Working Group                                           R. Presta
3  Internet-Draft                                             S P. Romano
4  Intended status: Informational                    University of Napoli
5  Expires: September 9, 2013                               March 8, 2013

7                  An XML Schema for the CLUE data model
8                  draft-presta-clue-data-model-schema-03

10  Abstract

12     This document provides an XML schema file for the definition of CLUE
13     data model types.

15  Status of This Memo

17     This Internet-Draft is submitted in full conformance with the
18     provisions of BCP 78 and BCP 79.

20     Internet-Drafts are working documents of the Internet Engineering
21     Task Force (IETF).  Note that other groups may also distribute
22     working documents as Internet-Drafts.  The list of current Internet-
23     Drafts is at http://datatracker.ietf.org/drafts/current/.

25     Internet-Drafts are draft documents valid for a maximum of six months
26     and may be updated, replaced, or obsoleted by other documents at any
27     time.  It is inappropriate to use Internet-Drafts as reference
28     material or to cite them other than as "work in progress."

30     This Internet-Draft will expire on September 9, 2013.

32  Copyright Notice

34     Copyright (c) 2013 IETF Trust and the persons identified as the
35     document authors.  All rights reserved.

37     This document is subject to BCP 78 and the IETF Trust's Legal
38     Provisions Relating to IETF Documents
39     (http://trustee.ietf.org/license-info) in effect on the date of
40     publication of this document.  Please review these documents
41     carefully, as they describe your rights and restrictions with respect
42     to this document.  Code Components extracted from this document must
43     include Simplified BSD License text as described in Section 4.e of
44     the Trust Legal Provisions and are provided without warranty as
45     described in the Simplified BSD License.

47  Table of Contents

48     1.  Introduction
49     2.  Terminology
50     3.  XML Schema
51     4.
52     5.
53     6.
54     7.
55     8.
56     9.
57     10.
58       10.1.
59       10.2.
60       10.3.
61       10.4.
62         10.4.1.
63         10.4.2.
64       10.5.
65       10.6.
66       10.7.
67       10.8.
68       10.9.
69       10.10.
70       10.11.
71       10.12.
72       10.13.
73       10.14.
74       10.15. captureID attribute
75     11. Audio captures
76       11.1.
77       11.2.
78     12. Video captures
79       12.1.
80       12.2.
81     13. Text captures
82     14.
83       14.1. (was:)
84       14.2.
85       14.3. sceneID attribute
86       14.4. scale attribute
87     15.
88       15.1.
89       15.2.
90       15.3. sceneEntryID attribute
91       15.4. mediaType attribute
92     16.
93       16.1.
94       16.2.
95       16.3. encodingID attribute
97     17. Audio encodings
98     18. Video encodings
99       18.1.
100       18.2.
101       18.3.
102     19. H26X encodings
103     20.
104       20.1.
105       20.2.
106       20.3.
107       20.4. encodingGroupID attribute
108     21.
109       21.1.
110       21.2.
111     22.
112       22.1.
113       22.2.
114     23.
115     24. Sample XML file
116     25. Diff with unofficial -02 version
117     26. Diff with -02 version
118     27. Informative References

120  1.  Introduction

122     This document provides an XML schema file for the definition of CLUE
123     data model types.

125     The schema is based on information contained in
126     [I-D.ietf-clue-framework] and also relates to the data model sketched
127     in [I-D.romanow-clue-data-model].  It encodes information and
128     constraints defined in the aforementioned documents in order to
129     provide a formal representation of the concepts presented therein.
130     The schema definition is intended to be modified according to changes
131     applied to the above-mentioned CLUE documents.

133     This document is a strawman proposal aimed at the
134     definition of a coherent structure for all the information associated
135     with the description of a telepresence scenario.

137  2.  Terminology

139     [TBD] Copy text from the framework document.

141  3.  XML Schema

143     This section contains the proposed CLUE data model schema definition.

145     The element and attribute definitions are formal representations of
146     the concepts needed to describe the capabilities of a media provider
147     and the current streams it is transmitting within a telepresence
148     session.

150     The main groups of information are:

152        the list of media captures available (Section 4)

154        the list of individual encodings (Section 5)

156        the list of encoding groups (Section 6)

158        the list of capture scenes (Section 7)

160        the list of simultaneous capture
161        sets (Section 8)

163        the list of instantiated capture encodings
164        (Section 9)

166     All of the above refer to concepts that have been introduced in
167     [I-D.ietf-clue-framework] and [I-D.romanow-clue-data-model] and
168     further detailed in threads on the mailing list as well as in the
169     remainder of this document.
578     The following sections describe the XML schema in more detail.

580  4.

582     This element represents the list of one or more media captures
583     available on the media provider's side.  Each media capture is
584     represented by an element (Section 10).

586  5.

588     This element represents the list of individual encodings available on
589     the media provider's side.  Each individual encoding is represented
590     by an element (Section 16).

592  6.

594     This element represents the list of the encoding groups organized
595     on the media provider's side.  Each encoding group is represented by
596     an element (Section 20).

598  7.
600     This element represents the list of the capture scenes organized
601     on the media provider's side.  Each capture scene is represented by
602     an element (Section 14).

604  8.

606     This element contains the simultaneous sets indicated by the
607     media provider.  Each simultaneous set is represented by
608     an element (Section 21).

610  9.

612     This element is a list of capture encodings.  It can represent
613     the list of the desired capture encodings indicated by the media
614     consumer or the list of instantiated captures on the provider's side.
615     Each capture encoding is represented by an element
616     (Section 22).

618  10.

620     According to the CLUE framework, a media capture is the fundamental
621     representation of a media flow that is available on the provider's
622     side.  Media captures are characterized by a set of features that
623     are independent of the specific type of medium, and by a set of
624     features that are media-specific.  We design the media capture type as
625     an abstract type, providing all the features that can be common to
626     all media types.  Media-specific captures, such as video captures,
627     audio captures and others, are specializations of that media capture
628     type, as in a typical generalization-specialization hierarchy.

630     The following is the XML Schema definition of the media capture type:

665  10.1.

667     This element is a mandatory field specifying the media type of the
668     capture ("audio", "video", "text", ...).

670  10.2.

672     This element is a mandatory field containing the identifier of
673     the capture scene the media capture belongs to.  Each media
674     capture must be associated with one and only one capture scene.  When a
675     media capture is spatially definable, some spatial information is
676     provided along with it in the form of point coordinates (see
677     Section 10.4).  Such coordinates refer to the coordinate space
678     defined for the capture scene containing the capture.
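
   The association rule stated in Section 10.2 (each media capture
   belongs to one and only one capture scene) lends itself to a
   mechanical check once a description has been parsed.  The Python
   sketch below is illustrative only: the dictionary layout, the
   function name and the message wording are assumptions made for this
   example and are not defined by the schema.  The identifiers mirror
   the sample scenario of Section 24.

```python
# Illustrative sketch only: the data layout and all names below are
# assumptions made for this example; they are not defined by the schema.

def check_scene_references(capture_to_scenes, known_scene_ids):
    """Return a list of violations of the 'one and only one capture
    scene' rule; an empty list means every capture is well formed."""
    problems = []
    for capture_id, scene_refs in sorted(capture_to_scenes.items()):
        # Each capture must reference exactly one capture scene.
        if len(scene_refs) != 1:
            problems.append(
                f"{capture_id}: expected exactly 1 capture scene, "
                f"found {len(scene_refs)}"
            )
        # Every referenced scene must be one of the advertised scenes.
        for ref in scene_refs:
            if ref not in known_scene_ids:
                problems.append(f"{capture_id}: unknown capture scene {ref}")
    return problems


# Captures and capture scenes of the sample scenario in Section 24.
SCENES = {"CS1", "CS2"}
CAPTURES = {
    "VC0": ["CS1"], "VC1": ["CS1"], "VC2": ["CS1"], "VC3": ["CS1"],
    "VC4": ["CS2"], "AC0": ["CS1"], "AC1": ["CS2"],
}
```

   Calling check_scene_references(CAPTURES, SCENES) on the sample data
   yields no violations; a capture with no scene reference, more than
   one reference, or a reference to an unknown scene is reported.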
680  10.3.

682     This element is a mandatory field containing the identifier of the
683     encoding group the media capture is associated with.

685  10.4.

687     Media captures are divided into two categories: non spatially
688     definable captures and spatially definable captures.

690     Non spatially definable captures are those that do not capture parts
691     of the telepresence room.  Captures of this kind are, for example, those
692     related to recordings, text captures, DVDs, recorded
693     presentations, or external streams that are played in the
694     telepresence room and transmitted to remote sites.

696     Spatially definable captures are those that capture part of the
697     telepresence room.  The captured part of the telepresence room is
698     described by means of the spatial information element.

700     This is the definition of the spatial information type:

715     The capture point element contains the coordinates of the capture device
716     that is taking the capture, as well as, optionally, the pointing
717     direction (see Section 10.4.1).  It is a mandatory field when the
718     media capture is spatially definable, independently of the media
719     type.

721     The capture area element is an optional field containing four points
722     defining the captured area represented by the capture (see
723     Section 10.4.2).

725  10.4.1.

727     This element is used to represent the position and the
728     line of capture of a capture device.  The XML Schema definition of
729     the element type is the following:

754     The point type contains three spatial coordinates ("x", "y", "z")
755     representing a point in the space associated with a certain capture
756     scene.

758     The capture point type extends the point type, i.e., it is
759     represented by three coordinates identifying the position of the
760     capture device, but can add further information.
Such further
761     information is conveyed by an additional
762     point-type element representing the "point on line of capture", which
763     gives the pointing direction of the capture device.

765     If the point of capture is not specified, it means the consumer
766     should not assume anything about the spatial location of the
767     capturing device.

769     The coordinates of the point on line of capture MUST NOT be identical
770     to the capture point coordinates.  If the point on line of capture is
771     not specified, no assumptions are made about the axis of the
772     capturing device.

774  10.4.2.

776     This is an optional element that can be contained within the
777     spatial information associated with a media capture.  It represents
778     the spatial area captured by the media capture.

780     The XML representation of that area is provided through a set of four
781     point-type elements,
782     as can be seen from the following definition:

794     The four points should be co-planar.

797     For a switched capture that switches between different sections
798     within a larger area, the area of capture should use coordinates for
799     the larger potential area.

801     By comparing the capture areas of different media captures within the
802     same capture scene, a consumer can determine the spatial
803     relationships between them and render them correctly.  If the area of
804     capture is not specified, it means the media capture is not spatially
805     related to any other media capture.

807  10.5.

809     When media captures are non spatially definable, they are marked with
810     the boolean element set to "true".

812  10.6.

814     This element optionally provides human-readable textual
815     information.  It is used to describe media captures, capture scenes
816     and capture scene entries.  A media capture can be described by using
817     multiple such elements, each one providing information in a
818     different language.
Indeed, the element definition is
819     the following:

832     As can be seen, it is a string element with an
833     attribute ("lang") indicating the language used in the textual
834     description.

836  10.7.

838     This element ([I-D.groves-clue-capture-attr]) is an optional integer
839     field indicating the importance of a media capture from the
840     media provider's perspective.  It can be used on the receiver's side
841     to automatically identify the most "important" contribution available
842     from the media provider.

844     [edt note: no final consensus has been reached on the adoption of
845     such a media capture attribute.]

847  10.8.

849     This is an optional element containing the language used in the
850     capture, if any.  The purpose of the element could match that of
851     the "language" attribute proposed in [I-D.groves-clue-capture-attr].

853  10.9.

855     This is an optional string element.  It contains enumerated
856     values describing the "role" of the media capture according to what
857     is envisioned in [RFC4796] ("slides", "speaker", "sl", "main",
858     "alt").  The values for this attribute are the same as the mediacnt
859     values for the content attribute in [RFC4796].  This attribute can
860     list multiple values, for example "main, speaker".

862     [edt note: a better XML Schema definition for that element will soon
863     be defined.]

865  10.10.

867     This is a boolean element which indicates whether or not the
868     media capture represents the most appropriate subset of a "whole".
869     What is "most appropriate" is up to the provider and could be the
870     active speaker, a lecturer or a VIP.

872     [edt note: :(]

874  10.11.

876     This is an optional boolean element indicating whether or not the
877     capture device originating the capture moves during the telepresence
878     session.  That optional boolean element has the same purpose as the
879     dynamic attribute proposed in [I-D.groves-clue-capture-attr].

881     [edt note: There isn't yet final consensus about that element.]

883  10.12.
885     This is an optional boolean element indicating whether or not
886     the media capture is a mix (audio) or composition (video) of streams.
887     This attribute is useful for a media consumer, for example to avoid
888     nesting a composed video capture into another composed capture or
889     rendering.

891  10.13.

893     This optional element contains an unsigned integer
894     indicating the maximum number of capture encodings that can be
895     simultaneously active for the media capture.  If absent, this
896     parameter defaults to 1.  The minimum value for this attribute is 1.
897     The number of simultaneous capture encodings is also limited by the
898     restrictions of the encoding group the media capture refers to by
899     means of the element described in Section 10.3.

901  10.14.

903     This optional element contains the value of the ID
904     attribute of the media capture it refers to.  The media capture
905     marked with this element can be, for example, the translation
906     of a main media capture into a different language.  The
907     element could be interpreted in the same manner as the supplementary
908     information attribute proposed in [I-D.groves-clue-capture-attr] and
909     further discussed in
910     http://www.ietf.org/mail-archive/web/clue/current/msg02238.html.

912     [edt note: There isn't yet final consensus about that element.]

914  10.15.  captureID attribute

916     The "captureID" attribute is a mandatory field containing the
917     identifier of the media capture.

919  11.  Audio captures

921     Audio captures inherit all the features of a generic media capture
922     and present further audio-specific characteristics.  The XML Schema
923     definition of the audio capture type is reported below:

939     Audio-specific information about the audio capture is contained in
940     the elements described in Section 11.1 and in
941     Section 11.2.

943  11.1.

945     This optional element is a field with enumerated
946     values ("mono" and "stereo") which describes the method of encoding
947     used for audio.
A value of "mono" means the audio capture has one
948     channel.  A value of "stereo" means the audio capture has two audio
949     channels, left and right.  A single stereo capture is different from
950     two mono captures that have a left-right spatial relationship.  A
951     stereo capture maps to a single RTP stream, while each mono audio
952     capture maps to a separate RTP stream.

954     The XML Schema definition of the element type is
955     provided below:

965  11.2.

967     This element is an optional field describing the
968     characteristics of the microphone capturing the audio signal.  It can
969     contain the enumerated values listed below:

983  12.  Video captures

985     Video captures, similarly to audio captures, extend the information
986     of a generic media capture with video-specific features, such as
987     those described in Section 12.1 and Section 12.2.

989     The XML Schema representation of the video capture type is provided
990     in the following:

1005  12.1.

1007     If a video capture has a native aspect ratio (for instance, it
1008     corresponds to a camera that generates 4:3 video), then it can be
1009     supplied as the value of this element in order to
1010     help rendering.

1012  12.2.

1014     This is a boolean element indicating that there
1015     is text embedded in the video capture.  The language used in such
1016     embedded textual description is reported in the element's "lang"
1017     attribute.

1019     The XML Schema definition of the element is:

1033     This element could correspond to the embedded-text
1034     attribute introduced in [I-D.groves-clue-capture-attr].

1036     [edt note: no final consensus has been reached yet about the adoption
1037     of such an element]

1039  13.  Text captures

1041     Text captures can also be described by extending the generic media
1042     capture information, similarly to audio captures and video captures.
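
   The generalization-specialization hierarchy used for audio, video
   and text captures (Sections 10 to 13) can be pictured with the
   following Python sketch.  It is a rough analogy rather than a
   rendering of the schema: every class and field name below is a
   hypothetical choice made for this example, and only a few of the
   features described in the text are shown.

```python
# Illustrative analogy only: class and field names are hypothetical and
# do not reproduce the element names of the XML schema.
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaCapture:
    """Features common to all media types (abstract, see Section 10)."""
    capture_id: str            # identifier of the media capture
    capture_scene_ref: str     # the one capture scene it belongs to
    encoding_group_ref: str    # the encoding group it is associated with
    media_type: str = ""

    def __post_init__(self):
        # Mimic an abstract type: only specializations may be created.
        if type(self) is MediaCapture:
            raise TypeError("MediaCapture is abstract; use a subclass")


@dataclass
class AudioCapture(MediaCapture):
    media_type: str = "audio"
    channel_format: Optional[str] = None       # "mono" or "stereo"


@dataclass
class VideoCapture(MediaCapture):
    media_type: str = "video"
    native_aspect_ratio: Optional[str] = None  # for instance "4:3"


@dataclass
class TextCapture(MediaCapture):
    media_type: str = "text"   # no text-specific features so far


# AC0 of the sample scenario in Section 24: room audio, scene CS1, group EG1.
ac0 = AudioCapture("AC0", "CS1", "EG1", channel_format="mono")
```

   The base class carries only the media-independent features, while
   each subclass adds its media-specific ones, which is the role the
   abstract media capture type plays for the audio, video and text
   capture types.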
1044     The XML Schema representation of the text capture type currently
1045     lacks text-specific information, as can be seen from
1046     the definition below:

1056  14.

1058     A media provider organizes the available captures into capture scenes in
1059     order to help the receiver both in the rendering and in the selection
1060     of the group of captures.  Capture scenes are made of capture scene
1061     entries, which are sets of media captures of the same media type.  Each
1062     capture scene entry represents an alternative way to completely represent
1063     a capture scene for a fixed media type.

1065     The XML Schema representation of a capture scene element is the
1066     following:

1082     The capture scene element can contain zero or more textual description
1083     elements, defined as in Section 10.6.  Besides
1084     these, there are two other fields: the one described in
1085     Section 14.1, defining the coordinate space which the media
1086     captures of the capture scene refer to, and the one described in
1087     Section 14.2, the list of the capture scene entries.

1089  14.1.  (was:)

1091     This element describes a bounding volume for the spatial
1092     information provided alongside the spatially definable media captures
1093     associated with the considered capture scene.  Such a volume is
1094     described as an arbitrary hexahedron with eight points.
1097     The coordinate system is Cartesian (X, Y, Z) with
1098     the origin at a spatial location of the media provider's choosing.
1099     The media provider must use the same coordinate system with the same
1100     scale and origin for all media capture coordinates within the same
1101     capture scene.

1117     [edt note: this is just a placeholder; the definition of the
1118     bounding volume has to be discussed]

1120  14.2.

1122     This element is a mandatory field of a capture scene
1123     containing the list of scene entries.
Each scene entry is
1124     represented by an element (Section 15).

1135  14.3.  sceneID attribute

1137     The sceneID attribute is a mandatory attribute containing the
1138     identifier of the capture scene.

1140  14.4.  scale attribute

1142     The scale attribute is a mandatory attribute that specifies the scale
1143     of the coordinates provided in the capture space and in the spatial
1144     information of the media captures belonging to the considered capture
1145     scene.  The scale attribute can assume three different values:

1147        "millimeters" - the scale is in millimeters.  Systems which know
1148        their physical dimensions (for example professionally installed
1149        telepresence room systems) should always provide those real-world
1150        measurements.

1152        "unknown" - the scale is not necessarily millimeters, but the
1153        scale is the same for every media capture in the capture scene.
1154        Systems which don't know specific physical dimensions but still
1155        know relative distances should select "unknown" in the scale
1156        attribute of the capture scene to be described.

1158        "noscale" - there is no common physical scale among the media
1159        captures of the capture scene.  That means the scale could be
1160        different for each media capture.

1171  15.

1173     This element represents a capture scene entry, which
1174     contains a set of media captures of the same media type describing a
1175     capture scene.

1177     The element is characterized as follows.

1191     One or more optional description elements provide human-readable
1192     information about what the scene entry contains.  They are
1193     defined as already seen in Section 10.6.

1195     The remaining child elements are described in the following
1196     subsections.

1198  15.1.

1200     This element represents the switching policies the media
1201     provider supports for the media captures contained inside a scene
1202     entry.
The element contains two boolean
1203     elements:

1205        one that, if set to "true", means that the media
1206        provider supports the site switching policy for the included media
1207        captures;

1209        one that, if set to "true", means that the media
1210        provider supports the segment switching policy for the included
1211        media captures.

1213     The "site-switch" policy means all captures are switched at the same
1214     time to keep captures from the same endpoint site together.

1216     The "segment-switch" policy means different captures can switch at
1217     different times, and can come from different endpoints.

1228  15.2.

1230     This element is the list of the identifiers of the media
1231     captures included in the scene entry.  It is an element of the
1232     captureIDListType type, which is defined as a sequence of
1233     elements, each one containing the identifier of one of the
1234     listed media captures:

1244  15.3.  sceneEntryID attribute

1246     The sceneEntryID attribute is a mandatory attribute containing the
1247     identifier of the capture scene entry represented by the
1248     element.

1250  15.4.  mediaType attribute

1252     The mediaType attribute contains the media type of the media captures
1253     included in the scene entry.

1255  16.

1257     This element represents an individual encoding, i.e., a way
1258     to encode a media capture.  Individual encodings can be characterized
1259     by features that are independent of the specific type of medium,
1260     and by features that are media-specific.  We design the individual
1261     encoding type as an abstract type, providing all the features that
1262     can be common to all media types.  Media-specific individual
1263     encodings, such as video encodings, audio encodings and others, are
1264     specializations of that type, as in a typical
1265     generalization-specialization hierarchy.

1279  16.1.
1281     This element is a mandatory field containing the name of the
1282     encoding (e.g., G711, H264, ...).

1284  16.2.

1286     This element represents the maximum bitrate the media provider can
1287     instantiate for that encoding.

1289  16.3.  encodingID attribute

1291     The encodingID attribute is a mandatory attribute containing the
1292     identifier of the individual encoding.

1294  17.  Audio encodings

1296     Audio encodings inherit all the features of a generic individual
1297     encoding and can present further audio-specific encoding
1298     characteristics.  The XML Schema definition of the audio encoding
1299     type is reported below:

1313     Up to now the only audio-specific information is the
1314     element containing the media type of the media captures that can be
1315     encoded with the considered individual encoding.  In the case of
1316     audio encoding, that element is forced to "audio".

1318  18.  Video encodings

1320     Similarly to audio encodings, video encodings can extend the
1321     information of a generic individual encoding with video-specific
1322     encoding features, such as the maximum width, the maximum height and
1323     the maximum frame rate (Sections 18.1 to 18.3).

1325     One element contains the media type of the media
1326     captures that can be encoded with the considered individual encoding.
1327     In the case of video encoding, that element is forced to "video".

1344  18.1.

1346     This element represents the video resolution's maximum width supported
1347     by the video encoding, expressed in pixels.

1349     [edt note: not present in -09 version of the framework doc]

1351  18.2.

1353     This element represents the video resolution's maximum height supported
1354     by the video encoding, expressed in pixels.

1356     [edt note: not present in -09 version of the framework doc]

1358  18.3.

1360     This element provides the maximum frame rate supported by the video
1361     encoding for the video capture to be encoded.

1363     [edt note: not present in -09 version of the framework doc]

1365  19.  H26X encodings

1367     This is an example of how it is possible to further specialize the
1368     definition of a video individual encoding in order to cover
1369     encoding-specific information.  An H26X video encoding can be represented
1370     through an element inheriting the video encoding characteristics
1371     described above (Section 18) and adding other information such as
1372     the maximum number of pixels to be
1373     processed per second.

1388     [edt note: Need to be checked]

1390  20.

1392     This element represents an encoding group, which is a
1393     set of one or more individual encodings, and parameters that apply to
1394     the group as a whole.  The definition of the element
1395     is the following:

1411     In the following, the contained elements are further described.

1413  20.1.

1415     This element is an optional field containing the maximum
1416     bitrate supported for all the individual encodings included in the
1417     encoding group.

1419  20.2.

1421     This element is an optional field containing the maximum number of
1422     pixels per second for all the individual encodings included in the
1423     encoding group.

1425     [edt note: Need to be checked]

1427  20.3.

1429     This element is the list of the individual encodings grouped
1430     together.  Each individual encoding is represented through its
1431     identifier contained within an element.

1440  20.4.  encodingGroupID attribute

1442     The encodingGroupID attribute contains the identifier of the encoding
1443     group.

1445  21.

1447     This element represents a simultaneous set, i.e., a list of
1448     captures of the same type that can be transmitted at the same time by
1449     a media provider.  There are different simultaneous transmission sets
1450     for each media type.

1462     [edt note: need to be checked]

1464  21.1.

1466     This element contains the identifier of the media capture that
1467     belongs to the simultaneous set.

1469  21.2.
   This element contains the identifier of the scene entry containing
   a group of captures that are able to be sent simultaneously with
   the other captures of the simultaneous set.

22.

   A capture encoding is given by the association of a media capture
   and an individual encoding, to form a capture stream.  It is
   defined as an element of the following type:

22.1.

   This element contains the identifier of the media capture that has
   been encoded to form the capture encoding.

22.2.

   This element contains the identifier of the applied individual
   encoding.

23.

   This element has been left within the XML Schema for the sake of
   convenience when representing a prototype of an ADVERTISEMENT
   message (see the example section).

24. Sample XML file

   The following XML document represents a schema-compliant example of
   a CLUE telepresence scenario.

   There are 5 video captures:

      VC0: the video from the left camera

      VC1: the video from the central camera

      VC2: the video from the right camera

      VC3: the overall view of the telepresence room taken from the
      central camera

      VC4: the video associated with the slide stream

   There are 2 audio captures:

      AC0: the overall room audio taken from the central camera

      AC1: the audio associated with the slide stream presentation

   The captures are organized into two capture scenes:

      CS1: this scene contains captures associated with the
      participants that are in the telepresence room.

      CS2: this scene contains captures associated with the slide
      presentation, which is a pre-registered presentation played
      within the context of the telepresence session.

   Within the capture scene CS1, there are three scene entries
   available:

      CS1_SE1: this entry contains the participants' video captures
      taken from the three cameras (VC0, VC1, VC2).
      CS1_SE2: this entry contains the zoomed-out view of the overall
      telepresence room (VC3)

      CS1_SE3: this entry contains the overall telepresence room audio
      (AC0)

   On the other hand, capture scene CS2 presents two scene entries:

      CS2_SE1: this entry contains the presentation audio stream (AC1)

      CS2_SE2: this entry contains the presentation video stream (VC4)

   There are two encoding groups:

      EG0: this encoding group involves video encodings ENC0, ENC1,
      ENC2

      EG1: this encoding group involves audio encodings ENC3, ENC4

   As to the simultaneous sets, only VC1 and VC3 cannot be transmitted
   simultaneously, since they are captured by the same device, i.e.,
   the central camera (VC3 is a zoomed-out view while VC1 is a focused
   view of the front participants).  The simultaneous sets would then
   be the following:

      SS1: made by VC0, VC1, VC2, VC4, AC0, AC1

      SS2: made by VC0, VC3, VC2, VC4, AC0, AC1

1592 1593 1594 1595 1598 audio 1599 CS2 1600 EG1 1601 true 1602 presentation audio 1603 slide 1604 mono 1605 1606 1609 video 1610 CS2 1611 EG0 1612 true 1613 presentation video 1614 slides 1615 1616 1619 audio 1620 CS1 1621 EG1 1622 1623 1624 0.5 1625 1.0 1626 0.5 1627 1628 0.5 1629 0.0 1630 0.5 1631 1632 1633 1634 1635 audio from the central camera mic 1636 mono 1637 figure8 1638 1639 1642 video 1643 CS1 1644 EG0 1645 1646 1647 1.5 1648 1.0 1649 0.5 1650 1651 1.5 1652 0.0 1653 0.5 1654 1655 1656 1657 1658 0.0 1659 3.0 1660 0.0 1661 1662 1663 3.0 1664 3.0 1665 0.0 1666 1667 1668 0.0 1669 3.0 1670 3.0 1671 1672 1673 3.0 1674 3.0 1675 3.0 1676 1677 1678 1679 1680 zoomed out view of the room 1681 1682 1685 video 1686 CS1 1687 EG0 1688 1689 1690 2.5 1691 1.0 1692 0.5 1693 1694 2.5 1695 0.0 1696 0.5 1697 1698 1699 1700 1701 2.0 1702 3.0 1703 0.0 1704 1705 1706 3.0 1707 3.0 1708 0.0 1709 1710 1711 2.0 1712 3.0 1713 3.0 1714 1715 1716 3.0 1717 3.0 1718 3.0 1719 1720 1721 1722 right camera video 1723 1724 1727 video 1728 CS1
1729 EG0 1730 1731 1732 1.5 1733 1.0 1734 0.5 1735 1736 1.5 1737 0.0 1738 0.5 1739 1740 1741 1742 1743 1.0 1744 3.0 1745 0.0 1746 1747 1748 2.0 1749 3.0 1750 0.0 1751 1752 1753 1.0 1754 3.0 1755 3.0 1756 1757 1758 2.0 1759 3.0 1760 3.0 1761 1762 1763 1764 central camera video 1765 1766 1769 video 1770 CS1 1771 EG0 1772 1773 1774 0.5 1775 1.0 1776 0.5 1777 1778 0.5 1779 0.0 1780 0.5 1781 1782 1783 1784 1785 0.0 1786 3.0 1787 0.0 1788 1789 1790 1.0 1791 3.0 1792 0.0 1793 1794 1795 0.0 1796 3.0 1797 3.0 1798 1799 1800 1.0 1801 3.0 1802 3.0 1803 1804 1805 1806 left camera video 1807 1808 1809 1810 1812 h263 1813 4000000 1814 video 1815 1920 1816 1088 1817 1818 1820 h263 1821 4000000 1822 video 1823 1920 1824 1088 1825 1826 1828 h263 1829 4000000 1830 video 1831 1920 1832 1088 1833 1834 1836 g711 1837 64000 1838 audio 1839 1840 1842 g711 1843 64000 1844 audio 1845 1846 1847 1848 1849 12000000 1850 1851 ENC0 1852 ENC1 1853 ENC2 1854 1855 1856 1857 12000000 1858 1859 ENC3 1860 ENC4 1861 1862 1863 1864 1865 1866 main scene 1867 1868 1869 0.0 1870 3.0 1871 0.0 1872 1873 1874 3.0 1875 3.0 1876 0.0 1877 1878 1879 0.0 1880 3.0 1881 2.0 1882 1883 1884 3.0 1885 3.0 1886 2.0 1887 1888 1889 0.0 1890 3.0 1891 0.0 1892 1893 1894 3.0 1895 3.0 1896 0.0 1897 1898 1899 0.0 1900 3.0 1901 2.0 1902 1903 1904 3.0 1905 3.0 1906 2.0 1907 1908 1909 1910 1911 1912 participants streams 1913 1914 VC0 1915 VC1 1916 VC2 1917 1918 1919 1920 room stream 1921 1922 VC3 1923 1924 1925 1926 room audio 1927 1928 AC0 1929 1930 1931 1932 1933 1934 presentation 1935 1936 1937 1938 presentation video 1939 1940 VC4 1941 1942 1943 1944 1945 presentation audio 1946 1947 AC1 1948 1949 1950 1951 1952 1953 1954 1955 VC0 1956 VC1 1957 VC2 1958 VC4 1959 AC0 1960 AC1 1961 1962 1963 VC0 1964 VC3 1965 VC2 1966 VC4 1967 AC0 1968 AC1 1969 1970 1971 1973 25. 
Diff with unofficial -02 version

   Here is the link to the unofficial -02 version:
   http://www.grid.unina.it/Didattica/RetiDiCalcolatori/inf/draft-presta-clue-data-model-schema-02.html

   Media captures moved
      Media captures have been moved out from the capture scene blob
      again.  Media captures should have identifiers that are valid
      outside the local scope of capture scenes, since a consumer
      should also be able to request single captures in the CONFIGURE
      message.  This design choice reflects a bottom-up approach where
      captures are the basis of the data model.  In each media capture
      a reference to the capture scene containing it is provided.  It
      identifies the space that the spatial information of the media
      capture refers to.

   XML document example updated
      A new example, compliant with the updated schema, has been
      provided.

   language attribute added
      This optional attribute reflects the language used in the
      capture, if any.  Its purpose could match that of the language
      attribute proposed in [I-D.groves-clue-capture-attr].

   priority element added
      The priority element has an integer value that helps specify a
      media capture's relative importance with respect to the other
      captures.  That element could correspond to the priority
      attribute introduced in [I-D.groves-clue-capture-attr].

   embedded text element added
      The element, if present, indicates text embedded in the video
      capture.  The language used in such an embedded textual
      description is also envisioned within the element itself.  That
      element could correspond to the priority attribute introduced in
      [I-D.groves-clue-capture-attr].

   capture reference element added
      That optional element contains the ID of a capture the capture
      refers to.  This supports cases where a main capture is
      translated into a different language.  Such a translation can be
      marked with this element to refer to the main capture.
      This could be interpreted in the same manner as the
      supplementary information attribute proposed in
      [I-D.groves-clue-capture-attr] and further discussed in
      http://www.ietf.org/mail-archive/web/clue/current/msg02238.html.

   optional boolean element added
      That optional boolean element has the same purpose as the
      dynamic attribute proposed in [I-D.groves-clue-capture-attr].
      It indicates whether the capture device originating the capture
      moves during the telepresence session.

   new definition for the descriptive text element
      The element has a new attribute, lang, indicating the language
      used for the text within it.  The element is used to provide
      human-readable information about captures, scenes, and scene
      entries.  The definitions of the corresponding XML elements have
      been updated so that they can contain more than one such
      element.  In that way, they can be described in different
      languages.

   text capture added as new type of capture
      The element is just a placeholder, since it is not
      characterized with any further information up to now.

26. Diff with -02 version

   element of capture space type (renamed)
      The element describes a bounding volume for the space of a
      capture scene as an arbitrary hexahedron with eight points
      (placeholder solution).

   H26X encoding
      To be checked.

   Simultaneous sets
      The XML Schema definition of the simultaneous sets has changed.
      A simultaneous set is defined as a list of L media capture
      identifiers and M capture scene entry identifiers, where L and M
      can be 0 or unbounded.

   Capture encoding
      A new XML Schema type has been added to describe capture
      encodings as the result of the association of a media capture,
      represented by its identifier, with an individual encoding,
      represented by its identifier as well.

   Clue info
      The element has been left within the XML Schema for the sake of
      convenience when representing a prototype of an ADVERTISEMENT
      message (see the example section).
   Data model definitions added
      For each element of the data model a brief description has been
      reported to foster discussion.

27. Informative References

   [I-D.groves-clue-capture-attr]
              Groves, C., Yang, W., and R. Even, "CLUE media capture
              description", draft-groves-clue-capture-attr-01 (work in
              progress), February 2013.

   [I-D.ietf-clue-framework]
              Duckworth, M., Pepperell, A., and S. Wenger, "Framework
              for Telepresence Multi-Streams",
              draft-ietf-clue-framework-09 (work in progress),
              February 2013.

   [I-D.romanow-clue-data-model]
              Romanow, A. and A. Pepperell, "Data model for the CLUE
              Framework", draft-romanow-clue-data-model-01 (work in
              progress), June 2012.

   [RFC4796]  Hautakorpi, J. and G. Camarillo, "The Session
              Description Protocol (SDP) Content Attribute", RFC 4796,
              February 2007.

Authors' Addresses

   Roberta Presta
   University of Napoli
   Via Claudio 21
   Napoli 80125
   Italy

   EMail: roberta.presta@unina.it

   Simon Pietro Romano
   University of Napoli
   Via Claudio 21
   Napoli 80125
   Italy

   EMail: spromano@unina.it
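
As an illustration of the sample scenario in Section 24, the following Python sketch checks whether a group of captures can be sent at the same time, given the two simultaneous sets SS1 and SS2 listed there. The names SIMULTANEOUS_SETS and can_send_together are hypothetical helpers introduced only for this example and are not part of the CLUE schema. The rule assumed here is that a request is feasible when all of its captures belong to at least one simultaneous set.

```python
# Simultaneous sets copied from the sample scenario in Section 24.
# SIMULTANEOUS_SETS and can_send_together are hypothetical names used
# for illustration only; they are not defined by the CLUE schema.
SIMULTANEOUS_SETS = {
    "SS1": {"VC0", "VC1", "VC2", "VC4", "AC0", "AC1"},
    "SS2": {"VC0", "VC3", "VC2", "VC4", "AC0", "AC1"},
}


def can_send_together(capture_ids, simultaneous_sets=SIMULTANEOUS_SETS):
    """Return True if every requested capture fits in one simultaneous set."""
    requested = set(capture_ids)
    # Assumed rule: the request is feasible if it is a subset of at least
    # one advertised simultaneous set.
    return any(requested <= members for members in simultaneous_sets.values())


if __name__ == "__main__":
    # The three participant cameras all belong to SS1.
    print(can_send_together(["VC0", "VC1", "VC2"]))
    # VC1 and VC3 come from the same central camera and share no set.
    print(can_send_together(["VC1", "VC3"]))
```

Under this assumption, a consumer asking for VC1 and VC3 together is rejected, which matches the statement in Section 24 that those two captures cannot be transmitted simultaneously.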