idnits 2.17.1

draft-mahy-xcon-media-policy-control-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 241 has weird spacing: '...ue type regis...'

  == Line 492 has weird spacing: '...ence to the...'

  == Line 495 has weird spacing: '...ence to the...'

  == Line 504 has weird spacing: '...ence to the...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 16, 2004) is 7374 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Normative reference to a draft: ref. '2'

  -- Possible downref: Normative reference to a draft: ref. '3'

  == Outdated reference: A later version (-02) exists of
     draft-koskelainen-xcon-xcap-cpcp-usage-00

  -- Possible downref: Normative reference to a draft: ref. '4'

  ** Obsolete normative reference: RFC 2616 (ref. '5') (Obsoleted by RFC 7230,
     RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Possible downref: Non-RFC (?) normative reference: ref. '6'

  -- Possible downref: Non-RFC (?) normative reference: ref. '7'

  == Outdated reference: A later version (-26) exists of
     draft-ietf-mmusic-sdp-new-13

  ** Obsolete normative reference: RFC 3388 (ref. '11') (Obsoleted by RFC 5888)

  -- Possible downref: Non-RFC (?) normative reference: ref. '12'

  == Outdated reference: A later version (-05) exists of
     draft-ietf-sipping-conferencing-framework-00

  == Outdated reference: A later version (-01) exists of
     draft-ietf-sipping-conferencing-requirements-00

  == Outdated reference: A later version (-12) exists of
     draft-ietf-sipping-conference-package-00

  -- No information found for draft-koskelainen-xcon-floor-control-reqs - is
     the name correct?

     Summary: 3 errors (**), 0 flaws (~~), 11 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

  2 XCON BOF                                                         R. Mahy
  3 Internet-Draft                                                 N. Ismail
  4 Expires: August 16, 2004                             Cisco Systems, Inc.
  5                                                        February 16, 2004

  7   Media Policy Manipulation in the Conference Policy Control Protocol
  8               draft-mahy-xcon-media-policy-control-01.txt

 10 Status of this Memo

 12    This document is an Internet-Draft and is in full conformance with
 13    all provisions of Section 10 of RFC2026.

 15    Internet-Drafts are working documents of the Internet Engineering
 16    Task Force (IETF), its areas, and its working groups. Note that other
 17    groups may also distribute working documents as Internet-Drafts.

 19    Internet-Drafts are draft documents valid for a maximum of six months
 20    and may be updated, replaced, or obsoleted by other documents at any
 21    time. It is inappropriate to use Internet-Drafts as reference
 22    material or to cite them other than as "work in progress."

 24    The list of current Internet-Drafts can be accessed at
 25    http://www.ietf.org/ietf/1id-abstracts.txt.

 27    The list of Internet-Draft Shadow Directories can be accessed at
 28    http://www.ietf.org/shadow.html.

 30    This Internet-Draft will expire on August 16, 2004.

 32 Copyright Notice

 34    Copyright (C) The Internet Society (2004). All Rights Reserved.

 36 Abstract

 38    The SIP conferencing framework defines a model for tightly-coupled
 39    conferencing signaled via the Session Initiation Protocol (SIP), in
 40    which a Conference Policy Control Protocol is used to manipulate
 41    policies relevant to a specific conference, such as conference
 42    membership policy, authorization policy, and media layout. This
 43    document describes a logical model, which can apply to any session
 44    setup protocol, to describe media processing in a tightly-coupled
 45    conference. It also defines specific protocol semantics and a
 46    specific syntax to manipulate that model.

 48 Table of Contents

 50    1.   Conventions
 51    2.   Overview
 52    2.1  Streams
 53    2.2  Groups and Bundles
 54    2.3  Operators
 55    2.4  Collections
 56    2.5  Using these Elements
 57    3.   Some Standard Operators
 58    4.   More about Collections
 59    4.1  The Basic Audio Collection
 60    4.2  Basic Video MP Collection
 61    4.3  Basic Audio Collection with Floor Control
 62    4.4  Basic Video Collection with Floor Control
 63    4.5  Sidebar Audio Collection
 64    5.   Semantics
 65    5.1  Transactions
 66    5.2  Client Behavior
 67    5.3  Server Behavior
 68    5.4  Notifications of media policy changes
 69    6.   Formal Syntax
 70    7.   Examples
 71    8.   Security Considerations
 72    9.   IANA Considerations
 73    10.  Acknowledgments
 74         Normative References
 75         Informational References
 76         Authors' Addresses
 77    A.   Standard Tile Order
 78         Intellectual Property and Copyright Statements

 80 1. Conventions

 82    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
 83    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
 84    document are to be interpreted as described in RFC-2119 [1].

 86 2. Overview

 88    The SIP conferencing framework [13] defines a model for
 89    tightly-coupled conferences set up via SIP [8], in which a Conference
 90    Policy Control Protocol is used to manipulate policies which are
 91    relevant to a specific conference instance, such as conference
 92    membership policy, authorization policy, and media layout. (As
 93    discussed later, the bulk of this model is applicable to
 94    tightly-coupled conferences accessed using almost any session setup
 95    protocol.) While the conference policy control protocol provides many
 96    non-media specific functions [4] such as membership policy and
 97    authorization policy, this document specifically addresses
 98    requirements [3] to manipulate the way in which media in such a
 99    conference is selected, combined, and modified. It defines a logical
100    model of media processing using a "media topology graph". By
101    manipulating the graph, authorized users can change the media
102    processing behavior of the mixers associated with a specific
103    conference.

105    Here we briefly summarize the terminology used in the SIP
106    conferencing framework in protocol-independent terms. Each
107    "conference" is an instance of a multi-media conversation which has a
108    unique protocol-specific identifier. Other (optional) identifiers
109    can represent a conference-factory (an identifier which creates new
110    conferences when contacted). Conferences can contain
111    sub-conferences, which have a unique identifier within the
112    conference, and optionally a unique, protocol-specific, external
113    identifier as well. Each conference identifier is managed by a
114    logical role called a focus, which manages session state for all
115    sessions in the conference. The focus is responsible for
116    coordinating media combining through logical mixers. Mixers perform
117    the actual selection and combination operations. A logical
118    Conference Policy server manages creation and deletion of
119    conferences, authorization, conference longevity, and the media
120    layout or topology. In addition, the focus can use protocol-specific
121    notification mechanisms to provide access to a basic roster and
122    changes in media or non-media aspects of conference policy. Finally,
123    the conference policy may be configured such that mixers use the
124    information returned dynamically by floor control server(s) to affect
125    media selection.

127    A media topology graph is a loop-free graph which consists of
128    individual media streams, logical groups of media streams, and
129    functions or "operations" performed on those streams. These elements
130    are typically associated with a specific subconference. A
131    subconference simply defines a context which allows different groups
132    of users to share a media topology and participant roster with a
133    subset of the participants in a conference. Subconferences are
134    defined in the conferencing framework, and are typically used to
135    enable conferencing sidebars. For convenience,
136    subgraphs--called collections--of connected operators can be defined,
137    instantiated, and manipulated just like individual elements. These
138    elements and their properties are described below.

140 2.1 Streams

142    In the beginning there were Streams. These are the actual media
143    streams sent and/or received by or on behalf of conference
144    participants. Media streams are typically established when conference
145    participants join a conference and are described by the SDP [9]
146    media lines in the offer/answer [10] exchange between the
147    participants and the focus, or the analogous exchange in other
148    protocols (e.g., H.245 [12] logical channel establishment). Within the
149    media topology graph, each stream is described by a media type, a
150    direction, and at least one identifier. The media types initially
151    considered are audio, video, and text. (Other media types can also
152    be considered in the future.) The direction "in" corresponds to
153    streams flowing from the conference participants to the
154    conference, and "out" to streams originating from the conference and
155    terminating at a conference participant. Stream identifiers can be
156    network identifiers or aliases. Network identifiers consist of an
157    address family (IPv4 or IPv6), an IP address, and a port number.

159    Aliases can also be created for any of the streams, either
160    automatically or manually. One such automatic alias
161    consists of a participant identifier and a media stream instance (for
162    example, in SDP, either the media stream identification "mid" as
163    specified in RFC3388 [11] or the position of the media line
164    describing the stream in SDP). Another set of automatic aliases can
165    be created when per-media-line i-lines (description
166    lines) appear in the SDP.

168    Conference Policy servers provide clients with lists of stream
169    descriptions as part of protocol-specific notification mechanisms
170    such as the SIP conference package [15] and in response to inventory
171    requests as specified in Section 5.3. Clients use the stream
172    identifier that is part of a stream description to associate and
173    connect (or disconnect) a specific stream with a specific group.
174    (Stream identifiers also play an important role in the naming of the
175    logical internal streams which make up the "bundles" described later
176    in this section.)

178       Editor's Note: The distinction between external streams and
179       internal (logical) streams may be confusing. If this becomes a
180       problem, one or both terms will be renamed.
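
A stream description as outlined in Section 2.1 (media type, direction, and at least one identifier) could be modeled roughly as follows. This is only an illustrative sketch: the class and field names (StreamDescription, NetworkId, Alias, and so on) are hypothetical and are not taken from this draft's XML syntax, and the addresses are documentation examples.

```python
from dataclasses import dataclass, field
from enum import Enum
from ipaddress import ip_address
from typing import List, Union


class MediaType(Enum):
    AUDIO = "audio"
    VIDEO = "video"
    TEXT = "text"


class Direction(Enum):
    IN = "in"    # from a participant to the conference
    OUT = "out"  # from the conference to a participant


@dataclass(frozen=True)
class NetworkId:
    """Network identifier: address family, IP address, and port."""
    address: str
    port: int

    @property
    def family(self) -> str:
        # Assumption: the family is derived from the address literal.
        return "IPv6" if ip_address(self.address).version == 6 else "IPv4"


@dataclass(frozen=True)
class Alias:
    """Automatic alias: participant identifier plus stream instance."""
    participant: str
    instance: str  # e.g. an SDP "mid" value or the media line position


@dataclass
class StreamDescription:
    media_type: MediaType
    direction: Direction
    identifiers: List[Union[NetworkId, Alias]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Each stream is described by at least one identifier.
        if not self.identifiers:
            raise ValueError("a stream needs at least one identifier")


# Hypothetical example: Alice's incoming audio stream.
stream = StreamDescription(
    MediaType.AUDIO,
    Direction.IN,
    [NetworkId("192.0.2.10", 49170), Alias("alice", "mid=1")],
)
```

A client would pick any one of the identifiers in such a description when asking the server to connect the stream to a group or disconnect it from one.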
182 2.2 Groups and Bundles

184    Media groups (hereafter just "groups") are created automatically by
185    servers within the context of a sub-conference as specified in
186    Section 5.3 and have a media type and a direction. Input groups
187    take individual streams and aggregate them into a bundle of named
188    streams. Likewise, output groups accept a bundle of named streams,
189    and distribute these as appropriate to individual output streams.
190    One motivation for naming streams in a bundle is described shortly.
191    Also, the process used to distribute output streams is described in
192    the server behavior section. Groups do not connect directly to other
193    groups.

195    Bundles are a logical concept which represent a set of individually
196    tagged (named) logical streams. Input bundles contain tags which
197    describe which identifier or participant is contributing to a logical
198    stream. Output bundles contain tags which describe which identifiers
199    or participants should receive a logical stream. This distinction
200    allows participants to receive different streams even when their
201    logical description of the topology is the same. For example, in
202    most audio conferences participants do not hear their own input.
203    Most output bundles also contain a default logical stream.

205 2.3 Operators

207    Next are Operators. Operators are basic elements that perform simple
208    media operations. They select among media streams, combine streams,
209    or perform other media processing. Each operator has a type, one or
210    more inputs, one logical output, and an optional set of parameters.
211    The type uniquely identifies the operator and specifies the media
212    service offered.

214    Selection operators typically accept an input bundle and generate an
215    ordered set of names of logical streams. These sets can be further
216    manipulated by other operators, but typically they are used as input
217    to a mixing or combining operator. Mixing operators typically
218    receive an input bundle and an ordered list and generate an output
219    bundle. Obviously, the topology graph must contain at least one mixer
220    which can switch the orientation of the streams. Other types
221    of mixers may receive one or more output bundles, perform the
222    appropriate content manipulation, and return a bundle which preserves
223    the sense of the original tags.

225    For example, the simplest type of mixer is a promiscuous media mux.
226    It receives an input bundle and generates a bundle consisting of a
227    single default stream (all of the original streams appended to each
228    other). In another simple variation, a media mux generates a named
229    output stream in the output bundle which contains all the other
230    output except that of the sender, for each named input stream in the
231    input bundle. Most mixing operations actually combine input streams
232    in some media-specific way (for example: tiling for video). Other
233    types of operators can provide other arbitrary media or set
234    manipulations such as volume adjustment, cross-fade, etc. Operators
235    cannot connect directly to input or output streams. Each type of
236    operator defines the semantics of the operation and any parameters.
237    Parameters define aspects of the operator's function that can differ
238    from one instance of the operator to another.

240    This document defines a set of standard operators (see Section 3).
241    Each standard operator has a unique type registered with IANA and an
242    XML schema describing the operator. Server implementations can
243    support any of the set of standard operators. Implementors
244    can also define their own operators and operator types. Clients can
245    discover which operators are supported by making inventory requests
246    to the Server. Authorized clients can then instantiate operators
247    using the method specified in Section 5.2.

249 2.4 Collections

251    Finally there are Collections. Collections are subgraphs created by
252    connecting different operators together. Each collection can provide
253    a specific, potentially sophisticated, media service. Like
254    operators, a collection has a type that uniquely identifies it and
255    specifies its function. Each collection has one or more inputs, one
256    logical output and an optional set of parameters. As with operators,
257    this specification defines a set of standard collections that offer
258    the most common mixing and switching media functions available. Each
259    standard collection has a unique type that will be registered with
260    IANA and an XML schema describing the collection. Server
261    implementations can support any of the set of standard collections
262    and they can also define their own proprietary collections. Each
263    newly defined collection needs a unique type and a published XML
264    schema. Clients can make inventory requests to Servers to get the set
265    of collections supported by the server. Clients can then instantiate
266    collections using the method specified in Section 5.2. Clients can
267    also make their own collections to provide new media services by
268    using the method specified in Section 4.

270    Below is an example diagram of a media topology graph for a
271    simple audio conference using the default audio collection.

273                           Input Streams

275                     A     B     C     D     E

277                     |     |     |     |     |
278                     |     |     |     |     |
279                     v     v     v     v     v
280               +----------------------------------+
281               |                                  |
282               |   Subconference 0 (Main conf)    |
283               |        Audio Input Group         |
284               |                                  |
285               +----------------------------------+
286                                ||
287                                \/
288           .............................................
289           :           ||               ||             :
290           :     Input ||         Input ||             :
291           :    Bundle ||        Bundle ||             :
292           :           ||               \/             :
293           :           ||         +-------------+      :
294           :           ||         |   Speaker   |      :
295           :           ||         |  Selection  |      :
296           :           ||         |  Operator   |      :
297  Default  :           ||         |             |      :
298   Audio   :           ||         +-------------+      :
299  Collec-  :           ||          /                   :
300   tion    :           ||         / Ordered List of    :
301           :           \/        /  Speakers           :
302           :    +---------------+                      :
303           :    |     Audio     |                      :
304           :    |   MixMinus    |                      :
305           :    |   Operator    |                      :
306           :    |               |                      :
307           :    +---------------+                      :
308           :           ||                              :
309           :           || Output Bundle                :
310           :           \/                              :
311           .............................................
312                                ||
313                                \/
314               +----------------------------------+
315               |                                  |
316               |   Subconference 0 (Main conf)    |
317               |        Audio Output Group        |
318               |                                  |
319               +----------------------------------+
320                     |     |     |     |     |
321                     |     |     |     |     |
322                     v     v     v     v     v

324                     A     B     C     D     E

326                           Output Streams

328 2.5 Using these Elements

330    This document defines numerous standard operators (in Section 3) to
331    facilitate interoperability. Implementors are free to extend this
332    list of operators, and an IANA registration process is defined for
333    this purpose. Note that specific conference servers MAY support
334    as few or as many operators as they choose; however, each conference
335    server MUST support at least one standard collection (these are
336    defined in Section 4) for each media type which the conference
337    server is capable of handling.

339    Media manipulation is generally media-specific. When a subconference
340    is created, an input group and an output group are automatically
341    created for each media type supported by the conference server, and a
342    specific collection can be instantiated (again, for each media type).
343    Once instantiated, collections are simply a subgraph of operators
344    connected in some specific way. The resulting graph can be modified,
345    attached, detached, and deleted without affecting the collection from
346    which the graph was copied. Note also that more than one collection
347    can be incorporated into the topology graph for a given subconference
348    and media type.

350    Manipulating the topology graph for a tightly-coupled conference
351    enables a number of useful features, many of which are described in
352    the XCON scenarios [16] and SIP conferencing high-level requirements
353    [14] documents.

355    For example, noisy participants can be "muted" from a conference by
356    disconnecting their audio from the appropriate input group.
357    Participants can be moved to a sidebar by disconnecting their media
358    streams (some or all of them) and reconnecting them to the input and
359    output groups created for the corresponding subconference.
360    Interaction with floor control [17] is coordinated by including an
361    operator which selects only media streams corresponding to
362    participants who have the appropriate floor. The resulting logical
363    output stream or group of streams can be connected to a suitable
364    filtering, mixing, or combining operator (for example tiling for
365    video).

367    Obviously, authorization is required to allow manipulation of media
368    topology by multiple parties (participants and non-participants
369    alike). The effects of manipulating the media topology graph can
370    range from simple, benign changes which only affect the participant
371    requesting the change, to complete failure of the conference. Clearly
372    no one-size-fits-all policy can be applied. However, it is useful to
373    recognize several different categories or severities of impact.
375    o connecting and disconnecting your own streams to a group

377    o connecting and disconnecting another participant's streams

379    o creating subconferences

381    o instantiating arbitrary operators or collections

383    o connecting and disconnecting operators and collections to your own
384      groups

386    o connecting and disconnecting operators and collections which
387      affect an existing conference or subconference

389    The rest of the functions of the Conference Policy Control Protocol
390    (CPCP for brevity) are mostly orthogonal to media manipulation and so
391    they are described in a separate document [4]. However, it is
392    important to mention the interaction between the media
393    topology-specific and other aspects of the policy. Conferences and
394    subconferences can be created and deleted by CPCP. Although these
395    operations are not topology dependent, the media topology
396    changes automatically to reflect them. Also, one participant may wish
397    to invite several other participants to a subconference (sidebar),
398    but the initiating participant may not have permission to change the
399    stream connection properties of all of the participants. In this
400    case, the initiator places the participant in a pending state. This
401    informs the participant that the initiator would like the participant
402    to join the sidebar. Then the participant (or an agent acting on his
403    or her behalf) either makes the requested change to the media
404    topology by connecting his or her streams to the appropriate groups
405    (a media topology task), or removes himself or herself from the
406    pending list (a non-media related task). Finally, in many cases
407    authorized users can set authorization policy related to a variety of
408    aspects of conference policy. While setting these policies is
409    non-media related, many uses of these policies do affect the media
410    topology. Note that because of this separation, it is possible to
411    produce an implementation of CPCP which runs on two separate servers,
412    one responsible for media topology and the other responsible for the
413    balance of conference policy functions.

415 3. Some Standard Operators

417    This section specifies a set of operators that provide
418    the most common media processing operations used in conferencing
419    today. Each operator performs a specific function. Each type of
420    operator is registered with IANA and has an XML Schema [7] that
421    defines how to use the operator. Server implementations are free to
422    support any number of these operators (or none of them) as well as
423    define their own operators.

425    The operators described below are logical operators which are useful
426    for describing conference features. Implementations may use any
427    internal representation which generates externally identical
428    functionality. The formal syntax for using these operators is
429    described in Section 6.

431    The "audioSelectSpeakers" operator takes an audio input bundle and
432    generates an ordered list of names of streams. This list is ordered
433    by the priority for including them in an audio mix. No specific
434    algorithm is specified for selecting which speakers are the "best",
435    but commercial implementations typically use a combination of last,
436    loudest, and longest speakers. The actual list of selected speakers
437    is dynamically calculated by a conference mixer. A deliberately
438    generic definition was chosen to allow most implementations to
439    offer this operator.

441    The "audioMixMinus" operator takes an audio input bundle and an
442    ordered list of names of streams and generates an audio output
443    bundle. It selects the first N of the streams from the ordered
444    list, where N is an implementation-specific integer. The output
445    bundle contains a default stream (which mixes all logical
446    streams) and one logical stream for each stream present in the
447    original input bundle which contains a mix of all logical streams
448    except for input streams corresponding to the same participant as
449    that output stream. In general this property of a mixer is called an
450    exclusive property because it causes each participant's input to be
451    excluded from that participant's own output. With these two
452    operators, you can build the default audio collection described in
453    Section 4.1 and illustrated in the figure in Section 2.4.

455    The "allParticipantsSet" operator takes an input bundle and generates
456    an unordered list of all the stream names which could conceivably
457    contribute to that bundle.

459    The "videoSelectSpeakers" operator takes an audio input bundle (to
460    determine who is speaking) and generates an ordered list of names of
461    streams. This list is ordered by the priority for including any
462    corresponding video streams in a video mix. Note that at a given
463    instant the output of videoSelectSpeakers and audioSelectSpeakers may
464    be different. For example, video speaker selection algorithms
465    typically delay their selection to avoid swapping speakers in the
466    presence of noise such as coughs.

468    The "setIntersection" operator takes an (optionally) ordered list and
469    an unordered list and generates a new list in the same order as the
470    first list. The new list contains the intersection of the members of
471    the two lists.

473    The "streamMux" operator takes an input bundle and an ordered list of
474    streams, and generates an output bundle where each output stream
475    contains at least n and at most m of the input streams muxed in
476    priority order. (n and m are attributes which specify the
477    minimum and maximum number of streams respectively). This operator
478    also takes an attribute which indicates if the operator should
479    include input streams corresponding to the output stream's
480    participant. With these additional four operators you can build the
481    default multipoint video collection described in Section 4.2. A
482    client using these operators directly to create the same effect would
483    follow these steps. (Note that in most cases the correct "connector"
484    to use is implicit from the direction and type of the connection.)

486    1. Instantiate a streamMux operator with the following parameters:
487       n=1, m=1, exclusive=true.

489    2. Instantiate an allParticipantsSet operator, a setIntersection
490       operator, and a videoSelectSpeakers operator.

492    3. Connect the video input group for this conference to the
493       allParticipantsSet operator

495    4. Connect the audio input group for this conference to the
496       videoSelectSpeakers operator

498    5. Connect the allParticipantsSet operator to the "unordered" input of
499       the setIntersection operator

501    6. Connect the videoSelectSpeakers operator to the "ordered" input
502       of the setIntersection operator

504    7. Connect the video input group for this conference to the
505       streamMux operator

507    8. Connect the (output of the) setIntersection operator to the
508       streamMux operator

510    9. Connect the streamMux operator to the video output group for this
511       conference

513    The "selectFloorHolders" operator takes an input bundle and a
514    mandatory attribute which names the floor, and generates an unordered
515    list of names of streams which have been granted the named floor.
516    With this additional operator you can build the floor controlled
517    audio collection in Section 4.3 and the floor controlled video
518    collection in Section 4.4.

520    The "volume" operator takes an audio bundle and generates an audio
521    bundle which has been adjusted to modify the volume of all streams
522    according to the attributes provided. Either a qualitative or
523    quantitative attribute can be provided. The quantitative attribute
524    is an integer percentage compared to the input volume. The
525    qualitative attributes are "normal", "soft", "softer", "very soft",
526    "loud", "louder", and "very loud".

528    The "audioMix" operator takes in one or more output bundles and
529    generates a new output bundle. This operator preserves tags. In
530    other words, the output bundle contains streams for each member in
531    the intersection of the participants in the input bundles. With these
532    additional two operators, you can build the audio sidebar collection
533    in Section 4.5 which addresses both sidebar and coaching scenarios.

535    The "tile" operator takes at least one input video bundle and an
536    ordered list of names of streams. It generates a video output bundle
537    where each output stream consists of tiled windows with a fixed
538    orientation and in priority order as described in Appendix A. One
539    attribute to this operator selects the number of tiles, and another
540    selects if the tile operator is an exclusive or non-exclusive mix.
541    If an exclusive operator is chosen, whenever a tile would display the
542    input of the current participant the next video source is selected
543    instead from the ordered list. Bundles can be connected to a
544    specific tile of the tile operator. For example, tile 4 may be
545    connected to a bundle which shows one of the current floor holders,
546    or to a stream corresponding to a named participant in an input
547    bundle. With this additional operator, you can build a fixed tile
548    continuous presence video layout.

550       Is there any way to do this with one input bundle and set or list
551       manipulation? Possibly use weighted lists or position-based
552       manipulation? We should be able to use setSubtraction and/or
553       subSets to enable this functionality.
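
The exclusive behavior of the "tile" operator described above can be illustrated with a small sketch. The function names are hypothetical, stream names are assumed to identify participants directly, and the sketch ignores bundles that are connected to a specific tile; it only shows how an exclusive mix skips the viewer's own source and takes the next one from the ordered list.

```python
from typing import Dict, List


def tile_sources(ordered: List[str], tiles: int, viewer: str,
                 exclusive: bool = True) -> List[str]:
    """Return the stream names shown, in tile order, to one viewer."""
    # In an exclusive mix the viewer's own stream is skipped, so the
    # next source in priority order moves up to fill the tile.
    candidates = [name for name in ordered
                  if not (exclusive and name == viewer)]
    return candidates[:tiles]


def tile_bundle(ordered: List[str], tiles: int, participants: List[str],
                exclusive: bool = True) -> Dict[str, List[str]]:
    """Build one tagged output entry per participant."""
    return {p: tile_sources(ordered, tiles, p, exclusive)
            for p in participants}


# Hypothetical priority list, as a selection operator might produce it.
speakers = ["A", "B", "C", "D", "E"]
layout = tile_bundle(speakers, 4, speakers)
```

With four tiles, participant B would see A, C, D, and E, while a non-exclusive mix would show A, B, C, and D to everyone.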
555 The "autotile" operator dynamically selects a number of tiles between 556 a minimum and maximum number of streams and incorporates them in a 557 tiled layout automatically. Like the tile operator, this operator can 558 be exclusive or non-exclusive and specific bundles may be connected 559 to specific tiles. With this additional operator, you can build the 560 an automatically tiled continuous presence video layout. 562 In addition to those operators just listed, future versions of this 563 document will contain additional standard operators. Some other 564 operators for consideration are listed below. 566 o textMux 568 o textMuxExclusive 570 o explicitList 572 o explicitWeightedList 574 o sortSet 576 o setIntersection 578 o setAddition 580 o setSubtraction 582 o subSet 584 o volumeWeighted 586 o smilLayout (apply a W3C SMIL stylesheet) 588 o textStylesheet 590 o xsltLayout 592 o selectExplicitParticipants 594 o containsContributor 596 o doesNotContainContributor 598 o crossFade 600 o invertSet 602 o playUrl 604 o selectLast 606 o selectLoudest 607 o selectLongest 609 o stereo2mono 611 o pan 613 o text2speech 615 o speech2text 617 o speech2gesture 619 o speech2signlanguage 621 4. More about Collections 623 To create a new collection, a client defines a list of "connectors" 624 which form the interface between the collection and external graphs. 625 These connectors are strongly typed as input or output bundles or 626 sets, and may be further restricted to media type. Then the 627 "interior" subgraph is created by connecting operators and these 628 connectors to each other. It is even possible to make use of existing 629 collections inside a collection, although this makes loop detection 630 more difficult for the server. Once a new collection is defined, the 631 XML description is stored on the conference policy server as a 632 collection template. These are stored in a context completely removed 633 from individual conferences. 
   Templates persist until they are removed.

   Collections are instantiated just like operators.  In some cases,
   however, the conference policy server may hide the internal
   structure of a collection.  Also, some conference policy servers may
   choose to implement only collections (individual operators cannot be
   instantiated).  Conference policy servers MUST implement at least
   one standard collection for each media type they support.  Of course
   they MAY implement as many other standard or vendor-specific
   collections as desired.

   Below we list some of these standard collections.  For each
   collection we give a short textual description and describe the
   media topology subgraph which describes the behavior of that
   collection.

   o  basicAudioCollection (see Section 4.1)

   o  basicMpVideoCollection (see Section 4.2)

   o  sidebarAudioCollection (see Section 4.5)

   o  audioStreamSelectionCollection

   o  videoStreamSelectionCollection

   o  basicTextCollection

   o  textWithStylesheetCollection

   o  smilLayoutVideoCollection

   o  stereoAudioCollection

   A subset of these collections is floor control enabled:

   o  audioWithFloorControlCollection (see Section 4.3)

   o  mpVideoWithFloorControlCollection (see Section 4.4)

   o  audioStreamSelectionWithFloorControlCollection

   o  videoStreamSelectionWithFloorControlCollection

   o  textWithFloorControlCollection

   o  textWithStylesheetWithFloorControlCollection

4.1  The Basic Audio Collection

   [XML listing not preserved]

4.2  Basic Video MP Collection

   [XML listing not preserved]

4.3  Basic Audio Collection with Floor Control

   OPEN ISSUE: How do we pass parameters (like the name of the floor)
   into the interior of a collection?

   [XML listing not preserved]

4.4  Basic Video Collection with Floor Control

   [XML listing not preserved]

4.5  Sidebar Audio Collection

   [XML listing not preserved]

5.  Semantics

5.1  Transactions

   Manipulations of a "live" media topology graph are performed as
   transactions.  This ensures that the media graph transitions from
   one consistent state to another.  It should never be in a partially
   connected or disconnected state.  Loop detection is always performed
   by the server before a transaction is accepted.

   Note that operators are automatically deleted unless they have at
   least one input connection and at least one output connection.  As a
   result, a transaction which instantiates an operator must connect it
   to an input source and an output destination during the same
   transaction; otherwise adding the operator would have no effect.

   A transaction encloses one or more topology graph manipulations
   which must all succeed or all fail.  Within the transaction,
   individual steps consist of either creating or instantiating
   elements or connecting them together.  Note that there is an
   important distinction between groups and aliases on the one hand and
   collections and operators on the other.  Groups and aliases are
   created (they don't exist before they are created), while
   collections and operators are instantiated (a copy of the original
   is placed in the media topology graph).
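   The transaction rules above (all steps succeed or all fail, loop
   detection before acceptance, and automatic deletion of operators
   that lack an input or an output connection) can be sketched in
   Python as follows.  This is only an illustrative model under stated
   assumptions: the topology is represented as a set of operator names
   plus a set of (source, destination) edges, and the step names
   "instantiate", "connect", and "disconnect" are hypothetical, not
   taken from the draft's XML schema.

```python
class TransactionError(Exception):
    """Raised when a transaction must be rejected as a whole."""


def has_loop(edges):
    """Return True if the directed graph of (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "visiting":
            return True  # reached a node that is still on the DFS path
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        if any(visit(nxt) for nxt in graph[node]):
            return True
        state[node] = "done"
        return False

    return any(visit(node) for node in list(graph))


def prune(operators, edges):
    """Delete operators lacking an input or an output, with their edges."""
    changed = True
    while changed:
        changed = False
        for op in list(operators):
            has_in = any(dst == op for _, dst in edges)
            has_out = any(src == op for src, _ in edges)
            if not (has_in and has_out):
                operators.discard(op)
                edges.difference_update({e for e in edges if op in e})
                changed = True


def apply_transaction(topology, steps):
    """Apply all steps to a copy; return the new topology or raise."""
    operators = set(topology["operators"])
    edges = set(topology["edges"])
    for step in steps:
        kind = step[0]
        if kind == "instantiate":
            operators.add(step[1])
        elif kind == "connect":
            edges.add((step[1], step[2]))
        elif kind == "disconnect":
            edges.discard((step[1], step[2]))
        else:
            raise TransactionError(f"unknown step: {kind}")
    if has_loop(edges):
        raise TransactionError("loop detected")
    prune(operators, edges)
    return {"operators": operators, "edges": edges}
```

   Because the steps are applied to a copy, a rejected transaction
   leaves the live topology untouched.  An operator that is
   instantiated without both connections simply disappears when the
   transaction completes, which matches the "no effect" behavior noted
   above.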

   While nearly any RPC-style protocol could be used to express media
   policy transactions, this document describes an XCAP [2] profile for
   manipulating media policy.  XCAP is a usage of HTTP [5] which uses
   XPath [6] to address fragments of an XML document in the Request
   URI.  Two XML schemas are defined: one for managing collections for
   later use, and another for real-time manipulation of media policy
   graphs.

   Note that support for transactions is currently an open issue in
   XCAP.

5.2  Client Behavior

   To query the media policy for a particular conference, a client
   merely fetches the media policy document (or document fragment) of
   interest.  In some cases the document will be filtered to remove
   hidden or private information.  Similarly, if the client is
   authorized, it can view the internal structure of a collection
   template by just fetching its definition document.  When filtered, a
   collection template may just describe the connectors associated with
   it and a textual description.

   A client connects a stream to a group merely by writing the stream
   into the appropriate group structure in the target conference or
   subconference.  Likewise, a client disconnects a stream by deleting
   the stream from the appropriate group structure.  The client
   permissions determine if this request fails, requires confirmation
   from the affected target, or succeeds immediately.  Since a stream
   can only exist in one group at a time, if a write operation succeeds
   and the stream is already connected, it results in a reassignment
   rather than the same stream in multiple groups.

   To instantiate a new operator or collection, just append an XML
   fragment which describes the parameters for that operator to the
   appropriate XPath (the operators or collections XPath).  To make a
   connection, just append the appropriate XML fragment describing that
   connection to the connections XPath.
   Deleting the element at an XPath removes the corresponding operator,
   collection, or connection.  Once a connection is removed, one or
   more operators may be automatically deleted.  Likewise, when an
   operator is deleted, all its connections are deleted as well.  These
   simple mechanisms allow authorized clients to perform arbitrary
   manipulations of the media topology.

   Finally, to create a new collection, the client writes an XML
   description of the collection into the collectionTemplates XPath.

5.3  Server Behavior

   Servers must maintain a list of all operator and collection types
   that can be used by Clients within a conference.  Servers must
   return such a list to all authorized Clients in response to
   inventory queries.  For operators and collections that have
   parameters, a list of acceptable parameter values must also be
   specified for each parameter.

   For each transaction it receives, the Server must perform the steps
   that follow.  For each request within the transaction, the Server
   must verify that the party initiating the request is authorized to
   initiate this specific request in the context of the sub-conference
   specified within the request.  If the initiator is not authorized,
   the Server must not execute any part of the transaction and must
   return the appropriate "Authorization Failure" response to the
   initiator.  For example, suppose user A requests to connect the
   input audio stream of user B to group X in sub-conference
   "sidebar-1" and the output audio stream of user B to group Y in
   sub-conference "sidebar-1".  The Server must verify that user A is
   authorized to manipulate the media policy of user B and is
   authorized to manipulate "sidebar-1".

   For each request the Server must verify that any changes in the
   media policy of any participant as a result of the execution of the
   request are authorized by the conference policy.
   If any party is not authorized for the media policy changes that
   result from the execution of any request within the transaction,
   then the Server must not execute any part of the transaction and
   must return the appropriate "Authorization Failure" response to the
   initiator.  In the example above, the Server must verify that user B
   is authorized to join "sidebar-1".

   The Server should verify that all requests to instantiate, create,
   and/or connect elements conform to the XML schema and descriptions
   of the elements.  If any request does not conform to the XML schema
   of the elements that it is operating on, then the Server must not
   execute any part of the transaction and must return the appropriate
   "XML Schema Error" response to the initiator.  For example, an
   operator that takes one video input bundle cannot be connected to an
   audio bundle.

   The Server should verify that all the relevant mixers have enough
   resources to perform the actual media processing required as a
   result of the execution of the transaction.  If not enough resources
   are available, the Server must not execute any part of the
   transaction and must return the appropriate "No Available Resources"
   response to the initiator.  Note that resources needed for
   trans-coding and trans-rating should be accounted for.

      Editor's Note: More details and some examples need to be provided
      to explain this section and specifically the last step.

5.4  Notifications of media policy changes

   Media topology changes should result in an appropriate
   protocol-specific notification to those (authorized) parties who
   have requested (subscribed for) them.
   In the case of SIP, this notification will be a notification from
   the SIP conference package, but will carry an
   application/media-policy+xml MIME type in the notification body in
   addition to, or instead of, the basic roster information normally
   provided by that event package.  Note that the protocol should allow
   hidden transactions, for which no notifications will be sent as a
   result of the media policy change.

      Editor's Note: Need to describe how pending operations are
      handled with notifications.

6.  Formal Syntax

   Below is an XCAP encoding (using XML Schema) for media-topology
   manipulation of an active conference (or subconference):

   [XML listing not preserved]

   And here is an XML schema for describing collection templates:

   [XML listing not preserved]

7.  Examples

   Below is a diagram which shows a sample media topology (with
   streams, collections, and groups) for an audio and video conference
   with an audio sidebar.

          Audio and Video Conference with one Audio Sidebar

      (streams)              (streams)                (streams)

 A B   D E F   H   J    A   C D   F G H I          B     E         J
 | |   | | |   |   |    |   | |   | | | |          |     |         |
 | |   | | |   |   |    |   | |   | | | |          |     |         |
 V V   V V V   V   V    V   V V   V V V V          V     V         V
 +------------------+   +------------------+    +-------------------+
 |  Main Video In   |   |  Main Audio In   |    | Sidebar Audio Out |
 |     (group)      |   |     (group)      |    |      (group)      |
 +------------------+   +------------------+    +-------------------+
         ||                  //  ||                       ||
         ||                //    ||         +------+      ||
         ||              //      ||         |+----+|      ||
         ||            //        ||         ||    ||      ||
         \/          //          \/         ||    \/      \/
 ...................V.  ................... ||  ..................
 :                   :  :                 : ||  :                :
 :                   :  :                 : ||  :                :
 :      vendor       :  :    standard     : ||  :    standard    :
 :      defined      :  :   conference    : ||  :    sidebar     :
 :       video       :  :      audio      : ||  :     audio      :
 :    collection     :  :   collection    : ||  :   collection   :
 :                   :  :                 : ||  :                :
 :                   :  :                 : ||  :                :
 .....................  ................... ||  ..................
         ||                    ||      ||   ||            ||
         ||                    ||      |+---+|            ||
         ||                    ||      +-----+            ||
         \/                    \/                         \/
 +------------------+   +------------------+    +-------------------+
 |  Main Video Out  |   |  Main Audio Out  |    | Sidebar Audio Out |
 |     (group)      |   |     (group)      |    |      (group)      |
 +------------------+   +------------------+    +-------------------+
 | | | | | |   |   |    |   | |   | | | |          |     |         |
 | | | | | |   |   |    |   | |   | | | |          |     |         |
 V V V V V V   V   V    V   V V   V V V V          V     V         V
 A B C D E F   H   J    A   C D   F G H I          B     E         J

      (streams)              (streams)                (streams)

   Here we have the media topology description documents for the
   combined audio/video conference in the figure above.  The first
   media topology is for the main conference, and the second is for the
   subconference used by the audio sidebar.  Specific streams are
   omitted for brevity.

   [XML listing not preserved]

   Below is the media topology description document for the
   subconference.  Note that conf=".." refers to the parent of the
   current conference.

   [XML listing not preserved]

8.  Security Considerations

   Much needs to be written here.  Authorization rules will be
   discussed in Section 5.3.  Privacy and filtering rules will be
   discussed there as well.

9.  IANA Considerations

   This document defines an IANA registry of Media Operators, and
   another of Media Collections.

10.  Acknowledgments

   This work was the result of discussions among the SIP Conferencing
   Design Team.

Normative References

   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [2]   Rosenberg, J., "The Extensible Markup Language (XML)
         Configuration Access Protocol (XCAP)",
         draft-rosenberg-simple-xcap-00 (work in progress), May 2003.

   [3]   Even, R., Levin, O. and N. Ismail, "Conferencing media policy
         Requirements", draft-even-xcon-media-policy-requirements-00.txt
         (work in progress), June 2003.

   [4]   Koskelainen, P. and H. Khartabil, "XCAP Usage for Conference
         Policy Manipulation",
         draft-koskelainen-xcon-xcap-cpcp-usage-00.txt (work in
         progress), June 2003.

   [5]   Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L.,
         Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol --
         HTTP/1.1", RFC 2616, June 1999.

   [6]   Clark, J. and S. DeRose, "XML Path Language (XPath) Version
         1.0", W3C Recommendation xpath, November 1999.

   [7]   Thompson, H., Beech, D., Maloney, M. and N. Mendelsohn, "XML
         Schema Part 1: Structures", W3C REC-xmlschema-1, May 2001.

   [8]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
         Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
         Session Initiation Protocol", RFC 3261, June 2002.

   [9]   Jacobson, V., Perkins, C. and M. Handley, "SDP: Session
         Description Protocol", draft-ietf-mmusic-sdp-new-13 (work in
         progress), May 2003.

   [10]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
         Session Description Protocol (SDP)", RFC 3264, June 2002.

   [11]  Camarillo, G., Eriksson, G., Holler, J. and H. Schulzrinne,
         "Grouping of Media Lines in the Session Description Protocol
         (SDP)", RFC 3388, December 2002.

   [12]  International Telecommunications Union, "CONTROL PROTOCOL FOR
         MULTIMEDIA COMMUNICATION", ITU Recommendation H.245, 1998.

Informational References

   [13]  Rosenberg, J., "A Framework for Conferencing with the Session
         Initiation Protocol",
         draft-ietf-sipping-conferencing-framework-00 (work in
         progress), May 2003.

   [14]  Levin, O. and R. Even, "High Level Requirements for Tightly
         Coupled SIP Conferencing",
         draft-ietf-sipping-conferencing-requirements-00 (work in
         progress), April 2003.

   [15]  Rosenberg, J. and H. Schulzrinne, "A Session Initiation
         Protocol (SIP) Event Package for Conference State",
         draft-ietf-sipping-conference-package-00 (work in progress),
         June 2002.

   [16]  Even, R. and N. Ismail, "Conferencing Scenarios",
         draft-even-xcon-conference-scenarios-00.txt (work in progress),
         June 2003.

   [17]  Koskelainen, P., "Floor Control Requirements",
         draft-koskelainen-xcon-floor-control-reqs-00.txt (work in
         progress), June 2003.

Authors' Addresses

   Rohan Mahy
   Cisco Systems, Inc.
   5617 Scotts Valley Drive
   Scotts Valley, CA  95066
   USA

   EMail: rohan@cisco.com

   Nermeen Ismail
   Cisco Systems, Inc.
   170 W Tasman Dr
   San Jose, CA  95134
   USA

   EMail: nismail@cisco.com

Appendix A.  Standard Tile Order

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.  Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11.  Copies of
   claims of rights made available for publication and any assurances
   of licenses to be made available, or the result of an attempt made
   to obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification
   can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (C) The Internet Society (2004).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.