Internet Engineering Task Force                              SIPPING WG
Internet Draft                                             J. Rosenberg
                                                            dynamicsoft
draft-rosenberg-sipping-conferencing-framework-01.txt
February 12, 2003
Expires: August 2003

   A Framework for Conferencing with the Session Initiation Protocol

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html.

Abstract

The Session Initiation Protocol (SIP) supports the initiation,
modification, and termination of media sessions between user agents.
These sessions are managed by SIP dialogs, which represent a SIP
relationship between a pair of user agents. Because dialogs are
between pairs of user agents, SIP's usage for two-party
communications (such as a phone call) is obvious. Communications
sessions with multiple participants, generally known as conferencing,
are more complicated. This document defines a framework for how such
conferencing can occur. This framework describes the overall
architecture, terminology, and protocol components needed for
multi-party conferencing.

Table of Contents

1        Introduction
2        Terminology
3        Overview of Conferencing Architecture
3.1      Usage of URIs
4        Functions of the Elements
4.1      Focus
4.2      Conference Policy Server
4.3      Mixers
4.4      Conference Notification Service
4.5      Participants
4.6      Conference Policy
5        Common Operations
5.1      Creating Conferences
5.1.1    SIP Mechanisms
5.1.2    CPCP Mechanisms
5.1.3    Non-Automated Mechanisms
5.2      Adding Participants
5.2.1    SIP Mechanisms
5.2.2    CPCP Mechanisms
5.2.3    Non-Automated Mechanisms
5.3      Conditional Joins
5.4      Removing Participants
5.4.1    SIP Mechanisms
5.4.2    CPCP Mechanisms
5.4.3    Non-Automated Mechanisms
5.5      Approving Policy Changes
5.6      Creating Sidebars
5.7      Destroying Conferences
5.7.1    SIP Mechanisms
5.7.2    CPCP Mechanisms
5.7.3    Non-Automated Mechanisms
5.8      Obtaining Membership
5.8.1    SIP Mechanisms
5.8.2    CPCP Mechanisms
5.8.3    Non-Automated Mechanisms
5.9      Adding and Removing Media
5.9.1    SIP Mechanisms
5.9.2    CPCP Mechanisms
5.9.3    Non-Automated Mechanisms
5.10     Conference Announcements and Recordings
5.11     Floor Control
5.12     Camera and Video Controls
6        Physical Realization
6.1      Centralized Server
6.2      Endpoint Server
6.3      Media Server Component
6.4      Distributed Mixing
6.5      Cascaded Mixers
7        Security Considerations
8        Contributors
9        Changes since
         draft-rosenberg-sipping-conferencing-framework-00
10       Authors Addresses
11       Normative References
12       Informative References

1 Introduction

The Session Initiation Protocol (SIP) [1] supports the initiation,
modification, and termination of media sessions between user agents.
These sessions are managed by SIP dialogs, which represent a SIP
relationship between a pair of user agents. Because dialogs are
between pairs of user agents, SIP's usage for two-party
communications (such as a phone call) is obvious. Communications
sessions with multiple participants, however, are more complicated.
SIP can support many models of multi-party communications. One,
referred to as loosely coupled conferences, makes use of multicast
media groups. In the loosely coupled model, there is no signaling
relationship between participants in the conference. There is no
central point of control or conference server. Participation is
gradually learned through control information that is passed as part
of the conference (using the RTP Control Protocol (RTCP) [2],
for example). Loosely coupled conferences are easily supported in SIP
by using multicast addresses within its session descriptions.

In another model, referred to as fully distributed multiparty
conferencing, each participant maintains a signaling relationship
with each other participant, using SIP. There is no central point of
control; it is completely distributed amongst the participants. This
model is outside the scope of this document.

In another model, sometimes referred to as the tightly coupled
conference, there is a central point of control.
Each participant connects to this central point. It provides a
variety of conference functions, and may also perform media mixing
functions. Tightly coupled conferences are not directly addressed by
RFC 3261, although basic participation is possible without any
additional protocol support.

This document is one of a series of specifications that discusses
tightly coupled conferences. Here, we present the overall framework
for tightly coupled conferencing, referred to simply as
"conferencing" from this point forward. This framework presents a
general architectural model for these conferences, presents
terminology used to discuss such conferences, and describes the sets
of protocols involved in a conference. The aim of the framework is to
meet the general requirements for conferencing that are outlined in
[3].

2 Terminology

Conference: Conference is an overused term which has different
   meanings in different contexts. In SIP, a conference is an
   instance of a multi-party conversation. Within the context
   of this specification, a conference is always a tightly
   coupled conference.

Loosely Coupled Conference: A loosely coupled conference is a
   conference without coordinated signaling relationships
   amongst participants. Loosely coupled conferences
   frequently use multicast for distribution of conference
   memberships.

Tightly Coupled Conference: A tightly coupled conference is a
   conference in which a single user agent, referred to as a
   focus, maintains a dialog with each participant. The focus
   plays the role of the centralized manager of the
   conference, and is addressed by a conference URI.

Focus: The focus is a SIP user agent that is addressed by a
   conference URI and identifies a conference (recall that a
   conference is a unique instance of a multi-party
   conversation).
   The focus maintains a SIP signaling
   relationship with each participant in the conference. The
   focus is responsible for ensuring, in some way, that each
   participant receives the media that make up the conference.
   The focus also implements conference policies. The focus is
   a logical role.

Conference URI: A URI, usually a SIP URI, which identifies the
   focus of a conference.

Participant: The software element that connects a user or
   automaton to a conference. It implements, at a minimum, a
   SIP user agent, but may also include a conference policy
   control protocol client, for example.

Conference Notification Service: A conference notification
   service is a logical function provided by the focus. The
   focus can act as a notifier [4], accepting subscriptions to
   the conference state, and notifying subscribers about
   changes to that state. The state includes the state
   maintained by the focus itself, the conference policy, and
   the media policy.

Conference Policy Server: A conference policy server is a
   logical function which can store and manipulate the
   conference policy. The conference policy is the overall set
   of rules governing operation of the conference. It is
   broken into membership policy and media policy. Unlike the
   focus, there is not an instance of the conference policy
   server for each conference. Rather, there is an instance of
   the membership and media policies for each conference.

Conference Policy: The complete set of rules manipulated by the
   conference policy server. It includes the membership policy
   and the media policy.

Membership Policy: A set of rules manipulated by the conference
   policy server regarding participation in the conference.
   These rules include directives on the lifespan of the
   conference, who can and cannot join the conference,
   definitions of roles available in the conference and the
   responsibilities associated with those roles, and policies
   on who is allowed to request which roles.

Media Policy: A set of rules manipulated by the conference
   policy server regarding the media composition of the
   conference. The media policy is used by the focus to
   determine the mixing characteristics for the conference.
   The media policy includes rules about which participants
   receive media from which other participants, and the ways
   in which that media is combined for each participant. In
   the case of audio, these rules can include the relative
   volumes at which each participant is mixed. In the case of
   video, these rules can indicate whether the video is tiled,
   whether the video indicates the loudest speaker, and so on.

Conference Policy Control Protocol (CPCP): The protocol used by
   clients to manipulate the conference policy.

Mixer: A mixer receives a set of media streams of the same type,
   and combines their media in a type-specific manner,
   redistributing the result to each participant. This
   includes media transported using RTP [2]. As a result, the
   term defined here is a superset of the mixer concept
   defined in RFC 1889, since it allows for non-RTP-based
   media such as instant messaging sessions [5].

Conference-Unaware Participant: A conference-unaware participant
   is a participant in a conference that is not aware that it
   is actually in a conference. As far as the UA is concerned,
   it is a point-to-point call.

Cascaded Conferencing: A mechanism for group communications in
   which a set of conferences are linked by having their
   focuses interact in some fashion.

Simplex Cascaded Conferences: A group of conferences which are
   linked such that the user agent which represents the focus
   of one conference is a conference-unaware participant in
   another conference.

Conference-Aware Participant: A conference-aware participant is
   a participant in a conference that has learned, through
   automated means, that it is in a conference, and that can
   use a conference policy control protocol, media policy
   control protocol, or conference subscription, to implement
   advanced functionality.

Conference Server: A conference server is a physical server
   which contains, at a minimum, the focus. It may also
   include a conference policy server and mixers.

Mass Invitation: A conference policy control protocol request to
   invite a large number of users into the conference.

Mass Ejection: A conference policy control protocol request to
   remove a large number of users from the conference.

Sidebar: A sidebar appears to the users within the sidebar as a
   "conference within the conference". It is a conversation
   amongst a subset of the participants to which the remaining
   participants are not privy.

Anonymous Participant: An anonymous participant is one that is
   known to other participants through the conference
   notification service, but whose identity is being withheld.

Hidden Participant: A hidden participant is one that is not
   known to other participants in the conference. Such a
   participant may be known to the moderator, depending on
   conference policy.

3 Overview of Conferencing Architecture

The central component (literally) in a SIP conference is the focus.
The focus maintains a SIP signaling relationship with each
participant in the conference. The result is a star topology, shown
in Figure 1.
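
The star topology can be illustrated with a minimal sketch. The code
below is not part of the framework; the class name, method names, and
example URIs are hypothetical, and the dialog is reduced to a label
rather than a real SIP dialog. It only shows that a focus holds one
dialog per participant, and contrasts that with the number of dialogs
needed when every participant maintains a dialog with every other
participant, as in the fully distributed model mentioned in the
Introduction.

```python
from dataclasses import dataclass, field


@dataclass
class Focus:
    """Hypothetical model of a focus: one SIP dialog per participant."""
    conference_uri: str
    # Maps participant URI -> dialog label (illustrative only).
    dialogs: dict = field(default_factory=dict)

    def join(self, participant_uri: str) -> None:
        # A real focus would establish the dialog with an INVITE.
        self.dialogs[participant_uri] = f"dialog:{participant_uri}"

    def leave(self, participant_uri: str) -> None:
        # A real focus would terminate the dialog with a BYE.
        self.dialogs.pop(participant_uri, None)


def star_dialogs(n: int) -> int:
    """Dialogs in a tightly coupled conference: one per participant."""
    return n


def full_mesh_dialogs(n: int) -> int:
    """Dialogs when each participant has a dialog with each other one."""
    return n * (n - 1) // 2


# Assumed example values: four participants, as drawn in Figure 1.
focus = Focus("sip:conf@example.com")
for i in range(1, 5):
    focus.join(f"sip:participant{i}@example.com")
```

With four participants, the focus in this sketch holds four dialogs,
whereas a full mesh among the same four participants would need six.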

The focus is responsible for making sure that the media streams which
constitute the conference are available to the participants in the
conference. It does that through the use of one or more mixers, each
of which combines a number of input media streams to produce one or
more output media streams. The focus uses the media policy to
determine the proper configuration of the mixers.

                        +-----------+
                        |           |
                        |           |
                        |Participant|
                        |     4     |
                        |           |
                        +-----------+
                              |
                              |SIP
                              |Dialog
                              |4
                              |
+-----------+           +-----------+            +-----------+
|           |           |           |            |           |
|           |           |           |            |           |
|Participant|-----------|   Focus   |------------|Participant|
|     1     |    SIP    |           |    SIP     |     3     |
|           |   Dialog  |           |   Dialog   |           |
+-----------+     1     +-----------+     3      +-----------+
                              |
                              |
                              |SIP
                              |Dialog
                              |2
                              |
                        +-----------+
                        |           |
                        |           |
                        |Participant|
                        |     2     |
                        |           |
                        +-----------+

            Figure 1: SIP Conference Architecture

The focus has access to the conference policy (composed of the
membership and media policies), an instance of which exists for each
conference. Effectively, the conference policy can be thought of as a
database which describes the way that the conference should operate.
It is the responsibility of the focus to enforce those policies. Not
only does the focus need read access to the database, but it needs to
know when it has changed. Such changes might result in SIP signaling
(for example, the ejection of a user from the conference using BYE),
and most changes will require a notification to be sent to
subscribers using the conference notification service.

The conference is represented by a URI, which identifies the focus.
Each conference has a unique focus and a unique URI identifying that
focus. Requests to the conference URI are routed to the focus for
that specific conference.

Users usually join the conference by sending an INVITE to the
conference URI.
As long as the conference policy allows, the INVITE
is accepted by the focus and the user is brought into the conference.
Users can leave the conference by sending a BYE, as they would in a
normal call.

Similarly, the focus can terminate a dialog with a participant,
should the conference policy change to indicate that the participant
is no longer allowed in the conference. A focus can also initiate an
INVITE, should the conference policy indicate that the focus needs to
bring a participant into the conference.

The notion of a conference-unaware participant is important in this
framework. A conference-unaware participant does not even know that
the UA it is communicating with happens to be a focus. As far as it
is concerned, it is a UA just like any other. The focus, of course,
knows that it is a focus, and it performs the tasks needed for the
conference to operate.

Conference-unaware participants have access to a good deal of
functionality. They can join and leave conferences using SIP, and
obtain more advanced features through stimulus signaling, as
discussed in [6]. However, if the participant wishes to explicitly
control aspects of the conference using functional signaling
protocols, the participant must be conference-aware.

A conference-aware participant is one that has access to advanced
functionality through additional protocol interfaces. The client uses
these protocols to interact with the conference policy server and the
focus. A model for this interaction is shown in Figure 2. The
participant can interact with the focus using extensions, such as
REFER, in order to access enhanced call control functions [7]. The
participant can SUBSCRIBE to the conference URI, and be connected to
the conference notification service provided by the focus.
Through this mechanism, it can learn about changes in participants
(effectively, the state of the dialogs), the media policy, and the
membership policy.

The participant can communicate with the conference policy server
using a conference policy control protocol. Through this protocol, it
can affect the conference policy. The conference policy server need
not be available in any particular conference, although there is
always a conference policy.

The interfaces between the focus and the conference policy, and the
conference policy server and the conference policy, are not subject
to standardization at the time of this writing. They are intended
primarily to show the logical roles involved in a conference, as
opposed to suggesting a physical decomposition. The separation of
these functions is documented here to encourage clarity in the
requirements and to allow individual implementations the flexibility
to compose a conferencing system in a scalable and robust manner.

3.1 Usage of URIs

It is fundamental to this framework that a conference is uniquely
identified by a URI, and that this URI identifies the focus which is
responsible for the conference. The conference URI is unique, such
that no two conferences have the same conference URI. A conference
URI is always a SIP or SIPS URI.

The conference URI is opaque to any participants which might use it.
There is no way to look at the URI, and know for certain whether it
identifies a focus, as opposed to a user or an interface on a PSTN
gateway. This is in line with the general philosophy of URI usage
[8]. However, contextual information surrounding the URI (for
example, SIP header parameters) may indicate that the URI represents
a conference.

When a SIP request is sent to the conference URI, that request is
routed to the focus, and only to the focus.
The element or system that creates the conference URI is responsible
for guaranteeing this property.

The conference URI can represent a long-lived conference or interest
group, such as "sip:discussion-on-dogs@example.com". The focus
identified by this URI would always exist, and always be managing the
conference for whatever participants are currently joined. Other
conference URIs can represent short-lived conferences, such as an
ad-hoc conference.

                     .....................................
                     .                                   .
                     .                                   .
                     .                                   .
                     .                                   .
                     .                     Conference    .
                     .                       Policy      .
      Conference     .                                   .
      Policy         .  +-----------+      //-----\\     .
      Control        .  |           |      ||     ||     .
      Protocol       .  | Conference|      \\-----//     .
      +---------------->|  Policy   |      |       |     .
      |              .  |  Server   |----> |Membership   .
      |              .  |           |      |       |     .
      |              .  +-----------+      |   &   |     .
      |              .                     |       |     .
      |              .                     | Media |     .
+-----------+        .  +-----------+      | Policy|     .
|           |        .  |           |      \      //     .
|           |        .  |           |       \-----/      .
|Participant|<--------->|   Focus   |            |       .
|           | SIP    .  |           |            |       .
|           | Dialog .  |           |<-----------+       .
+-----------+        .  |...........|                    .
          ^          .  | Conference|                    .
          |          .  |Notification                    .
          +------------>|  Service  |                    .
         Subscription.  +-----------+                    .
                     .                                   .
                     .                                   .
                     .                                   .
                     .                                   .
                     .....................................

                                  Conference
                                   Functions

            Figure 2: Conference-Aware Participant

Ideally, a conference URI is never constructed or guessed by a user.
Rather, conference URIs are learned through many mechanisms. A
conference URI can be emailed or sent in an instant message. A
conference URI can be linked on a web page. A conference URI can be
obtained from a conference policy control protocol, which can be used
to create conferences and the policies associated with them.

To determine that a SIP URI does represent a focus, standard
techniques for URI capability discovery can be used.
Specifically, the caller preferences specification [9] provides the
"isfocus" feature tag to indicate that the URI is a focus. Caller
preferences parameters are also used to indicate that a focus
supports the conference notification service. This is done by
declaring support for the SUBSCRIBE method and the relevant
package(s) in the caller preferences feature parameters associated
with the conference URI.

The other functions in a conference are also represented by URIs. If
the conference policy server is implemented through web pages, this
server is identified by HTTP URIs. If it is accessed using an
explicit protocol, it is a URI defined for that protocol.

Starting with the conference URI, the URIs for the other logical
entities in the conference can be learned using the conference
notification service.

4 Functions of the Elements

This section gives a more detailed description of the functions
typically implemented in each of the elements.

4.1 Focus

As its name implies, the focus is the center of the conference. All
participants in the conference are connected to it using a SIP
dialog. The focus is responsible for maintaining the dialogs
connected to it. It ensures that the dialogs are connected to a set
of participants who are allowed to participate in the conference, as
defined by the membership policy. The focus also uses SIP to
manipulate the media sessions, in order to make sure each participant
obtains all the media for the conference. To do that, the focus makes
use of mixers.

When a focus receives an INVITE, it checks the membership policy. The
membership policy might indicate that this participant is not allowed
to join, in which case the call can be rejected. It might indicate
that another participant, acting as a moderator, needs to approve
this new participant.
In that case, the INVITE might be parked on a
music-on-hold server, or a 183 response might be sent to indicate
progress. A notification, using the conference notification service,
would be sent to the moderator. The moderator then has the ability to
manipulate the policies using the conference policy control protocol.
If the policies are changed to allow this new participant, the focus
can accept the INVITE (or unpark it from the music-on-hold server).
The interpretation of the membership policy by the focus is, itself,
a matter of local policy, and not subject to standardization.

If a participant manipulated the membership policy to indicate that a
certain other participant was no longer allowed in the conference,
the focus would send a BYE to that other participant to remove them.
This is often referred to as "ejecting" a user from the conference.
The process of ejecting fundamentally consists of two steps: the
establishment of the policy through the conference policy control
protocol, and the implementation of that policy (using a BYE) by the
focus.

Similarly, if a user manipulated the membership policy to indicate
that a number of users need to be added to the conference, the focus
would send an INVITE to those participants. This is often referred to
as the "mass invitation" function. As with ejection, it is
fundamentally composed of the policy functions that specify the
participants which should be present, and the implementation of those
functions. A policy request to add a set of users might not require
an INVITE to execute it; those users might already be participants in
the conference.

A similar model exists for media policy. If the media policy
indicates that a participant should not receive any video, the focus
might implement that policy by sending a re-INVITE, removing the
media stream to that participant.
Alternatively, if the video is being centrally mixed, the focus could
instruct the mixer to send a black screen to that participant. The
means by which the policy is implemented are not subject to
specification.

4.2 Conference Policy Server

The conference policy server allows clients to manipulate and
interact with the conference policy. The conference policy is used by
the focus to make authorization decisions and guide its overall
behavior. Logically speaking, there is a one-to-one mapping between a
conference policy and a focus.

The conference policy is represented by a URI. There is a unique
conference policy for each conference. The conference policy URI
points to a conference policy server which can manipulate that
conference policy. A conference policy server also has a "top level"
URI which can be used to access functions that are independent of any
conference. Perhaps the most important of these functions is the
creation of a new conference. Creation of a new conference will
result in the construction of a new focus and a corresponding
conference URI, which can then be used to join the conference itself,
along with a media policy and conference policy.

The conference policy server is accessed using a client-server
transactional protocol. The client can be a participant in the
conference, or it can be a third party. Access control lists for who
can modify a conference policy are themselves part of the conference
policy.

The conference policy server is responsible for reconciliation of
potentially conflicting requests regarding the policy for the
conference.

The client of the conference policy control protocol can be any
entity interested in manipulating the conference policy. Clearly,
participants might be interested in manipulating it. A participant
might want to raise or lower the volume for one of the other
participants it is hearing.
Or, a participant might want to add a user to the conference.

A client of the conference policy protocol could also be another
server whose job is to determine the conference policy. As an
example, a floor control server is responsible for determining which
participant(s) in a conference are allowed to speak at any given
time, based on participant requests and access rules. The floor
control server would act as a client of the conference policy server,
and change the media policy based on who is allowed to speak.

The client of the conference policy control protocol could also be
another conference policy server.

4.3 Mixers

A mixer is responsible for combining the media streams that make up
the conference, and generating one or more output streams that are
distributed to recipients (which could be participants or other
mixers). The process of combining media is specific to the media
type, and is directed by the focus, under the guidance of the rules
described in the media policy.

A mixer is not aware of a "conference" as an entity, per se. A mixer
receives media streams as inputs, and based on directions provided by
the focus, generates media streams as outputs. There is no grouping
of media streams beyond the policies that describe the ways in which
the streams are mixed.

A mixer is always under the control of a focus. The focus is
responsible for interpreting the media policy, and then installing
the appropriate rules in the mixer. If the focus is directly
controlling a mixer, the mixer can either be co-resident with the
focus, or can be controlled through some kind of protocol.

However, a focus need not directly control a mixer. Rather, a focus
can delegate the mixing to the participants, each of which has its
own mixer. This is described in Section 6.4.
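
The division of labor between focus and mixer can be sketched as
follows. This is an illustration under stated assumptions, not a
mechanism defined by this framework: the function names, the rule
format (a per-recipient table of source weights), and the sample
values are all hypothetical. The first function stands in for the
focus interpreting an "equal weight" audio policy; the second stands
in for a mixer that knows nothing about conferences and simply
applies the rules it was given to its input streams.

```python
def equal_weight_rules(participants):
    """Focus side (hypothetical): turn an 'equal weight' audio policy
    into per-recipient mixing rules.

    Returns {recipient: {source: weight}}; no participant receives
    its own audio.
    """
    return {
        recipient: {src: 1.0 for src in participants if src != recipient}
        for recipient in participants
    }


def mix(frames, rules):
    """Mixer side (hypothetical): combine one frame per source into
    one output frame per recipient.

    frames: {source: [sample, ...]}, all lists of the same length.
    rules:  {recipient: {source: weight}} as installed by the focus.
    """
    length = len(next(iter(frames.values())))
    outputs = {}
    for recipient, weights in rules.items():
        out = [0.0] * length
        for source, weight in weights.items():
            for i, sample in enumerate(frames[source]):
                out[i] += weight * sample
        outputs[recipient] = out
    return outputs


# Assumed example input: one two-sample audio frame per participant.
frames = {"alice": [0.5, 0.25], "bob": [0.25, 0.0], "carol": [0.0, -0.5]}
rules = equal_weight_rules(list(frames))
mixed = mix(frames, rules)
```

A different media policy, such as one that lowers the volume of a
single participant, would change only the rules produced on the focus
side; the mixer function is unchanged. A real audio mixer would also
need to handle clipping, timing, and codecs, which this sketch
ignores.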

4.4 Conference Notification Service

The focus can provide a conference notification service. In this
role, it acts as a notifier, as defined in RFC 3265 [4]. It accepts
subscriptions from clients for the conference URI, and generates
notifications to them as the state of the conference changes.

This state is composed of two separate pieces. The first is the state
of the focus and the second is the conference policy.

The state of the focus includes the participants connected to the
focus, and information about the dialogs associated with them. As new
participants join, this state changes, and is reported through the
notification service. Similarly, when someone leaves, this state also
changes, allowing subscribers to learn about this fact.

As described previously, the conference policy includes the
membership policy and the media policy. As those policies change, due
to usage of the CPCP, direct change by the focus, or through an
application, the conference notification service informs subscribers
of these changes.

4.5 Participants

A participant in a conference is any SIP user agent that has a dialog
with the focus. This SIP user agent can be a PC application, a SIP
hardphone, or a PSTN gateway. It can also be another focus. A
conference which has a participant that is the focus of another
conference is called a simplex cascaded conference. Cascaded
conferences can also be used to provide scalable conferences where
there are regional sub-conferences, each of which is connected to the
main conference.

4.6 Conference Policy

The conference policy contains the rules that guide the operation of
the focus. The rules can be simple, such as an access list that
defines the set of allowed participants in a conference. The rules
can also be incredibly complex, specifying time-of-day based rules on
participation conditional on the presence of other participants.
It is important to understand that there is no restriction on the
type of rules that can be encapsulated in a conference policy.

The conference policy can be manipulated using web applications or
voice applications. It can also be manipulated with proprietary
protocols. However, the conference policy control protocol can be
used as a standardized means of manipulating the conference policy.
By the nature of conference policies, not all aspects of the policy
can be manipulated with the conference policy control protocol.

The conference policy includes the membership policy and the media
policy. The membership policy includes per-participant policies that
specify how the focus is to handle a particular participant. These
include whether or not the participant is anonymous, for example.

The media policy describes the way in which the set of inputs to a
mixer are combined to generate the set of outputs. Media policies can
span media types. In other words, the policy on how one media stream
is mixed can be based on characteristics of other media streams.
Media policies can be based on any quantifiable characteristic of the
media stream (its source, volume, codecs, speaking/silence, etc.),
and they can be based on internal or external variables accessible by
the media policy.

Some examples of media policies include:

   o The video output is the picture of the loudest speaker (video
     follows audio).

   o The audio from each participant will be mixed with equal
     weight, and distributed to all other participants.

   o The audio and video that is distributed is the one selected by
     the floor control server.

5 Common Operations

There are a large number of ways in which users can interact with a
conference. They can join, leave, set policies, approve members, and
so on.
This section is meant as an overview of the major conferencing
operations, summarizing how they operate. More detailed examples of
the SIP mechanisms can be found in [7].

5.1 Creating Conferences

There are many ways in which a conference can be created. The
creation of a conference actually constructs several elements all at
the same time. It results in the creation of a focus and a conference
policy. It also results in the construction of a conference URI,
which uniquely identifies the focus. Since the conference URI needs
to be unique, the element which creates conferences is responsible
for guaranteeing that uniqueness. This can be accomplished
deterministically, by keeping records of conference URIs, or
probabilistically, by creating random URIs with a sufficiently low
probability of collision.

When a media and conference policy are created, they are established
with default rules that are implementation dependent. If the creator
of the conference wishes to change those rules, they would do so
using the conference policy control protocol (CPCP), for example.

Of course, using the CPCP requires that an element know the URI for
manipulating the policy. That requires a means to learn the
conference policy URI from the conference URI, since the conference
URI is frequently the sole result returned to the client as a result
of conference creation. Any other URIs associated with the conference
are learned through the conference notification service. They are
carried as elements in the notifications.

5.1.1 SIP Mechanisms

One way to create a conference is through a conferencing application.
As an example, a user can send an INVITE request to
sip:conferences@service.com. This URI identifies an IVR application
which interacts with the user, collects information about the desired
conference, and creates it.
The user can then be placed into their newly created conference.

Creation of conferences where the focus resides in an endpoint
operates differently. There, the endpoint itself creates the
conference URI, and hands it out to other endpoints which are to be
the participants. What differs from case to case is how the endpoint
decides to create a conference.

One important case is the ad-hoc conference described in Section 6.2.
There, an endpoint unilaterally decides to create the conference
based on local policy. The dialogs that were connected to the UA are
migrated to the endpoint-hosted focus, using a re-INVITE to pass the
conference URI to the newly joined participants.

Alternatively, one UA can ask another UA to create an endpoint-hosted
conference. This is accomplished with the SIP Join header [10]. The
UA which receives the Join header in an invitation may need to create
a new conference URI (a new one is not needed if the dialog that is
being joined is already part of a conference). The conference URI is
then handed to the recently joined participants through a re-INVITE.

5.1.2 CPCP Mechanisms

Another way to create a conference is through interaction with the
conference policy server. Using the conference policy control
protocol, a client can instruct the conference policy server to
create a new conference and return the conference URI and conference
policy URI.

5.1.3 Non-Automated Mechanisms

Of course, a user can also create conferences by interacting with a
web server. The web server would prompt the user for the necessary
information (start and stop times of the conference, participants,
etc.) and return the conference URI to the user. The user would copy
this URI into their SIP phone, and send an INVITE to it in order to
join the newly-created conference.
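
Whichever creation mechanism is used, Section 5.1 requires the
creating element to guarantee a unique conference URI, either by
keeping records or by drawing random URIs. The following minimal
Python sketch combines both approaches under stated assumptions: the
"conf-" user-part prefix, the 128-bit token length, and the
ConferenceUriFactory name are illustrative choices and are not
defined by this framework.

```python
import secrets
from typing import Set


class ConferenceUriFactory:
    """Hands out conference URIs that are unique within one factory."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        # Deterministic part: a record of every URI issued so far.
        self.issued: Set[str] = set()

    def create(self) -> str:
        while True:
            # Probabilistic part: 128 random bits make a collision
            # extremely unlikely; the record check catches it anyway.
            uri = f"sip:conf-{secrets.token_hex(16)}@{self.domain}"
            if uri not in self.issued:
                self.issued.add(uri)
                return uri


if __name__ == "__main__":
    factory = ConferenceUriFactory("conference.example.com")
    print(factory.create())
```

An endpoint-hosted focus could rely on the random part alone, since
it typically has no shared record of URIs issued elsewhere.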

5.2 Adding Participants

There are many mechanisms for adding participants to a conference.
These include SIP, the conference policy control protocol, and
non-automated means. In all cases, participant additions can be first
party (a user adds themself) or third party (a user adds another
user).

5.2.1 SIP Mechanisms

First person additions using SIP are trivially accomplished with a
standard INVITE. A participant can send an INVITE request to the
conference URI, and if the conference policy allows them to join,
they are added to the conference.

If a UA does not know the conference URI, but has learned about a
dialog which is connected to a conference (by using the dialog event
package, for example [11]), the UA can join the conference by using
the Join header to join the dialog.

Third party additions with SIP are done using REFER [12]. The client
can send a REFER request to the participant, asking them to send an
INVITE request to the conference URI. Additionally, the client can
send a REFER request to the focus, asking it to send an INVITE to the
participant. The latter technique has the benefit of allowing a
client to add a conference-unaware participant that does not support
the REFER method.

5.2.2 CPCP Mechanisms

A basic function of the conference policy control protocol is to add
participants. A client of the protocol can specify any SIP URI (which
may identify themself) that is to be added. If the URI does not
identify a user that is already a participant in the conference, the
focus will send an INVITE to that URI in order to add them in.

5.2.3 Non-Automated Mechanisms

There are countless non-automated means for asking a participant to
join the conference. Generally, they involve conveying the conference
URI to the desired participant, so that they can send an INVITE to
it.
These mechanisms all require some kind of human interaction.

As an example, a user can send an instant message [13] to the third
party, containing an HTML document which requests the user to click
on the hyperlink to join the conference:

      Hey, would you like to join
      the conference now?

5.3 Conditional Joins

In many cases, a new participant will not wish to join the conference
unless they can join with a particular set of policies. As an
example, a participant may want to join anonymously, so that other
participants know that someone has joined, but not who. To accomplish
this, the conference policy control protocol is used to establish
these policies prior to the generation or acceptance of an invitation
to the conference. For example, if a user wishes to join a conference
with a known conference URI, the user would obtain the URI for the
conference policy, manipulate the policy to set themself as an
anonymous participant, and then actually join the conference by
sending an INVITE request to the conference URI.

5.4 Removing Participants

As with additions, there are several mechanisms for departures. These
include SIP mechanisms and CPCP mechanisms. Removals can also be
first person or third person.

5.4.1 SIP Mechanisms

First person departures are trivially accomplished by sending a BYE
request to the focus. This terminates the dialog with the focus and
removes the participant from the conference.

Third person departures can also be done using SIP, through the REFER
method.

5.4.2 CPCP Mechanisms

The CPCP can be used by a client to remove any participant (including
themself). When CPCP is used for this purpose, the focus will send a
BYE request to the participant that is being removed.
The focus will execute any other signaling that is needed to remove
them (for example, manipulate other dialogs in order to manage the
change in media streams).

The conference policy control protocol can also be used to remove a
large number of users. This is generally referred to as mass
ejection.

5.4.3 Non-Automated Mechanisms

As with the other common conferencing functions, there are many
non-automated ways to remove a participant. The identity of the
participant can be entered into a web form. When the user clicks
submit, the focus sends a BYE to that participant, removing them from
the conference. Alternatively, the conference can expose an IM
interface, where the user can send an IM to the conference saying
"remove Bob", causing the conference server to remove Bob.

5.5 Approving Policy Changes

      OPEN ISSUE: The basic mechanism described here depends on
      the actual protocols used for conference and media policy
      manipulation. If the protocol itself provides change
      notifications, sip-events may not be needed for that
      purpose. Thus, this description here is tentative.

A conference policy for a particular conference may designate one or
more users as moderators for some set of media policy or conference
policy change requests. This means that those moderators need to
approve the specific policy change. Typically, moderators are used to
approve member additions and removals. However, the framework allows
for moderators to be associated with any policy change that can be
made.

Moderating a policy request is done using a combination of the
conference notification service and the CPCP protocol.

First, a client makes a policy change. This can be directly, using
the CPCP, or indirectly. An indirect policy change request is any
non-CPCP action that requires approval. The simplest example is an
INVITE to the focus from a new participant.
That represents a request to change the membership of the conference.
From a moderation perspective, it is handled identically to the case
where a client used the CPCP to request that the same user be added
to the conference.

Part of the conference policy itself may designate any policy change
as moderated. This means that the change cannot be performed by the
client directly. As a result, any CPCP request will fail, and the
failure response informs the client that their request failed due to
insufficient authorization. That completes the CPCP transaction. In
the case of a policy change requested indirectly through some other
means, the behavior depends on the mechanism. For example, if a user
sends a SIP INVITE request to the conference in order to join, and
that join request is moderated, the focus can reject the INVITE, or
it can accept it and play music-on-hold until the request is
approved.

Even though the CPCP transaction failed, it does result in a change
in internal state. Specifically, the requested change shows up as a
"pending" state within the media and conference policies. This means
that the change has been requested, but has not taken effect. It is
almost a form of change request history. However, because it is a
state change, it is something that can result in notifications
through the conference notification service.

Therefore, in order to moderate requests, the moderator subscribes to
the conference policy notification service. Normally, the
notifications from the focus do not reflect pending state changes.
That is, the service will not normally send a notification informing
a subscriber that a policy change request was made and failed due to
lack of authorization. However, notifications to the moderator do
reflect these changes.
That is because the policy of the focus is to inform moderators, and
only moderators, of these changes. Indeed, different users can be
moderators for different parts of the conference and media policies.
For example, one user can be a moderator for membership changes, and
another, a moderator for whether users can be anonymously joined or
not.

There are two ways that the focus knows whether a subscriber to the
conference notification service is a moderator. The first is
configured policy (once again through CPCP). That policy can specify
that a particular user is the moderator for a particular piece of
policy. Therefore, if that user subscribes to the conference
notification service, any notification sent to that user will include
pending changes to that piece of policy. As an alternative, a
SUBSCRIBE request from a user can include a filter [14] that requests
receipt of these pending state changes. If the conference policy
allows, that request is honored, and the subscriber will receive
notifications about pending state changes.

Once the moderator receives a notification about the pending state
change, they use the CPCP to implement their decision. If the
moderator decides to approve the change, they use the CPCP or MPCP to
actually perform the change themselves. Since the moderator for a
piece of policy is allowed to change that piece of policy, by
definition, their change is accepted and performed. If the moderator
decides to reject the change, they use the CPCP to remove the pending
state from the database.

The pending state persists in the database for a period of time which
is, itself, part of the conference policy. If the moderator does not
either approve or reject the change, the pending state eventually
disappears, as if the change was explicitly rejected.
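
The pending-state lifecycle described above can be sketched in Python
as follows. This is a minimal illustration under stated assumptions,
not the mechanism of any defined protocol: it models only membership
additions, identifies users by plain strings, and uses hypothetical
names (ModeratedPolicy, request_add, reject, pending_for). The clock
is injectable so that expiry can be exercised without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class ModeratedPolicy:
    moderators: Set[str]
    pending_lifetime: float  # seconds; itself part of the policy
    clock: Callable[[], float] = time.monotonic
    members: Set[str] = field(default_factory=set)
    # Maps each user awaiting approval to the time the request expires.
    pending: Dict[str, float] = field(default_factory=dict)

    def _expire(self) -> None:
        """Drop pending requests whose lifetime has run out."""
        now = self.clock()
        self.pending = {u: t for u, t in self.pending.items() if t > now}

    def request_add(self, requester: str, user: str) -> bool:
        """Return True if the change was applied, False if only pending.

        A moderator's request is applied directly, which is also how a
        moderator approves a pending request. Any other request fails
        but leaves pending state behind.
        """
        self._expire()
        if requester in self.moderators:
            self.pending.pop(user, None)
            self.members.add(user)
            return True
        self.pending[user] = self.clock() + self.pending_lifetime
        return False

    def reject(self, moderator: str, user: str) -> None:
        """Explicit rejection: remove the pending state."""
        if moderator not in self.moderators:
            raise PermissionError(f"{moderator} is not a moderator")
        self.pending.pop(user, None)

    def pending_for(self, subscriber: str) -> Set[str]:
        """Pending changes visible to a subscriber (moderators only)."""
        self._expire()
        return set(self.pending) if subscriber in self.moderators else set()
```

In this sketch an unapproved request simply ages out of the pending
map, which has the same visible effect as an explicit rejection.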

If the pending state is approved, a real change to the conference or
media policy takes place, and this change will be reflected in the
conference notification service. In this way, if a client makes a
policy change, and their request is rejected because they are not
authorized, the client can subscribe to the conference notification
service to learn if their change is eventually approved or rejected.

This general mechanism for moderating policy requests is consistent
with the moderation of presence subscriptions [15] [16].

5.6 Creating Sidebars

A sidebar is a "conference within a conference", allowing a subset of
the participants to converse amongst themselves. Frequently,
participants in a sidebar will still receive media from the main
conference, but "in the background". For audio, this may mean that
the volume of the media is reduced, for example.

A sidebar is represented by a separate conference URI. This URI is a
type of "alias" for the main conference URI. Both route to the same
focus. Like any other conference, the sidebar conference URI has a
conference policy and a media policy associated with it. Like any
other conference, one can join it by sending an INVITE to this URI,
or ask others to join by referring them to it. However, it differs
from a normal conference URI in several ways. First, users in the
main conference do not need to establish a separate dialog to the
sidebar conference. The focus recognizes the sidebar as a special
URI, and knows to use the existing dialog to the main conference as a
"virtual" connection to the sidebar URI.

The second difference is the way in which conference and media
policies are implemented. If the conference policy control protocol
is used to add a user to a normal conference, the focus will
typically send an INVITE to the participant to ask them to join.
For a sidebar conference, it is done differently. If the conference
policy control protocol is used to add a user to it, and that user is
already part of the main conference, the focus will use the
conference notification service to alert the existing participant
that they have been asked to join the sidebar. The invited user can
then make use of the CPCP to formally add themselves to the sidebar.

5.7 Destroying Conferences

Conferences can be destroyed in several ways. Generally, whether
those means are applicable for any particular conference is a
component of the conference policy.

When a conference is destroyed, the conference and media policies
associated with it are destroyed. Any attempt to read or write those
policies results in a protocol error. Furthermore, the conference URI
becomes invalid. Any attempt to send an INVITE to it, or SUBSCRIBE
to it, would result in a SIP error response.

Typically, if a conference is destroyed while there are still
participants, the focus would send a BYE to those participants before
actually destroying the conference. Similarly, if there were any
users subscribed to the conference notification service, those
subscriptions would be terminated by the server before the actual
destruction.

5.7.1 SIP Mechanisms

There is no explicit means in SIP to destroy a conference. However, a
conference may be destroyed as a by-product of a user leaving the
conference, which can be done with BYE. In particular, if the
conference policy states that the conference is destroyed once the
last user leaves, when that user does leave (using a SIP BYE
request), the conference is destroyed.

5.7.2 CPCP Mechanisms

The CPCP contains mechanisms for explicitly destroying a conference.
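
The destruction behavior in Section 5.7 can be sketched in Python as
follows. This is a minimal illustration, assuming an in-memory focus
that only logs the signaling it would perform. The Conference class,
the destroy_when_empty flag, and the ConferenceDestroyed exception
are hypothetical names; the exception stands in for the SIP or policy
protocol error, whose exact form this framework does not specify.

```python
from typing import List, Set


class ConferenceDestroyed(Exception):
    """Stands in for the error returned once a conference is gone."""


class Conference:
    def __init__(self, uri: str, destroy_when_empty: bool = True) -> None:
        self.uri = uri
        # Policy choice: destroy the conference when the last user leaves.
        self.destroy_when_empty = destroy_when_empty
        self.participants: Set[str] = set()
        self.subscribers: Set[str] = set()
        self.destroyed = False
        self.signaling: List[str] = []  # log of actions the focus would take

    def _check(self) -> None:
        if self.destroyed:
            raise ConferenceDestroyed(self.uri)

    def join(self, user: str) -> None:
        self._check()
        self.participants.add(user)

    def subscribe(self, user: str) -> None:
        self._check()
        self.subscribers.add(user)

    def leave(self, user: str) -> None:
        """First-party departure: the user has already sent a BYE."""
        self._check()
        self.participants.discard(user)
        if self.destroy_when_empty and not self.participants:
            self.destroy()

    def destroy(self) -> None:
        """Explicit destruction, e.g. requested through the CPCP."""
        self._check()
        # Remaining participants receive a BYE, and subscriptions are
        # terminated, before the conference itself goes away.
        for user in sorted(self.participants):
            self.signaling.append(f"BYE {user}")
        for user in sorted(self.subscribers):
            self.signaling.append(f"terminate subscription {user}")
        self.participants.clear()
        self.subscribers.clear()
        self.destroyed = True
```

Both destruction paths end in the same destroy() call, so the
clean-up of dialogs and subscriptions is identical regardless of how
destruction was triggered.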

5.7.3 Non-Automated Mechanisms

As with conference creation, a conference can be destroyed by
interacting with a web application or voice application that prompts
the user for the conference to be destroyed.

5.8 Obtaining Membership

A participant in a conference will frequently wish to know the set of
other users in the conference. This information can be obtained many
ways.

5.8.1 SIP Mechanisms

The conference notification service allows a conference-aware
participant to subscribe to it, and receive notifications that
contain the list of participants. When a new participant joins or
leaves, subscribers are notified. The conference notification service
also allows a user to do a "fetch" [4] to obtain the current listing.

5.8.2 CPCP Mechanisms

The CPCP contains mechanisms for querying for the current set of
conference participants.

5.8.3 Non-Automated Mechanisms

Users can also interact with applications to obtain conference
membership. There may be a conference web page associated with the
conference, which has a link that will fetch the current list of
participants and display them in the browser. Similarly, an
interactive voice response application connected to the focus can be
used to obtain the current membership. A user in the conference could
press the pound key on their phone, and hear a listing of the current
participants.

5.9 Adding and Removing Media

Each conference is composed of a particular set of media that the
focus is managing. For example, a conference might contain a video
stream and an audio stream. The set of media streams that constitute
the conference can be changed by participants. When the set of media
in the conference changes, the focus will need to generate a
re-INVITE to each participant in order to add or remove the media
stream to each participant.
When a media stream is being added, a participant can reject the
offered media stream, in which case it will not receive or contribute
to that stream. Rejection of a stream by a participant does not imply
that the stream is no longer part of the conference - just that the
participant is not involved in it.

There are several ways in which a media stream can be added or
removed from a conference.

5.9.1 SIP Mechanisms

A SIP re-INVITE can be used by a participant to add or remove a media
stream. This is accomplished using the standard offer/answer
techniques for adding media streams to a session [17]. This will
trigger the focus to generate its own re-INVITEs.

5.9.2 CPCP Mechanisms

The CPCP can be used to add or remove a media stream. This too will
trigger the focus to generate a re-INVITE to each participant in
order to effect the change.

5.9.3 Non-Automated Mechanisms

As with most of the other common functions, addition and removal of
media streams can be accomplished with a web application or
interactive voice application.

5.10 Conference Announcements and Recordings

Conference announcements and recordings play a key role in many real
conferencing systems. Examples of such features include:

   1. Asking a user to state their name before joining the
      conference, in order to support a roll call

   2. Allowing a user to request a roll call, so they can hear
      who else is in the conference

   3. Allowing a user to press some keys on their keypad in order
      to record the conference

   4. Allowing a user to press some keys on their keypad in order
      to be connected with a human operator

   5. Allowing a user to press some keys on their keypad to mute
      or unmute their line

In this framework, these capabilities are modeled as an application
which acts as a participant in the conference.
This is shown pictorially in Figure 3. The conference has four
participants. Three of these participants are end users, and the
fourth is the announcement application.

[Figure 3: Conference announcement application. The focus holds SIP
dialogs 1 to 4 with Participant 4 (User 1), Participant 1 (User 2),
Participant 2 (User 3), and Participant 3 (the application); the
application also has a CPCP connection to the conference policy
server.]

If the announcement application wishes to play an announcement to all
the conference members (for example, to announce a join), it merely
sends media to the mixer as would any other participant. The
announcement is mixed in with the conversation and played to the
participants.

Similarly, the announcement application can play an announcement to a
specific user by using the CPCP to configure its media policy so that
the media it generates is only heard by the target user. The
application then generates the desired announcement, and it will be
heard only by the selected recipient.

The announcement application can also receive input from a specific
user through the conference. The announcement application would use
the CPCP to cause in-band DTMF to be dropped from the mix, and sent
only to itself. When a user wishes to invoke an operation, such as to
obtain a roll call, the user would press the appropriate key
sequence. That sequence would be heard only by the announcement
application.
Once the application determines that the user wishes to hear a roll
call, it can use the CPCP to set the media policy so that media from
that user is delivered only to the announcement application. This
"disconnects" the user from the rest of the conference so they can
interact with the application. Once the interaction is done, the
announcement application uses the CPCP to "reconnect" the user to the
conference.

5.11 Floor Control

Floor control is similar to a conference announcement application.
Within this framework, floor control is managed by an application
(possibly one that is not a participant) that uses the CPCP to
enforce the resulting floor control decisions.

[[Need more work here]]

5.12 Camera and Video Controls

      OPEN ISSUE: Originally, I was just going to say that this
      is outside the scope of conferencing. But, it does impact
      conferencing. Effectively, camera control is treated like a
      media stream. The mixer would combine the various requests
      across participants and direct them to the appropriate
      device. How does that work though? In a video conference
      with 4 participants, the camera control needs to identify
      the specific user whose camera is to be controlled. That is
      something unique to conferencing.

6 Physical Realization

In this section, we present several physical instantiations of these
components, to show how these basic functions can be combined to
solve a variety of problems.

6.1 Centralized Server

In the simplest realization of this framework, there is a single
physical server in the network which implements the focus, the
conference policy server, and the mixers. This is the classic "one
box" solution, shown in Figure 4.

6.2 Endpoint Server

Another important model is that of a locally-mixed ad-hoc conference.
In this scenario, two users (A and B) are in a regular point-to-point
call. One of the participants (A) decides to conference in a third
participant, C. To do this, A begins acting as a focus. Its existing
dialog with B becomes the first dialog attached to the focus. A would
re-INVITE B on that dialog, changing its Contact URI to a new value
which identifies the focus. In essence, A "mutates" from a
single-user UA to a focus plus a single-user UA, and in the process
of such a mutation, its URI changes. Then, the focus makes an
outbound INVITE to C. When C accepts, the focus mixes the media from
B and C together, redistributing the results. The mixed media is also
played locally. Figure 5 shows a diagram of this transition.

It is important to note that the external interfaces in this model,
between A and B, and between A and C, are exactly the same as those
that would be used in a centralized server model. A could also
include a conference policy server and conference notification
service, allowing the participants to have access to them if they so
desired. Just because the focus is co-resident with a participant
does not mean any aspect of the behaviors and external interfaces
will change.

[Figure 4: Centralized server architecture. A single conference
server contains the conference notification server, conference policy
server, focus, and mixers; each participant has a SIP connection to
the focus and an RTP connection to the mixers.]

[Figure 5: Transition from two-party call to conference. Before: UAs
A and B are connected by SIP and RTP. After: A contains a focus,
conference policy, mixers, and a participant joined by an internal
interface, with SIP and RTP connections to UAs B and C.]

6.3 Media Server Component

In this model, shown in Figure 6, each conference involves two
centralized servers. One of these servers, referred to as the
"application server", owns and manages the membership and media
policies, and maintains a dialog with each participant. As a result,
it represents the focus seen by all participants in a conference.
However, this server doesn't provide any media support. To perform
the actual media mixing function, it makes use of a second server,
called the "mixing server". This server includes a focus, and a
conference policy server, but has no conference notification service.
It has a default membership policy, which accepts all invitations
from the top-level focus. Its conference policy server accepts any
controls made by the application server. The focus in the application
server uses third party call control to connect the media streams of
each user to the mixing server, as needed. If the focus in the
application server receives a conference policy control command from

[Figure 6: Media server component model. The application server
(focus, conference policy, notification) connects to the conferencing
component (focus, conference policy, mixers) via SIP, a conference
protocol, and a media protocol; participants have SIP connections to
the application server and RTP connections to the conferencing
component.]

This model allows for the mixing server to be used as a resource for
a variety of different conferencing applications. This is because it
is unaware of any conference or media policies; it is merely a
"slave" to the top-level server, doing whatever it asks. This is
consistent with the SIP Application Server Component Model [18].

6.4 Distributed Mixing

In a distributed mixed conference, there is still a centralized
server which implements the focus, conference policy server, and
media policy server. However, there are no centralized mixers.
Rather, there are mixers in each endpoint, along with a conference
policy server. The focus distributes the media by using third party
call control [19] to move a media stream between each participant and
each other participant. As a result, if there are N participants in
the conference, there will be a single dialog between each
participant and the focus, but the session description associated
with that dialog will be constructed to allow media to be distributed
amongst the participants. This is shown in Figure 7.

There are several ways in which the media can be distributed to each
participant for mixing. In a multi-unicast model, each participant
sends a copy of its media to each other participant. In this case,
the session description manages N-1 media streams.
   In a multicast model, each participant joins a common multicast
   group, and each participant sends a single copy of its media stream
   to that group. The underlying multicast infrastructure then
   distributes the media, so that each participant gets a copy. In a
   single-source multicast model (SSM), each participant sends its media
   stream to a central point, using unicast. The central point then
   redistributes the media to all participants using multicast. The
   focus is responsible for selecting the modality of media
   distribution, and for handling any hybrids that would be necessitated
   by clients with mixed capabilities.

   When a new participant joins or is added, the focus will perform the
   necessary third party call control to distribute the media from the
   new participant to all the other participants, and vice versa.

   The central conference server also includes a conference policy
   server. Of course, the central conference server cannot implement any
   of the media policies directly. Rather, it would delegate the
   implementation to the conference policy servers co-resident with each
   participant. As an example, if a participant decides to switch the
   overall conference mode from "voice activated" to "continuous
   presence", they would communicate with the central conference policy
   server. The conference policy server, in turn, would communicate with
   the conference policy servers co-resident with each participant,
   using the same conference policy control protocol, and instruct them
   to use "continuous presence".

   This model requires additional functionality in user agents, which
   may or may not be present. The participants, therefore, must be able
   to advertise this capability to the focus.

                            +---------+
                            |Partcpnt |
                media       |         |       media
             ...............|         |...............
             .              | Mixers  |              .
             .              |C.Pol.Srv|              .
             .              +---------+              .
             .                   |                   .
             .                   |                   .
             .                   |                   .
             .           dialog  |                   .
             .                   |                   .
             .                   |                   .
             .                   |                   .
             .              +---------+              .
             .              |Cnf.Srvr.|              .
             .              |         |              .
             .              |  Focus  |              .
             .              |C.Pol.Srv|              .
             .            / |         | \            .
             .           /  +---------+  \           .
             .          /                 \          .
             .         /                   \         .
             .        /             dialog  \        .
             .       /                       \       .
             .      /dialog                   \      .
             .     /                           \     .
             .    /                             \    .
             .   /                               \   .
             .                                       .
        +---------+                             +---------+
        |Partcpnt |                             |Partcpnt |
        |         |                             |         |
        |         |  .........................  |         |
        | Mixers  |                             | Mixers  |
        |C.Pol.Srv|            media            |C.Pol.Srv|
        +---------+                             +---------+

   Figure 7: Dialog and media streams in a distributed mixed conference

6.5 Cascaded Mixers

   In very large conferences, it may not be possible to have a single
   mixer that can handle all of the media. A solution to this is to use
   cascaded mixers. In this architecture, there is a centralized focus,
   but the mixing function is implemented by a multiplicity of mixers,
   scattered throughout the network. Each participant is connected to
   one, and only one, of the mixers. The focus uses some kind of control
   protocol to connect the mixers together, so that all of the
   participants can hear each other.

   This architecture is shown in Figure 8.

                            +---------+
   +------------------------|         |-------------------------+
   |   +++++++++++++++++++++|         |+++++++++++++++++++++    |
   |   +            +-------|  Focus  |--------+           +    |
   |   +            |       |         |        |           +    |
   |   +            |    +--|         |---+    |           +    |
   |   +            |    |  +---------+   |    |           +    |
   |   +            |    |       +        |    |           +    |
   |   +            |    |       +        |    |           +    |
   |   +            |    |       +        |    |           +    |
   |   +            |    |  +---------+   |    |           +    |
   |   +            |    |  |         |   |    |           +    |
   |   +            |    |  | Mixer 2 |   |    |           +    |
   |   +            |    |  |         |   |    |           +    |
   |   +            |    |  +---------+   |    |           +    |
   |   +            |    |...    .   .... |    |           +    |
   |   +           .|....|       .       .|..  |           +    |
   |   +     ...... |    |       .        |  ..|...        +    |
   |   +  ...       |    |       .        |    |   ....    +    |
   |  +---------+   |    |  +---------+   |    |  +---------+   |
   |  |         |   |    |  |         |   |    |  |         |   |
   |  | Mixer 1 |   |    |  | Mixer 3 |   |    |  | Mixer 4 |   |
   |  |         |   |    |  |         |   |    |  |         |   |
   |  +---------+   |    |  +---------+   |    |  +---------+   |
   |    .      .    |    |    .      .    |    |    .      .    |
   |    .      .    |    |    .      .    |    |    .      .    |
   |    .      .    |    |    .      .    |    |    .      .    |
   +---------+ .    |    +---------+ .    |    +---------+ .    |
   | Prtcpnt | .    |    | Prtcpnt | .    |    | Prtcpnt | .    |
   |    1    | .    |    |    2    | .    |    |    3    | .    |
   +---------+ .    |    +---------+ .    |    +---------+ .    |
               .    |                .    |                .    |
           +---------+           +---------+           +---------+
           | Prtcpnt |           | Prtcpnt |           | Prtcpnt |
           |    4    |           |    5    |           |    6    |
           +---------+           +---------+           +---------+

   -------  SIP Dialog
   .......  Media Flow
   +++++++  Control Protocol

   Figure 8: Cascaded Mixers

7 Security Considerations

   Conferences frequently require security features in order to properly
   operate. The conference policy may dictate that only certain
   participants can join, or that certain participants can create new
   policies. Generally speaking, conference applications are very
   concerned about authorization decisions. Mechanisms for establishing
   and enforcing such authorization rules are a central concept
   throughout this document.

   Of course, authorization rules require authentication. Normal SIP
   authentication mechanisms should suffice for the conference
   authorization mechanisms described here.

8 Contributors

   This document is the result of discussions amongst the conferencing
   design team. The members of this team include:

      Alan Johnston
      Brian Rosen
      Rohan Mahy
      Henning Schulzrinne
      Orit Levin
      Roni Even
      Tom Taylor
      Petri Koskelainen
      Nermeen Ismail
      Andy Zmolek
      Joerg Ott
      Dan Petrie

9 Changes since draft-rosenberg-sipping-conferencing-framework-00

   o  Rework of terminology.

   o  More details on moderating policy changes.

   o  Rework of the overview, and in particular, a shift of focus
      from basic/complex conferences (a term which has been removed)
      to conference aware/unaware participants.

   o  Removal of explicit reference to megaco for controlling a
      mixer.

   o  Discussion of a lot more conferencing operations.

   o  New sidebar mechanism.

10 Author's Address

   Jonathan Rosenberg
   dynamicsoft
   72 Eagle Rock Avenue
   First Floor
   East Hanover, NJ 07936
   email: jdrosen@dynamicsoft.com

11 Normative References

12 Informative References

   [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J.
   Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session
   initiation protocol," RFC 3261, Internet Engineering Task Force,
   June 2002.

   [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP:
   a transport protocol for real-time applications," RFC 1889, Internet
   Engineering Task Force, Jan. 1996.

   [3] O. Levin et al., "Requirements for tightly coupled SIP
   conferencing," internet draft, Internet Engineering Task Force, Nov.
   2002. Work in progress.

   [4] A. B. Roach, "Session initiation protocol (SIP)-specific event
   notification," RFC 3265, Internet Engineering Task Force, June 2002.

   [5] B. Campbell and J. Rosenberg, "Instant message sessions in
   SIMPLE," internet draft, Internet Engineering Task Force, Oct. 2002.
   Work in progress.

   [6] J. Rosenberg, "A framework and requirements for application
   interaction in SIP," internet draft, Internet Engineering Task Force,
   Oct. 2002. Work in progress.

   [7] A. Johnston and O. Levin, "Session initiation protocol call
   control - conferencing for user agents," internet draft, Internet
   Engineering Task Force, Feb. 2003. Work in progress.

   [8] T. Berners-Lee, R. Fielding, and L. Masinter, "Uniform resource
   identifiers (URI): generic syntax," RFC 2396, Internet Engineering
   Task Force, Aug. 1998.

   [9] H. Schulzrinne and J. Rosenberg, "Session initiation protocol
   (SIP) caller preferences and callee capabilities," internet draft,
   Internet Engineering Task Force, Nov. 2002. Work in progress.

   [10] R. Mahy and D. Petrie, "The session initiation protocol (SIP)
   'join' header," internet draft, Internet Engineering Task Force, Oct.
   2002. Work in progress.

   [11] J. Rosenberg and H. Schulzrinne, "A session initiation protocol
   (SIP) event package for dialog state," internet draft, Internet
   Engineering Task Force, June 2002. Work in progress.

   [12] R. Sparks, "The SIP refer method," internet draft, Internet
   Engineering Task Force, Dec. 2002. Work in progress.

   [13] B. Campbell, J. Rosenberg, H. Schulzrinne, C. Huitema, and D.
   Gurle, "Session initiation protocol (SIP) extension for instant
   messaging," RFC 3428, Internet Engineering Task Force, Dec. 2002.

   [14] T. Moran and S. Addagatla, "Architecture for event notification
   filters," internet draft, Internet Engineering Task Force, Oct. 2002.
   Work in progress.

   [15] J. Rosenberg, "A presence event package for the session
   initiation protocol (SIP)," internet draft, Internet Engineering Task
   Force, Jan. 2003. Work in progress.

   [16] J. Rosenberg, "A watcher information event template-package for
   the session initiation protocol (SIP)," internet draft, Internet
   Engineering Task Force, Jan. 2003. Work in progress.

   [17] J. Rosenberg and H. Schulzrinne, "An offer/answer model with
   session description protocol (SDP)," RFC 3264, Internet Engineering
   Task Force, June 2002.

   [18] J. Rosenberg, P. Mataga, and H. Schulzrinne, "An application
   server component architecture for SIP," internet draft, Internet
   Engineering Task Force, Mar. 2001. Work in progress.

   [19] J. Rosenberg, J. Peterson, H. Schulzrinne, and G. Camarillo,
   "Best current practices for third party call control in the session
   initiation protocol," internet draft, Internet Engineering Task
   Force, June 2002. Work in progress.
Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (c) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.