SIP                                                         J. Rosenberg
Internet-Draft                                               dynamicsoft
Expires: December 28, 2004                                 June 29, 2004

    A Framework for Conferencing with the Session Initiation Protocol
              draft-ietf-sipping-conferencing-framework-02

Status of this Memo

   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   and any of which I become aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 28, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

   The Session Initiation Protocol (SIP) supports the initiation,
   modification, and termination of media sessions between user agents.
   These sessions are managed by SIP dialogs, which represent a SIP
   relationship between a pair of user agents. Because dialogs are
   between pairs of user agents, SIP's usage for two-party
   communications (such as a phone call) is obvious. Communications
   sessions with multiple participants, generally known as conferencing,
   are more complicated. This document defines a framework for how such
   conferencing can occur. This framework describes the overall
   architecture, terminology, and protocol components needed for
   multi-party conferencing.

Table of Contents

   1. Introduction
   2. Terminology
   3. Overview of Conferencing Architecture
     3.1 Usage of URIs
   4. Functions of the Elements
     4.1 Focus
     4.2 Conference Policy Server
     4.3 Mixers
     4.4 Conference Notification Service
     4.5 Participants
     4.6 Conference Policy
   5. Common Operations
     5.1 Creating Conferences
       5.1.1 SIP Mechanisms
       5.1.2 CPCP Mechanisms
       5.1.3 Non-Automated Mechanisms
     5.2 Adding Participants
       5.2.1 SIP Mechanisms
       5.2.2 CPCP Mechanisms
       5.2.3 Non-Automated Mechanisms
     5.3 Conditional Joins
     5.4 Removing Participants
       5.4.1 SIP Mechanisms
       5.4.2 CPCP Mechanisms
       5.4.3 Non-Automated Mechanisms
     5.5 Approving Policy Changes
     5.6 Creating Sidebars
     5.7 Destroying Conferences
       5.7.1 SIP Mechanisms
       5.7.2 CPCP Mechanisms
       5.7.3 Non-Automated Mechanisms
     5.8 Obtaining Membership Information
       5.8.1 SIP Mechanisms
       5.8.2 CPCP Mechanisms
       5.8.3 Non-Automated Mechanisms
     5.9 Adding and Removing Media
       5.9.1 SIP Mechanisms
       5.9.2 CPCP Mechanisms
       5.9.3 Non-Automated Mechanisms
     5.10 Conference Announcements and Recordings
     5.11 Floor Control
     5.12 Camera and Video Controls
   6. Physical Realization
     6.1 Centralized Server
     6.2 Endpoint Server
     6.3 Media Server Component
     6.4 Distributed Mixing
     6.5 Cascaded Mixers
   7. Security Considerations
   8. Contributors
   9. Changes from draft-ietf-sipping-conferencing-framework-00
   10. Changes since draft-rosenberg-sipping-conferencing-framework-01
   11. Changes since draft-rosenberg-sipping-conferencing-framework-00
   12. Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1. Introduction

   The Session Initiation Protocol (SIP) [1] supports the initiation,
   modification, and termination of media sessions between user agents.
   These sessions are managed by SIP dialogs, which represent a SIP
   relationship between a pair of user agents. Because dialogs are
   between pairs of user agents, SIP's usage for two-party
   communications (such as a phone call) is obvious. Communications
   sessions with multiple participants, however, are more complicated.
   SIP can support many models of multi-party communications. One,
   referred to as loosely coupled conferences, makes use of multicast
   media groups. In the loosely coupled model, there is no signaling
   relationship between participants in the conference. There is no
   central point of control or conference server. Participation is
   gradually learned through control information that is passed as part
   of the conference (using the RTP Control Protocol (RTCP) [2], for
   example). Loosely coupled conferences are easily supported in SIP by
   using multicast addresses within its session descriptions.

   In a second model, referred to as fully distributed multiparty
   conferencing, each participant maintains a signaling relationship
   with each other participant, using SIP. There is no central point of
   control; it is completely distributed amongst the participants. This
   model is outside the scope of this document.

   In a third model, sometimes referred to as the tightly coupled
   conference, there is a central point of control. Each participant
   connects to this central point. It provides a variety of conference
   functions, and may possibly perform media mixing functions as well.
   Tightly coupled conferences are not directly addressed by RFC 3261,
   although basic participation is possible without any additional
   protocol support.

   This document is one of a series of specifications that discuss
   tightly coupled conferences. Here, we present the overall framework
   for tightly coupled conferencing, referred to simply as
   "conferencing" from this point forward. This framework presents a
   general architectural model for these conferences, presents
   terminology used to discuss such conferences, and describes the sets
   of protocols involved in a conference. The aim of the framework is
   to meet the general requirements for conferencing that are outlined
   in [3].

2. Terminology

   Conference: Conference is an overused term which has different
      meanings in different contexts. In SIP, a conference is an
      instance of a multi-party conversation. Within the context of
      this specification, a conference is always a tightly coupled
      conference.
   Loosely Coupled Conference: A loosely coupled conference is a
      conference without coordinated signaling relationships amongst
      participants. Loosely coupled conferences frequently use
      multicast for distribution of conference memberships.
   Tightly Coupled Conference: A tightly coupled conference is a
      conference in which a single user agent, referred to as a focus,
      maintains a dialog with each participant. The focus plays the
      role of the centralized manager of the conference, and is
      addressed by a conference URI.
   Focus: The focus is a SIP user agent that is addressed by a
      conference URI and identifies a conference (recall that a
      conference is a unique instance of a multi-party conversation).
      The focus maintains a SIP signaling relationship with each
      participant in the conference. The focus is responsible for
      ensuring, in some way, that each participant receives the media
      that make up the conference. The focus also implements conference
      policies. The focus is a logical role.
   Conference URI: A URI, usually a SIP URI, which identifies the focus
      of a conference.
   Participant: The software element that connects a user or automaton
      to a conference. It implements, at a minimum, a SIP user agent,
      but may also include a conference policy control protocol client,
      for example.
   Conference Notification Service: A conference notification service is
      a logical function provided by the focus. The focus can act as a
      notifier [4], accepting subscriptions to the conference state, and
      notifying subscribers about changes to that state. The state
      includes the state maintained by the focus itself, the conference
      policy, and the media policy.
   Conference Policy Server: A conference policy server is a logical
      function which can store and manipulate the conference policy.
      The conference policy is the overall set of rules governing
      operation of the conference. It is broken into membership policy
      and media policy. Unlike the focus, there is not an instance of
      the conference policy server for each conference. Rather, there
      is an instance of the membership and media policies for each
      conference.
   Conference Policy: The complete set of rules for a particular
      conference manipulated by the conference policy server. It
      includes the membership policy and the media policy. There is an
      instance of conference policy for each conference.

   Membership Policy: A set of rules manipulated by the conference
      policy server regarding participation in a specific conference.
      These rules include directives on the lifespan of the conference,
      who can and cannot join the conference, definitions of roles
      available in the conference and the responsibilities associated
      with those roles, and policies on who is allowed to request which
      roles.
   Media Policy: A set of rules manipulated by the conference policy
      server regarding the media composition of the conference. The
      media policy is used by the focus to determine the mixing
      characteristics for the conference. The media policy includes
      rules about which participants receive media from which other
      participants, and the ways in which that media is combined for
      each participant. In the case of audio, these rules can include
      the relative volumes at which each participant is mixed. In the
      case of video, these rules can indicate whether the video is
      tiled, whether the video indicates the loudest speaker, and so on.
   Conference Policy Control Protocol (CPCP): The protocol used by
      clients to manipulate the conference policy.
   Mixer: A mixer receives a set of media streams of the same type, and
      combines their media in a type-specific manner, redistributing the
      result to each participant. This includes media transported using
      RTP (RFC 1889). As a result, the term defined here is a superset
      of the mixer concept defined in RFC 1889, since it allows for
      non-RTP-based media such as instant messaging sessions [5].
   Conference-Unaware Participant: A conference-unaware participant is a
      participant in a conference that is not aware that it is actually
      in a conference. As far as the UA is concerned, it is a
      point-to-point call.
   Cascaded Conferencing: A mechanism for group communications in which
      a set of conferences are linked by having their focuses interact
      in some fashion.
   Simplex Cascaded Conferences: A group of conferences which are linked
      such that the user agent which represents the focus of one
      conference is a conference-unaware participant in another
      conference.
   Conference-Aware Participant: A conference-aware participant is a
      participant in a conference that has learned, through automated
      means, that it is in a conference, and that can use a conference
      policy control protocol, media policy control protocol, or
      conference subscription, to implement advanced functionality.
   Conference Server: A conference server is a physical server which
      contains, at a minimum, the focus. It may also include a
      conference policy server and mixers.
   Mass Invitation: A conference policy control protocol request to
      invite a large number of users into the conference.

   Mass Ejection: A conference policy control protocol request to remove
      a large number of users from the conference.
   Sidebar: A sidebar appears to the users within the sidebar as a
      "conference within the conference". It is a conversation amongst
      a subset of the participants to which the remaining participants
      are not privy.
   Anonymous Participant: An anonymous participant is one that is known
      to other participants through the conference notification service,
      but whose identity is being withheld.
   Hidden Participant: A hidden participant is one that is not known to
      other participants in the conference. They may be known to the
      moderator, depending on conference policy.

3. Overview of Conferencing Architecture

                          +-----------+
                          |           |
                          |           |
                          |Participant|
                          |     4     |
                          |           |
                          +-----------+
                                |
                                |SIP
                                |Dialog
                                |4
                                |
   +-----------+          +-----------+          +-----------+
   |           |          |           |          |           |
   |           |          |           |          |           |
   |Participant|----------|   Focus   |----------|Participant|
   |     1     |   SIP    |           |   SIP    |     3     |
   |           |  Dialog  |           |  Dialog  |           |
   +-----------+    1     +-----------+    3     +-----------+
                                |
                                |
                                |SIP
                                |Dialog
                                |2
                                |
                          +-----------+
                          |           |
                          |           |
                          |Participant|
                          |     2     |
                          |           |
                          +-----------+

                             Figure 1

   The central component (literally) in a SIP conference is the focus.
   The focus maintains a SIP signaling relationship with each
   participant in the conference. The result is a star topology, shown
   in Figure 1.

   The focus is responsible for making sure that the media streams which
   constitute the conference are available to the participants in the
   conference. It does that through the use of one or more mixers, each
   of which combines a number of input media streams to produce one or
   more output media streams. The focus uses the media policy to
   determine the proper configuration of the mixers.

   The focus has access to the conference policy (composed of the
   membership and media policies), an instance of which exists for each
   conference. Effectively, the conference policy can be thought of as
   a database which describes the way that the conference should
   operate. It is the responsibility of the focus to enforce those
   policies. Not only does the focus need read access to the database,
   but it needs to know when it has changed. Such changes might result
   in SIP signaling (for example, the ejection of a user from the
   conference using BYE), and most changes will require a notification
   to be sent to subscribers using the conference notification service.

   The conference is represented by a URI, which identifies the focus.
   Each conference has a unique focus and a unique URI identifying that
   focus. Requests to the conference URI are routed to the focus for
   that specific conference.

   Users usually join the conference by sending an INVITE to the
   conference URI. As long as the conference policy allows, the INVITE
   is accepted by the focus and the user is brought into the conference.
   Users can leave the conference by sending a BYE, as they would in a
   normal call.

   Similarly, the focus can terminate a dialog with a participant,
   should the conference policy change to indicate that the participant
   is no longer allowed in the conference. A focus can also initiate an
   INVITE, should the conference policy indicate that the focus needs to
   bring a participant into the conference.

   The notion of a conference-unaware participant is important in this
   framework. A conference-unaware participant does not even know that
   the UA it is communicating with happens to be a focus. As far as it
   is concerned, it is a UA just like any other. The focus, of course,
   knows that it is a focus, and it performs the tasks needed for the
   conference to operate.

   Conference-unaware participants have access to a good deal of
   functionality. They can join and leave conferences using SIP, and
   obtain more advanced features through stimulus signaling, as
   discussed in [6]. However, if the participant wishes to explicitly
   control aspects of the conference using functional signaling
   protocols, the participant must be conference-aware.

                        .......................................
                        .                                     .
                        .                        Conference   .
                        .                          Policy     .
      Conference        .  +------------+       //-------\\   .
      Policy            .  |            |      ||         ||  .
      Control           .  | Conference |      |\\-------//|  .
      Protocol          .  |   Policy   |      | Membership|  .
         +----------------->|   Server   |----->|     &     |  .
         |              .  |            |      |   Media   |  .
         |              .  +------------+      |   Policy  |  .
         |              .                       \\       //   .
         |              .                        \\-----//    .
         |              .  +------------+            |        .
   +-----------+        .  |            |            |        .
   |           |        .  |            |            |        .
   |Participant|<--------->|   Focus    |            |        .
   |           | SIP    .  |            |            |        .
   |           | Dialog .  |            |<-----------+        .
   +-----------+        .  |............|                     .
         ^              .  | Conference |                     .
         |              .  |Notification|                     .
         +----------------->|  Service   |                     .
           Subscription .  +------------+                     .
                        .                                     .
                        .                                     .
                        .......................................

                                      Conference
                                      Functions

                                      Figure 2

   A conference-aware participant is one that has access to advanced
   functionality through additional protocol interfaces. The client
   uses these protocols to interact with the conference policy server
   and the focus. A model for this interaction is shown in Figure 2.
   The participant can interact with the focus using extensions, such
   as REFER, in order to access enhanced call control functions [7].
   The participant can SUBSCRIBE to the conference URI, and be
   connected to the conference notification service provided by the
   focus. Through this mechanism, it can learn about changes in
   participants (effectively, the state of the dialogs), the media
   policy, and the membership policy.

   The participant can communicate with the conference policy server
   using a conference policy control protocol. Through this protocol,
   it can affect the conference policy. The conference policy server
   need not be available in any particular conference, although there is
   always a conference policy.

   The interfaces between the focus and the conference policy, and the
   conference policy server and the conference policy, are not subject
   to standardization at the time of this writing. They are intended
   primarily to show the logical roles involved in a conference, as
   opposed to suggesting a physical decomposition. The separation of
The separation of 416 these functions is documented here to encourage clarity in the 417 requirements and to allow individual implementations the flexibility 418 to compose a conferencing system in a scalable and robust manner. 420 3.1 Usage of URIs 422 It is fundamental to this framework that a conference is uniquely 423 identified by a URI, and that this URI identifies the focus which is 424 responsible for the conference. The conference URI is unique, such 425 that no two conferences have the same conference URI. A conference 426 URI is always a SIP or SIPS URI. 428 The conference URI is opaque to any participants which might use it. 429 There is no way to look at the URI, and know for certain whether it 430 identifies a focus, as opposed to a user or an interface on a PSTN 431 gateway. This is in line with the general philosophy of URI usage 432 [8]. However, contextual information surrounding the URI (for 433 example, SIP header parameters) may indicate that the URI represents 434 a conference. 436 When a SIP request is sent to the conference URI, that request is 437 routed to the focus, and only to the focus. The element or system 438 that creates the conference URI is responsible for guaranteeing this 439 property. 441 The conference URI can represent a long-lived conference or interest 442 group, such as "sip:discussion-on-dogs@example.com". The focus 443 identified by this URI would always exist, and always be managing the 444 conference for whatever participants are currently joined. Other 445 conference URIs can represent short-lived conferences, such as an 446 ad-hoc conference. 448 Ideally, a conference URI is never constructed or guessed by a user. 449 Rather, conference URIs are learned through many mechanisms. A 450 conference URI can be emailed or sent in an instant message. A 451 conference URI can be linked on a web page. 
A conference URI can be 452 obtained from a conference policy control protocol, which can be used 453 to create conferences and the policies associated with them. 455 To determine that a SIP URI does represent a focus, standard 456 techniques for URI capability discovery can be used. Specifically, 457 the callee capabilities specification [9] provides the "isfocus" 458 feature tag to indicate that the URI is a focus. Caller preferences 459 parameters are also used to indicate that a focus supports the 460 conference notification service. This is done by declaring support 461 for the SUBSCRIBE method and the relevant package(s) in the caller 462 preferences feature parameters associated with the conference URI. 464 The other functions in a conference are also represented by URIs. If 465 the conference policy server is implemented through web pages, this 466 server is identified by HTTP URIs. If it is accessed using an 467 explicit protocol, it is a URI defined for that protocol. 469 Starting with the conference URI, the URIs for the other logical 470 entities in the conference can be learned using the conference 471 notification service. 473 4. Functions of the Elements 475 This section gives a more detailed description of the functions 476 typically implemented in each of the elements. 478 4.1 Focus 480 As its name implies, the focus is the center of the conference. All 481 participants in the conference are connected to it by a SIP dialog. 482 The focus is responsible for maintaining the dialogs connected to it. 483 It ensures that the dialogs are connected to a set of participants 484 who are allowed to participate in the conference, as defined by the 485 membership policy. The focus also uses SIP to manipulate the media 486 sessions, in order to make sure each participant obtains all the 487 media for the conference. To do that, the focus makes use of mixers. 489 When a focus receives an INVITE, it checks the membership policy. 
   The membership policy might indicate that this participant is not
   allowed to join, in which case the call can be rejected. It might
   indicate that another participant, acting as a moderator, needs to
   approve this new participant. In that case, the INVITE might be
   parked on a music-on-hold server, or a 183 (Session Progress)
   response might be sent to indicate progress. A notification, using
   the conference notification service, would be sent to the moderator.
   The moderator then has the ability to manipulate the policies using
   the conference policy control protocol. If the policies are changed
   to allow this new participant, the focus can accept the INVITE (or
   unpark it from the music-on-hold server). The interpretation of the
   membership policy by the focus is, itself, a matter of local policy,
   and not subject to standardization.

   If a participant manipulated the membership policy to indicate that a
   certain other participant was no longer allowed in the conference,
   the focus would send a BYE to that other participant to remove them.
   This is often referred to as "ejecting" a user from the conference.
   The process of ejecting fundamentally consists of two steps: the
   establishment of the policy through the conference policy control
   protocol, and the implementation of that policy (using a BYE) by the
   focus.

   Similarly, if a user manipulated the membership policy to indicate
   that a number of users need to be added to the conference, the focus
   would send an INVITE to those participants. This is often referred
   to as the "mass invitation" function. As with ejection, it is
   fundamentally composed of the policy functions that specify the
   participants which should be present, and the implementation of those
   functions. A policy request to add a set of users might not require
   an INVITE to execute it; those users might already be participants in
   the conference.

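   The focus behavior described in this section (accepting, rejecting,
   or holding an INVITE pending moderator approval, ejecting with BYE,
   and mass invitation with INVITE) can be illustrated with a short
   Python sketch. It is a hypothetical model, not part of this
   framework: the class and method names, the allow/deny/invitee
   representation of the membership policy, the 403 rejection code, and
   the string log standing in for real SIP signaling are all
   assumptions. As stated above, how a focus interprets the membership
   policy is a matter of local policy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs-approval"


@dataclass
class MembershipPolicy:
    """Hypothetical, much-simplified membership policy."""
    allowed: set[str] = field(default_factory=set)
    denied: set[str] = field(default_factory=set)
    invitees: set[str] = field(default_factory=set)  # users to bring in

    def decide(self, uri: str) -> Decision:
        if uri in self.denied:
            return Decision.DENY
        if uri in self.allowed or uri in self.invitees:
            return Decision.ALLOW
        # Anyone else must be approved by a moderator.
        return Decision.NEEDS_APPROVAL


@dataclass
class Focus:
    policy: MembershipPolicy
    participants: set[str] = field(default_factory=set)
    pending: set[str] = field(default_factory=set)  # held INVITEs
    sent: list[str] = field(default_factory=list)   # stand-in for SIP messages

    def on_invite(self, uri: str) -> int:
        """Return the response code the focus sends to an incoming INVITE."""
        decision = self.policy.decide(uri)
        if decision is Decision.DENY:
            return 403  # assumed rejection code
        if decision is Decision.NEEDS_APPROVAL:
            self.pending.add(uri)
            self.sent.append(f"NOTIFY moderator: {uri} awaiting approval")
            return 183  # progress while the moderator decides
        self.participants.add(uri)
        return 200

    def on_policy_change(self, new_policy: MembershipPolicy) -> None:
        """Implement a new membership policy: eject, admit, mass-invite."""
        self.policy = new_policy
        # Ejection: BYE to current participants the policy now denies.
        for uri in sorted(self.participants):
            if new_policy.decide(uri) is Decision.DENY:
                self.participants.discard(uri)
                self.sent.append(f"BYE {uri}")
        # Held INVITEs are accepted or rejected once the policy decides.
        for uri in sorted(self.pending):
            decision = new_policy.decide(uri)
            if decision is Decision.ALLOW:
                self.pending.discard(uri)
                self.participants.add(uri)
                self.sent.append(f"200 OK {uri}")
            elif decision is Decision.DENY:
                self.pending.discard(uri)
                self.sent.append(f"403 {uri}")
        # Mass invitation: INVITE only allowed invitees not already present.
        for uri in sorted(new_policy.invitees - self.participants):
            if new_policy.decide(uri) is Decision.ALLOW:
                self.sent.append(f"INVITE {uri}")
```

   In this sketch, an invitee who is already a participant is not sent
   a new INVITE, reflecting the point that a request to add users might
   not require any signaling.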
   A similar model exists for media policy. If the media policy
   indicates that a participant should not receive any video, the focus
   might implement that policy by sending a re-INVITE, removing the
   media stream to that participant. Alternatively, if the video is
   being centrally mixed, the focus could instruct the mixer to send a
   black screen to that participant. The means by which the policy is
   implemented are not subject to specification.

4.2 Conference Policy Server

   The conference policy server allows clients to manipulate and
   interact with the conference policy. The conference policy is used
   by the focus to make authorization decisions and guide its overall
   behavior. Logically speaking, there is a one-to-one mapping between
   a conference policy and a focus.

   The conference policy is represented by a URI. There is a unique
   conference policy for each conference. The conference policy URI
   points to a conference policy server which can manipulate that
   conference policy. A conference policy server also has a "top level"
   URI which can be used to access functions that are independent of any
   conference. Perhaps the most important of these functions is the
   creation of a new conference. Creation of a new conference will
   result in the construction of a new focus and a corresponding
   conference URI, which can then be used to join the conference itself,
   along with a corresponding conference policy (its membership and
   media policies).

   The conference policy server is accessed using a client-server
   transactional protocol. The client can be a participant in the
   conference, or it can be a third party. Access control lists for who
   can modify a conference policy are themselves part of the conference
   policy.

   The conference policy server is responsible for reconciliation of
   potentially conflicting requests regarding the policy for the
   conference.

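   The conference policy server functions above can likewise be
   sketched in Python. The sketch assumes a hypothetical in-memory
   store holding one policy per conference URI, a "top level" operation
   that mints new conference URIs, and an access control list kept
   inside each policy. This framework does not say how conflicting
   requests are reconciled; rejecting a change made against a stale
   policy version is only one possible approach, chosen here for
   illustration. Creating the focus for a new conference is outside the
   scope of the sketch.

```python
import itertools
from dataclasses import dataclass, field


class PolicyError(Exception):
    """Raised when the (hypothetical) policy server refuses a request."""


@dataclass
class ConferencePolicy:
    # Access control list: client URIs allowed to modify this policy. As the
    # text notes, the ACL is itself part of the conference policy.
    editors: set[str]
    # Membership policy: user URI -> "allow" or "deny".
    membership: dict[str, str] = field(default_factory=dict)
    # Media policy, e.g. {"video": False}; not manipulated in this sketch.
    media: dict[str, bool] = field(default_factory=dict)
    # Incremented on every accepted change; used to detect conflicts.
    version: int = 0


class ConferencePolicyServer:
    """Hypothetical server holding exactly one policy per conference URI."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self.policies: dict[str, ConferencePolicy] = {}
        self._ids = itertools.count(1)

    def create_conference(self, creator: str) -> str:
        """'Top level' operation: mint a conference URI and its policy."""
        uri = f"sip:conf-{next(self._ids)}@{self.domain}"
        self.policies[uri] = ConferencePolicy(editors={creator})
        return uri

    def set_membership(self, conf_uri: str, client: str, user: str,
                       rule: str, expected_version: int) -> int:
        """Change one membership rule; return the new policy version."""
        policy = self.policies[conf_uri]
        if client not in policy.editors:
            raise PolicyError(f"{client} may not modify {conf_uri}")
        # Reconciliation (assumed approach): a request based on an older
        # version than the current one conflicts with an earlier change.
        if expected_version != policy.version:
            raise PolicyError("stale policy version; re-read and retry")
        policy.membership[user] = rule
        policy.version += 1
        return policy.version
```

   For example, if two clients both read version 0 and then submit
   changes, the first succeeds and the second is rejected, prompting the
   second client to re-read the policy and retry.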
   The client of the conference policy control protocol can be any
   entity interested in manipulating the conference policy. Clearly,
   participants might be interested in manipulating it. A participant
   might want to raise or lower the volume for one of the other
   participants it is hearing. Or, a participant might want to add a
   user to the conference.

   A client of the conference policy control protocol could also be
   another server whose job is to determine the conference policy. As
   an example, a floor control server is responsible for determining
   which participant(s) in a conference are allowed to speak at any
   given time, based on participant requests and access rules. The
   floor control server would act as a client of the conference policy
   server, and change the media policy based on who is allowed to speak.

   The client of the conference policy control protocol could also be
   another conference policy server.

4.3 Mixers

   A mixer is responsible for combining the media streams that make up
   the conference, and generating one or more output streams that are
   distributed to recipients (which could be participants or other
   mixers). The process of combining media is specific to the media
   type, and is directed by the focus, under the guidance of the rules
   described in the media policy.

   A mixer is not aware of a "conference" as an entity, per se. A mixer
   receives media streams as inputs, and based on directions provided by
   the focus, generates media streams as outputs. There is no grouping
   of media streams beyond the policies that describe the ways in which
   the streams are mixed.

   A mixer is always under the control of a focus. The focus is
   responsible for interpreting the media policy, and then installing
   the appropriate rules in the mixer.
   If the focus is directly controlling a mixer, the mixer can either
   be co-resident with the focus, or be controlled through some kind of
   protocol.

   However, a focus need not directly control a mixer. Rather, a focus
   can delegate the mixing to the participants, each of which has their
   own mixer. This is described in Section 6.4.

4.4 Conference Notification Service

   The focus can provide a conference notification service. In this
   role, it acts as a notifier, as defined in RFC 3265 [4]. It accepts
   subscriptions from clients for the conference URI, and generates
   notifications to them as the state of the conference changes.

   This state is composed of two separate pieces. The first is the
   state of the focus and the second is the conference policy. A
   subscriber to the conference notification service can use
   capabilities defined in the SIP events framework [4] to request that
   it receive focus state changes only, conference policy changes only,
   or both.

   The state of the focus includes the participants connected to the
   focus, and information about the dialogs associated with them. As
   new participants join, this state changes, and is reported through
   the notification service. Similarly, when someone leaves, this state
   also changes, allowing subscribers to learn about this fact.

   As described previously, the conference policy includes the
   membership policy and the media policy. As those policies change,
   due to usage of the CPCP, direct change by the focus, or through an
   application, the conference notification service informs subscribers
   of these changes.

4.5 Participants

   A participant in a conference is any SIP user agent that has a dialog
   with the focus. This SIP user agent can be a PC application, a SIP
   hardphone, or a PSTN gateway. It can also be another focus.
   A conference which has a participant that is the focus of another
   conference is called a simplex cascaded conference. Cascaded
   conferences can also be used to provide scalable conferences where
   there are regional sub-conferences, each of which is connected to the
   main conference.

4.6 Conference Policy

   The conference policy contains the rules that guide the operation of
   the focus. The rules can be simple, such as an access list that
   defines the set of allowed participants in a conference. The rules
   can also be very complex, such as time-of-day based rules on
   participation that are conditional on the presence of other
   participants. It is important to understand that there is no
   restriction on the type of rules that can be encapsulated in a
   conference policy.

   The conference policy can be manipulated using web applications or
   voice applications. It can also be manipulated with proprietary
   protocols. However, the conference policy control protocol can be
   used as a standardized means of manipulating the conference policy.
   By the nature of conference policies, not all aspects of the policy
   can be manipulated with the conference policy control protocol.

   The conference policy includes the membership policy and the media
   policy. The membership policy includes per-participant policies that
   specify how the focus is to handle a particular participant. These
   include, for example, whether or not the participant is anonymous.

   The media policy describes the way in which the set of inputs to a
   mixer are combined to generate the set of outputs. Media policies
   can span media types. In other words, the policy on how one media
   stream is mixed can be based on characteristics of other media
   streams.
   Media policies can be based on any quantifiable characteristic of
   the media stream (its source, volume, codecs, speaking/silence,
   etc.), and they can be based on internal or external variables
   accessible by the media policy.

   Some examples of media policies include:
   o  The video output is the picture of the loudest speaker (video
      follows audio).
   o  The audio from each participant will be mixed with equal weight,
      and distributed to all other participants.
   o  The audio and video that are distributed are those selected by
      the floor control server.

5. Common Operations

   There are a large number of ways in which users can interact with a
   conference. They can join, leave, set policies, approve members, and
   so on. This section is meant as an overview of the major
   conferencing operations, summarizing how they operate. More detailed
   examples of the SIP mechanisms can be found in [7].

5.1 Creating Conferences

   There are many ways in which a conference can be created. The
   creation of a conference actually constructs several elements all at
   the same time. It results in the creation of a focus and a
   conference policy. It also results in the construction of a
   conference URI, which uniquely identifies the focus. Since the
   conference URI needs to be unique, the element which creates
   conferences is responsible for guaranteeing that uniqueness. This
   can be accomplished deterministically, by keeping records of
   conference URIs or by generating URIs algorithmically, or
   probabilistically, by creating random URIs with a sufficiently low
   probability of collision.

   When the media and conference policies are created, they are
   established with default rules that are implementation dependent. If
   the creator of the conference wishes to change those rules, they
   would do so using the conference policy control protocol (CPCP), for
   example.

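   The probabilistic approach to URI uniqueness described above can be
   quantified with the birthday bound. The following minimal Python
   sketch generates a random conference URI and estimates the chance of
   any collision; `random_conference_uri()`,
   `collision_probability()`, the 16-byte default and the example
   domain are illustrative assumptions, not requirements of this
   framework.

```python
import math
import secrets


def random_conference_uri(domain: str, nbytes: int = 16) -> str:
    """Conference URI whose user part carries nbytes of randomness."""
    return f"sip:{secrets.token_urlsafe(nbytes)}@{domain}"


def collision_probability(num_conferences: int, bits: int) -> float:
    """Birthday-bound estimate of the probability that at least two of
    num_conferences identifiers, drawn uniformly from 2**bits values,
    collide:  p ~= 1 - exp(-n(n-1) / 2N)
    """
    n = num_conferences
    space = 2.0 ** bits
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n * (n - 1) / (2.0 * space))


if __name__ == "__main__":
    print(random_conference_uri("example.com"))
    # A billion conferences with 128-bit identifiers: about 1.5e-21.
    print(collision_probability(10**9, 128))
```

   The deterministic alternative, keeping a record of every URI issued,
   trades this small residual probability for shared state in the
   element that creates conferences.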
   Of course, using the CPCP requires that an element know the URI for
   manipulating the policy. That requires a means to learn the
   conference policy URI from the conference URI, since the conference
   URI is frequently the only value returned to the client upon
   conference creation. Any other URIs associated with the conference
   are learned through the conference notification service, where they
   are carried as elements in the notifications.

5.1.1 SIP Mechanisms

   SIP can be used to create conferences hosted in a central server by
   sending an INVITE to a conferencing application that would
   automatically create a new conference and then place a user into it.

   Creation of conferences where the focus resides in an endpoint
   operates differently. There, the endpoint itself creates the
   conference URI, and hands it out to other endpoints which are to be
   the participants. What differs from case to case is how the endpoint
   decides to create a conference.

   One important case is the ad-hoc conference described in Section
   6.2. There, an endpoint unilaterally decides to create the
   conference based on local policy. The dialogs that were connected to
   the UA are migrated to the endpoint-hosted focus, using a re-INVITE
   to pass the conference URI to the newly joined participants.

   Alternatively, one UA can ask another UA to create an endpoint-hosted
   conference. This is accomplished with the SIP Join header [10]. The
   UA which receives the Join header in an invitation may need to create
   a new conference URI (a new one is not needed if the dialog that is
   being joined is already part of a conference). The conference URI is
   then handed to the recently joined participants through a re-INVITE.

5.1.2 CPCP Mechanisms

   Another way to create a conference is through interaction with the
   conference policy server.
   Using the conference policy control protocol, a client can instruct
   the conference policy server to create a new conference and return
   the conference URI and conference policy URI.

5.1.3 Non-Automated Mechanisms

   One way to create a conference is through interaction with an IVR
   application. The user would send a SIP INVITE to the conferencing
   application. This application would interact with the user, collect
   information about the desired conference, and create it. The user
   can then be placed into their newly created conference.

   Of course, a user can also create conferences by interacting with a
   web server. The web server would prompt the user for the necessary
   information (start and stop times of the conference, participants,
   etc.) and return the conference URI to the user. The user would copy
   this URI into their SIP phone, and send an INVITE to it in order to
   join the newly created conference.

5.2 Adding Participants

   There are many mechanisms for adding participants to a conference.
   These include SIP, the conference policy control protocol, and
   non-automated means. In all cases, participant additions can be
   first party (a user adds themselves) or third party (a user adds
   another user).

5.2.1 SIP Mechanisms

   First party additions using SIP are trivially accomplished with a
   standard INVITE. A participant can send an INVITE request to the
   conference URI, and if the conference policy allows them to join,
   they are added to the conference.

   If a UA does not know the conference URI, but has learned about a
   dialog which is connected to a conference (by using the dialog event
   package [11], for example), the UA can join the conference by using
   the Join header to join the dialog.

   Third party additions with SIP are done using REFER [12].
   The client can send a REFER request to the participant, asking them
   to send an INVITE request to the conference URI. Alternatively, the
   client can send a REFER request to the focus, asking it to send an
   INVITE to the participant. The latter technique has the benefit of
   allowing a client to add a conference-unaware participant that does
   not support the REFER method.

5.2.2 CPCP Mechanisms

   A basic function of the conference policy control protocol is to add
   participants. A client of the protocol can specify any SIP URI
   (which may identify the client itself) that is to be added. If the
   URI does not identify a user that is already a participant in the
   conference, the focus will send an INVITE to that URI in order to
   add that user.

5.2.3 Non-Automated Mechanisms

   There are countless non-automated means for asking a participant to
   join the conference. Generally, they involve conveying the
   conference URI to the desired participant, so that they can send an
   INVITE to it. These mechanisms all require some kind of human
   interaction.

   As an example, a user can send an instant message [13] to the third
   party, containing an HTML document which asks the recipient to click
   on a hyperlink to join the conference:

      Hey, would you like to join
      the conference now?

5.3 Conditional Joins

   In many cases, a new participant will not wish to join the conference
   unless they can join with a particular set of policies. As an
   example, a participant may want to join anonymously, so that other
   participants know that someone has joined, but not who. To
   accomplish this, the conference policy control protocol is used to
   establish these policies prior to the generation or acceptance of an
   invitation to the conference.
   For example, if a user wishes to join a conference with a known
   conference URI, the user would obtain the URI for the conference
   policy, manipulate the policy to set themselves as an anonymous
   participant, and then actually join the conference by sending an
   INVITE request to the conference URI.

5.4 Removing Participants

   As with additions, there are several mechanisms for departures.
   These include SIP mechanisms, CPCP mechanisms, and non-automated
   means. Removals can also be first party or third party.

5.4.1 SIP Mechanisms

   First party departures are trivially accomplished by sending a BYE
   request to the focus. This terminates the dialog with the focus and
   removes the participant from the conference.

   Third party departures can also be done using SIP, through the REFER
   method.

5.4.2 CPCP Mechanisms

   The CPCP can be used by a client to remove any participant (including
   themselves). When CPCP is used for this purpose, the focus will send
   a BYE request to the participant that is being removed. The focus
   will execute any other signaling that is needed to remove them (for
   example, manipulating other dialogs in order to manage the change in
   media streams).

   The conference policy control protocol can also be used to remove a
   large number of users. This is generally referred to as mass
   ejection.

5.4.3 Non-Automated Mechanisms

   As with the other common conferencing functions, there are many
   non-automated ways to remove a participant. The identity of the
   participant can be entered into a web form. When the user clicks
   submit, the focus sends a BYE to that participant, removing them from
   the conference. Alternatively, the conference can expose an IM
   interface, where the user can send an IM to the conference saying
   "remove Bob", causing the conference server to remove Bob.

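   For the third party additions and departures described in Sections
   5.2.1 and 5.4.1, one way to express both requests is a REFER sent to
   the focus, assuming the focus honors the RFC 3261 "method" URI
   parameter in the Refer-To URI: a plain SIP URI asks the focus to
   INVITE the participant, while method=BYE asks it to remove them. The
   minimal Python sketch below only builds the request text;
   `build_refer_to_focus()`, the client.example.com host and all example
   URIs are hypothetical placeholders.

```python
import uuid


def build_refer_to_focus(focus_uri: str, client_uri: str,
                         participant_uri: str, remove: bool = False) -> str:
    """Text of a minimal out-of-dialog REFER asking the focus to add
    (INVITE) or remove (BYE) participant_uri. Host names are placeholders."""
    # With no 'method' parameter, a SIP Refer-To URI implies INVITE.
    target = f"{participant_uri};method=BYE" if remove else participant_uri
    headers = [
        f"REFER {focus_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK{uuid.uuid4().hex[:16]}",
        "Max-Forwards: 70",
        f"From: <{client_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{focus_uri}>",
        f"Call-ID: {uuid.uuid4().hex}@client.example.com",
        "CSeq: 1 REFER",
        f"Contact: <{client_uri}>",
        f"Refer-To: <{target}>",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and an empty line to end the headers.
    return "\r\n".join(headers) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_refer_to_focus("sip:conf42@example.com",
                               "sip:alice@example.com",
                               "sip:bob@example.com", remove=True))
```

   Sending the REFER to the focus rather than to the participant is what
   allows a conference-unaware participant, which need not support
   REFER, to be added or removed.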
5.5 Approving Policy Changes

      OPEN ISSUE: The basic mechanism described here depends on the
      actual protocols used for conference and media policy
      manipulation. If the protocol itself provides change
      notifications, sip-events may not be needed for that purpose.
      Thus, this description here is tentative.

   A conference policy for a particular conference may designate one or
   more users as moderators for some set of media policy or conference
   policy change requests. This means that those moderators need to
   approve the specific policy change. Typically, moderators are used
   to approve member additions and removals. However, the framework
   allows for moderators to be associated with any policy change that
   can be made.

   Moderating a policy request is done using a combination of the
   conference notification service and the CPCP.

   First, a client makes a policy change request. This can be done
   directly, using the CPCP, or indirectly. An indirect policy change
   request is any non-CPCP action that requires approval. The simplest
   example is an INVITE to the focus from a new participant. That
   represents a request to change the membership of the conference.
   From a moderation perspective, it is handled identically to the case
   where a client used the CPCP to request that the same user be added
   to the conference.

   Part of the conference policy itself may designate any policy change
   as moderated. This means that the change cannot be performed by the
   client directly. As a result, the CPCP request will be answered with
   a response saying that the action will be done pending authorization.
   That completes the CPCP transaction. In the case of a policy change
   requested indirectly through some other means, the behavior depends
   on the mechanism.
   For example, if a user sends a SIP INVITE request to the conference
   in order to join, and that join request is moderated, the focus
   would normally accept it and play music-on-hold until the request is
   approved.

   Even though the CPCP transaction did not carry out the requested
   change, it does result in a change in internal state. Specifically,
   the requested change shows up as a "pending" state within the media
   and conference policies. This means that the change has been
   requested, but has not taken effect. It is almost a form of change
   request history. However, because it is a state change, it is
   something that can result in notifications through the conference
   notification service.

   Therefore, in order to moderate requests, the moderator subscribes to
   the conference notification service. Normally, the notifications
   from the focus do not reflect pending state changes. That is, the
   service will not normally send a notification informing a subscriber
   that a policy change request was made and failed due to lack of
   authorization. However, notifications to the moderator do reflect
   these changes. That is because the policy of the focus is to inform
   moderators, and only moderators, of these changes. Indeed, different
   users can be moderators for different parts of the conference and
   media policies. For example, one user can be a moderator for
   membership changes, and another, a moderator for whether users can
   join anonymously.

   There are two ways that the focus can know whether a subscriber to
   the conference notification service is a moderator. The first is
   configured policy (once again through CPCP). That policy can specify
   that a particular user is the moderator for a particular piece of
   policy. Therefore, if that user subscribes to the conference
   notification service, any notification sent to that user will
   include pending changes to that piece of policy.
   As an alternative, a SUBSCRIBE request from a user can include a
   filter [14] that requests receipt of these pending state changes.
   If the conference policy allows, that request is honored, and the
   subscriber will receive notifications about pending state changes.

   Once the moderator receives a notification about the pending state
   change, they use the CPCP to implement their decision. If the
   moderator decides to approve the change, they use the CPCP or MPCP to
   actually perform the change themselves. Since the moderator for a
   piece of policy is allowed to change that piece of policy, by
   definition, their change is accepted and performed. If the moderator
   decides to reject the change, they use the CPCP to remove the pending
   state from the database.

   The pending state persists in the database for a period of time which
   is, itself, part of the conference policy. If the moderator neither
   approves nor rejects the change, the pending state eventually
   disappears, as if the change had been explicitly rejected.

   If the pending state is approved, a real change to the conference or
   media policy takes place, and this change will be reflected in the
   conference notification service. In this way, if a client requests
   a policy change that is not performed because the client is not
   authorized, the client can subscribe to the conference notification
   service to learn whether the change is eventually approved or
   rejected.

   This general mechanism for moderating policy requests is consistent
   with the moderation of presence subscriptions [15][16].

5.6 Creating Sidebars

   A sidebar is a "conference within a conference", allowing a subset of
   the participants to converse amongst themselves. Frequently,
   participants in a sidebar will still receive media from the main
   conference, but "in the background". For audio, this may mean that
   the volume of the media is reduced, for example.

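   One way a centrally mixing focus could realize "in the background"
   audio is to give each sidebar member a mix in which the other sidebar
   members are at full level and the main conference is attenuated.
   This is a minimal Python sketch, assuming raw floating-point PCM
   blocks and a hypothetical fixed `background_gain` of 0.3; the actual
   behavior would be governed by the sidebar's media policy, and
   `mix_for_sidebar_member()` is an illustrative name only.

```python
def mix_for_sidebar_member(main_streams: list[list[float]],
                           sidebar_streams: list[list[float]],
                           background_gain: float = 0.3) -> list[float]:
    """Audio heard by one sidebar member.

    main_streams / sidebar_streams hold equal-length blocks of PCM samples
    in [-1.0, 1.0] from the *other* participants (the caller excludes the
    recipient's own stream). Sidebar audio is summed at full level;
    main-conference audio is attenuated by background_gain.
    """
    streams = main_streams + sidebar_streams
    if not streams:
        return []
    out = [0.0] * len(streams[0])
    for stream in sidebar_streams:
        for i, sample in enumerate(stream):
            out[i] += sample
    for stream in main_streams:
        for i, sample in enumerate(stream):
            out[i] += background_gain * sample
    # Clip so the summed signal stays in the valid sample range.
    return [max(-1.0, min(1.0, s)) for s in out]
```

   A participant in the main conference who is not in the sidebar would
   receive a different mix, typically one that omits the sidebar
   streams entirely.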
   A sidebar is represented by a separate conference URI. This URI is a
   type of "alias" for the main conference URI. Both route to the same
   focus. Like any other conference, the sidebar conference URI has a
   conference policy and a media policy associated with it. Like any
   other conference, one can join it by sending an INVITE to this URI,
   or ask others to join by referring them to it. However, it differs
   from a normal conference URI in two ways. First, users in the main
   conference do not need to establish a separate dialog to the sidebar
   conference. The focus recognizes the sidebar as a special URI, and
   knows to use the existing dialog to the main conference as a
   "virtual" connection to the sidebar URI.

   The second difference is the way in which conference and media
   policies are implemented. If the conference policy control protocol
   is used to add a user to a normal conference, the focus will
   typically send an INVITE to the participant to ask them to join. For
   a sidebar conference, it is done differently. If the conference
   policy control protocol is used to add a user to it, and that user is
   already part of the main conference, the focus will use the
   conference notification service to alert the existing participant
   that they have been asked to join the sidebar. The invited user can
   then make use of the CPCP to formally add themselves to the sidebar.

5.7 Destroying Conferences

   Conferences can be destroyed in several ways. Generally, whether
   those means are applicable for any particular conference is a
   component of the conference policy.

   When a conference is destroyed, the conference and media policies
   associated with it are destroyed. Any attempts to read or write
   those policies result in a protocol error. Furthermore, the
   conference URI becomes invalid.
   Any attempts to send an INVITE to it, or SUBSCRIBE to it, would
   result in a SIP error response.

   Typically, if a conference is destroyed while there are still
   participants, the focus would send a BYE to those participants before
   actually destroying the conference. Similarly, if there were any
   users subscribed to the conference notification service, those
   subscriptions would be terminated by the server before the actual
   destruction.

5.7.1 SIP Mechanisms

   There is no explicit means in SIP to destroy a conference. However,
   a conference may be destroyed as a by-product of a user leaving the
   conference, which can be done with BYE. In particular, if the
   conference policy states that the conference is destroyed once the
   last user leaves, when that user does leave (using a SIP BYE
   request), the conference is destroyed.

5.7.2 CPCP Mechanisms

   The CPCP contains mechanisms for explicitly destroying a conference.

5.7.3 Non-Automated Mechanisms

   As with conference creation, a conference can be destroyed by
   interacting with a web application or voice application that prompts
   the user for the conference to be destroyed.

5.8 Obtaining Membership Information

   A participant in a conference will frequently wish to know the set of
   other users in the conference. This information can be obtained in
   many ways.

5.8.1 SIP Mechanisms

   The conference notification service allows a conference-aware
   participant to subscribe to it, and receive notifications that
   contain the list of participants. When a participant joins or
   leaves, subscribers are notified. The conference notification
   service also allows a user to do a "fetch" [4] to obtain the current
   listing.

5.8.2 CPCP Mechanisms

   The CPCP contains mechanisms for querying for the current set of
   conference participants.

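   A "fetch" in the SIP events framework [4], mentioned in Section
   5.8.1, is a SUBSCRIBE with an Expires value of zero: the notifier
   sends one NOTIFY carrying the current state and no ongoing
   subscription remains. The minimal Python sketch below builds such a
   request. The "conference" event package name, the placeholder host
   names, the 3600-second default duration and the name
   `build_subscribe()` are assumptions, since this document does not
   name the event package.

```python
import uuid


def build_subscribe(conference_uri: str, subscriber_uri: str,
                    fetch: bool = False, duration: int = 3600) -> str:
    """Text of a minimal SUBSCRIBE to the conference notification service.

    fetch=True sets Expires: 0, asking for a single NOTIFY with the current
    participant list instead of a standing subscription."""
    expires = 0 if fetch else duration
    headers = [
        f"SUBSCRIBE {conference_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK{uuid.uuid4().hex[:16]}",
        "Max-Forwards: 70",
        f"From: <{subscriber_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{conference_uri}>",
        f"Call-ID: {uuid.uuid4().hex}@client.example.com",
        "CSeq: 1 SUBSCRIBE",
        f"Contact: <{subscriber_uri}>",
        "Event: conference",  # assumed event package name
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and an empty line to end the headers.
    return "\r\n".join(headers) + "\r\n\r\n"
```

   A conference-aware participant that only wants a snapshot of the
   membership would use the fetch form; one that wants to track joins
   and departures would use a non-zero duration and refresh it.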
5.8.3 Non-Automated Mechanisms

   Users can also interact with applications to obtain conference
   membership. There may be a conference web page associated with the
   conference, which has a link that will fetch the current list of
   participants and display them in the browser. Similarly, an
   interactive voice response application connected to the focus can be
   used to obtain the current membership. A user in the conference
   could press the pound key on their phone, and hear a listing of the
   current participants.

5.9 Adding and Removing Media

   Each conference is composed of a particular set of media that the
   focus is managing. For example, a conference might contain a video
   stream and an audio stream. The set of media streams that constitute
   the conference can be changed by participants. When the set of media
   in the conference changes, the focus will need to generate a
   re-INVITE to each participant in order to add the media stream to,
   or remove it from, each participant. When a media stream is being
   added, a participant can reject the offered media stream, in which
   case it will not receive or contribute to that stream. Rejection of
   a stream by a participant does not imply that the stream is no
   longer part of the conference, just that the participant is not
   involved in it.

   There are several ways in which a media stream can be added to or
   removed from a conference.

5.9.1 SIP Mechanisms

   A SIP re-INVITE can be used by a participant to add or remove a media
   stream. This is accomplished using the standard offer/answer
   techniques for adding media streams to a session [17]. This will
   trigger the focus to generate its own re-INVITEs.

5.9.2 CPCP Mechanisms

   The CPCP can be used to add or remove a media stream. This too will
   trigger the focus to generate a re-INVITE to each participant in
   order to effect the change.

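   Under the offer/answer model [17], a participant that does not want
   an offered stream keeps the corresponding m= line in its answer but
   sets the port to zero; the same zero-port convention in an offer
   removes a stream. The Python sketch below shows only the m= line
   handling, assuming hypothetical local port numbers and echoing the
   offered formats for simplicity; a real answer also carries
   connection and attribute lines, and `answer_media_lines()` is an
   illustrative name.

```python
def answer_media_lines(offered: list[str], local_ports: dict[str, int]) -> list[str]:
    """Build answer m= lines for the offered ones, in the same order.

    Media types listed in local_ports are accepted on the given local port;
    any other offered stream is rejected by answering with port 0.
    """
    answer = []
    for line in offered:
        if not line.startswith("m="):
            raise ValueError(f"not an m= line: {line!r}")
        # m=<media> <port> <proto> <fmt> ...
        media, _offered_port, proto, *formats = line[2:].split()
        port = local_ports.get(media, 0)
        # Formats are echoed unchanged here; a real answerer would keep only
        # the ones it supports.
        answer.append(f"m={media} {port} {proto} {' '.join(formats)}")
    return answer


if __name__ == "__main__":
    offer = ["m=audio 49170 RTP/AVP 0", "m=video 51372 RTP/AVP 31"]
    for line in answer_media_lines(offer, {"audio": 5004}):
        print(line)
    # m=audio 5004 RTP/AVP 0
    # m=video 0 RTP/AVP 31
```

   Keeping the rejected m= line in place preserves the positional
   matching between offer and answer, which is why a stream can be
   declined by one participant without leaving the conference.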
5.9.3 Non-Automated Mechanisms

   As with most of the other common functions, addition and removal of
   media streams can be accomplished with a web application or
   interactive voice application.

5.10 Conference Announcements and Recordings

   Conference announcements and recordings play a key role in many real
   conferencing systems. Examples of such features include:
   o  Asking a user to state their name before joining the conference,
      in order to support a roll call
   o  Allowing a user to request a roll call, so they can hear who else
      is in the conference
   o  Allowing a user to press some keys on their keypad in order to
      record the conference
   o  Allowing a user to press some keys on their keypad in order to be
      connected with a human operator
   o  Allowing a user to press some keys on their keypad to mute or
      unmute their line

      [Figure: Users 1, 2 and 3 each run a participant (Participants 1,
      2 and 3) connected to the focus by SIP Dialogs 1, 2 and 3. An
      application acts as Participant 4, connected to the focus by SIP
      Dialog 4, and also uses CPCP toward the conference policy
      server.]

                                Figure 4

   In this framework, these capabilities are modeled as an application
   which acts as a participant in the conference. This is shown
   pictorially in Figure 4. The conference has four participants.
   Three of these participants are end users, and the fourth is the
   announcement application.

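   Because the announcement application is simply another participant,
   each behavior described next amounts to a change in who hears whose
   media. A minimal Python sketch of that idea, assuming a media policy
   reduced to a per-source routing table; `MediaPolicy` and the
   participant names are hypothetical, and in the framework these
   changes would be made through the CPCP rather than by editing a
   local object.

```python
from dataclasses import dataclass, field


@dataclass
class MediaPolicy:
    """Hypothetical media policy reduced to routing: which participants
    receive the media a given source sends. A source with no entry in
    'routes' is heard by every other participant (the normal mix)."""
    participants: set[str]
    routes: dict[str, set[str]] = field(default_factory=dict)

    def recipients(self, source: str) -> set[str]:
        return set(self.routes.get(source, self.participants - {source}))


if __name__ == "__main__":
    policy = MediaPolicy({"user1", "user2", "user3", "app"})
    print(sorted(policy.recipients("app")))    # join announcement: everyone else
    policy.routes["app"] = {"user2"}           # private announcement to user2
    policy.routes["user2"] = {"app"}           # "disconnect" user2 from the mix
    print(sorted(policy.recipients("user2")))  # ['app']
    del policy.routes["user2"]                 # "reconnect" user2
    print(sorted(policy.recipients("user2")))  # ['app', 'user1', 'user3']
```

   Removing an override restores the default mix, which corresponds to
   the application "reconnecting" a user once an interaction ends.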
   If the announcement application wishes to play an announcement to all
   the conference members (for example, to announce a join), it merely
   sends media to the mixer as would any other participant. The
   announcement is mixed in with the conversation and played to the
   participants.

   Similarly, the announcement application can play an announcement to a
   specific user by using the CPCP to configure its media policy so that
   the media it generates is only heard by the target user. The
   application then generates the desired announcement, and it will be
   heard only by the selected recipient.

   The announcement application can also receive input from a specific
   user through the conference. The announcement application would use
   the CPCP to cause in-band DTMF to be dropped from the mix, and sent
   only to itself. When a user wishes to invoke an operation, such as
   to obtain a roll call, the user would press the appropriate key
   sequence. That sequence would be heard only by the announcement
   application. Once the application determines that the user wishes to
   hear a roll call, it can use the CPCP to set the media policy so that
   media from that user is delivered only to the announcement
   application. This "disconnects" the user from the rest of the
   conference so they can interact with the application. Once the
   interaction is done, the announcement application uses the CPCP to
   "reconnect" the user to the conference.

5.11 Floor Control

   Floor control is similar to a conference announcement application.
   Within this framework, floor control is managed by an application
   (possibly one that is not a participant) that uses the CPCP to
   enforce the resulting floor control decisions.

   [[Need more work here]]

5.12 Camera and Video Controls

      OPEN ISSUE: Originally, I was just going to say that this is
      outside the scope of conferencing.
      But it does impact conferencing. Effectively, camera control is
      treated like a media stream. The mixer would combine the various
      requests across participants and direct them to the appropriate
      device. How does that work, though? In a video conference with 4
      participants, the camera control needs to identify the specific
      user whose camera is to be controlled. That is something unique to
      conferencing.

6. Physical Realization

   In this section, we present several physical instantiations of these
   components, to show how these basic functions can be combined to
   solve a variety of problems.

6.1 Centralized Server

   In the simplest realization of this framework, there is a single
   physical server in the network which implements the focus, the
   conference policy server, and the mixers. This is the classic "one
   box" solution, shown in Figure 5.

      [Figure: A single "Conference Server" box contains the conference
      notification server, the conference policy server, the focus and
      the mixers. Each of two participants has a SIP connection to the
      server and exchanges RTP with its mixers.]

                                Figure 5

6.2 Endpoint Server

   Another important model is that of a locally-mixed ad-hoc conference.
   In this scenario, two users (A and B) are in a regular point-to-point
   call. One of the participants (A) decides to conference in a third
   participant, C. To do this, A begins acting as a focus. Its
   existing dialog with B becomes the first dialog attached to the
   focus.
   A would re-INVITE B on that dialog, changing its Contact URI to a
   new value that identifies the focus. In essence, A "mutates" from a
   single-user UA to a focus plus a single-user UA, and in the process
   of this mutation, its URI changes. Then, the focus makes an outbound
   INVITE to C. When C accepts, the focus mixes the media from B and C
   together and redistributes the results. The mixed media is also
   played locally. Figure 6 shows a diagram of this transition.

   [Diagram: before the transition, UAs A and B share a SIP dialog and
   an RTP stream. After the transition, A contains a Focus, Conference
   Policy server, and Mixers, plus a Participant reached over an
   internal interface; the focus keeps its SIP dialog and RTP stream
   with B and has a new SIP dialog and RTP stream with UA C.]

                                Figure 6

   It is important to note that the external interfaces in this model,
   between A and B and between A and C, are exactly the same as those
   that would be used in a centralized server model. A could also
   include a conference policy server and conference notification
   service, allowing the participants to have access to them if they so
   desired. Just because the focus is co-resident with a participant
   does not mean that any aspect of the behaviors and external
   interfaces will change.

6.3 Media Server Component

   [Diagram: an App Server (Focus, C.Pol, Notification) and a
   conference component (Focus, C.Pol, Mixers) connected by SIP, a
   conference policy control protocol, and a media control protocol;
   each Participant has a SIP dialog with the App Server and an RTP
   stream with the conference component.]

                                Figure 7

   In this model, shown in Figure 7, each conference involves two
   centralized servers. One of these servers, referred to as the
   "application server", owns and manages the membership and media
   policies, and maintains a dialog with each participant. As a result,
   it represents the focus seen by all participants in a conference.
   However, this server does not provide any media support. To perform
   the actual media mixing function, it makes use of a second server,
   called the "mixing server". This server includes a focus and a
   conference policy server, but has no conference notification
   service. It has a default membership policy, which accepts all
   invitations from the top-level focus. Its conference policy server
   accepts any controls made by the application server. The focus in
   the application server uses third party call control to connect the
   media streams of each user to the mixing server, as needed. If the
   focus in the application server receives a conference policy control
   command from a client, it delegates that command to the mixing
   server by issuing the same media policy control command to it.

   This model allows the mixing server to be used as a resource by a
   variety of different conferencing applications, because it is
   unaware of any conference or media policies; it is merely a "slave"
   to the top-level server, doing whatever it asks.
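   The delegation in this model can be illustrated with a short sketch.
   It is hypothetical and is not part of this framework. The class and
   method names, the "who hears whom" table used to represent media
   policy, the moderator-based authorization rule, and the example URI
   are all assumptions made for illustration. Direct method calls stand
   in for third party call control and for the conference policy
   control protocol.

```python
from dataclasses import dataclass, field


@dataclass
class MixingServer:
    """Hypothetical mixing server: holds no policy of its own and applies
    whatever media policy its single controlling server requests."""
    controller: str  # URI of the top-level server it obeys (assumed)
    # Media policy as a "who hears whom" table: listener -> sources mixed for it.
    hears: dict[str, set[str]] = field(default_factory=dict)

    def add_stream(self, user: str) -> None:
        # Default policy for a new stream: it hears, and is heard by, everyone.
        others = set(self.hears)
        for sources in self.hears.values():
            sources.add(user)
        self.hears[user] = others

    def set_policy(self, requester: str, listener: str, sources: set[str]) -> None:
        # Accept media policy controls only from the top-level server.
        if requester != self.controller:
            raise PermissionError(f"{requester} may not control this mixer")
        self.hears[listener] = set(sources)

    def mix_for(self, listener: str) -> set[str]:
        # Sources currently mixed into the stream sent to this listener.
        return set(self.hears.get(listener, set()))


@dataclass
class AppServer:
    """Hypothetical application server: the focus seen by participants and
    owner of membership and media policy, with no media support."""
    uri: str
    mixer: MixingServer
    moderators: set[str]  # clients allowed to change media policy (assumed rule)

    def join(self, user: str) -> None:
        # Stand-in for using third party call control to connect the
        # user's media stream to the mixing server.
        self.mixer.add_stream(user)

    def media_policy_request(self, client: str, listener: str, sources: set[str]) -> None:
        # The application server makes the policy decision itself ...
        if client not in self.moderators:
            raise PermissionError(f"{client} is not authorized to change media policy")
        # ... and delegates enforcement by issuing the same command to the mixer.
        self.mixer.set_policy(self.uri, listener, sources)


mixer = MixingServer(controller="sip:app@example.com")
app = AppServer("sip:app@example.com", mixer, moderators={"alice"})
for user in ("alice", "bob", "carol"):
    app.join(user)
app.media_policy_request("alice", "bob", {"alice"})  # bob now hears only alice
print(sorted(mixer.mix_for("bob")))    # ['alice']
print(sorted(mixer.mix_for("carol")))  # ['alice', 'bob']
```

   In this sketch, all policy logic lives in the application server. The
   mixing server checks only who issued a command, not whether the
   command is sensible. That split is what would let a mixing server of
   this shape be reused by different conferencing applications.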
6.4 Distributed Mixing

   In a distributed mixed conference, there is still a centralized
   server that implements the focus, conference policy server, and
   media policy server. However, there are no centralized mixers.
   Rather, there are mixers in each endpoint, along with a conference
   policy server. The focus distributes the media by using third party
   call control [18] to set up a media stream between each participant
   and every other participant. As a result, if there are N
   participants in the conference, there will be a single dialog
   between each participant and the focus, but the session description
   associated with that dialog will be constructed to allow media to be
   distributed amongst the participants. This is shown in Figure 8.

   [Diagram: a central Conference Server (Focus, C.Pol.Srv) with a SIP
   dialog to each of three Participants; each Participant contains
   Mixers and a C.Pol.Srv, and media flows directly between every pair
   of Participants.]

                                Figure 8

   There are several ways in which the media can be distributed to each
   participant for mixing. In a multi-unicast model, each participant
   sends a copy of its media to each other participant. In this case,
   the session description manages N-1 media streams. In a multicast
   model, each participant joins a common multicast group, and each
   participant sends a single copy of its media stream to that group.
   The underlying multicast infrastructure then distributes the media,
   so that each participant gets a copy. In a single-source multicast
   (SSM) model, each participant sends its media stream to a central
   point, using unicast. The central point then redistributes the media
   to all participants using multicast. The focus is responsible for
   selecting the modality of media distribution, and for handling any
   hybrids necessitated by clients with mixed capabilities.

   When a new participant joins or is added, the focus will perform the
   necessary third party call control to distribute the media from the
   new participant to all the other participants, and vice versa.

   The central conference server also includes a conference policy
   server. Of course, the central conference server cannot implement
   any of the media policies directly. Rather, it would delegate the
   implementation to the conference policy servers co-resident with
   each participant. As an example, if a participant decides to switch
   the overall conference mode from "voice activated" to "continuous
   presence", they would communicate with the central conference policy
   server. The conference policy server, in turn, would communicate
   with the conference policy servers co-resident with each
   participant, using the same conference policy control protocol, and
   instruct them to use "continuous presence".

   This model requires additional functionality in user agents, which
   may or may not be present. The participants, therefore, must be able
   to advertise this capability to the focus.

6.5 Cascaded Mixers

   In very large conferences, it may not be possible to have a single
   mixer that can handle all of the media. A solution to this is to use
   cascaded mixers.
   In this architecture, there is a centralized focus, but the mixing
   function is implemented by multiple mixers scattered throughout the
   network. Each participant is connected to one, and only one, of the
   mixers. The focus uses some kind of control protocol to connect the
   mixers together, so that all of the participants can hear each
   other.

   [Diagram: a Focus with a SIP dialog to every Participant and a
   control protocol to each mixer; a top-level mixer exchanges media
   with three lower-level mixers, each of which serves two
   Participants. Legend: solid lines are SIP dialogs, dotted lines are
   media flows, plus signs are the control protocol.]

                                Figure 9

   This architecture is shown in Figure 9.

7. Security Considerations

   Conferences frequently require security features in order to operate
   properly.
   The conference policy may dictate that only certain participants can
   join, or that certain participants can create new policies.
   Generally speaking, conference applications are very concerned about
   authorization decisions. Mechanisms for establishing and enforcing
   such authorization rules are a central concept throughout this
   document.

   Of course, authorization rules require authentication. Normal SIP
   authentication mechanisms should suffice for the conference
   authorization mechanisms described here.

   Privacy is an important aspect of conferencing. Users may wish to
   join a conference without anyone knowing that they have joined, in
   order to silently listen in. In other applications, a participant
   may wish only to hide their identity from other participants, while
   otherwise letting them know of their presence. These functions need
   to be provided by the conferencing system.

8. Contributors

   This document is the result of discussions amongst the conferencing
   design team. The members of this team include:

      Alan Johnston
      Brian Rosen
      Rohan Mahy
      Henning Schulzrinne
      Orit Levin
      Roni Even
      Tom Taylor
      Petri Koskelainen
      Nermeen Ismail
      Andy Zmolek
      Joerg Ott
      Dan Petrie

9. Changes from draft-ietf-sipping-conferencing-framework-00

   Updated references and formatting cleanup.

10. Changes since draft-rosenberg-sipping-conferencing-framework-01

   o  Clarified that the conference notification service uses a single
      package with some kind of filtering to select whether you get the
      focus or policy state.

11. Changes since draft-rosenberg-sipping-conferencing-framework-00

   o  Rework of terminology.
   o  More details on moderating policy changes.
   o  Rework of the overview, and in particular, a shift of focus from
      basic/complex conferences (a term which has been removed) to
      conference aware/unaware participants.
   o  Removal of explicit reference to megaco for controlling a mixer.
   o  Discussion of a lot more conferencing operations.
   o  New sidebar mechanism.

12. Informative References

   [1]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
         Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
         Session Initiation Protocol", RFC 3261, June 2002.

   [2]   Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
         "RTP: A Transport Protocol for Real-Time Applications",
         RFC 3550, July 2003.

   [3]   Levin, O., "Requirements for Tightly Coupled SIP Conferencing",
         draft-levin-sipping-conferencing-requirements-01 (work in
         progress), July 2002.

   [4]   Roach, A., "Session Initiation Protocol (SIP)-Specific Event
         Notification", RFC 3265, June 2002.

   [5]   Campbell, B., "The Message Session Relay Protocol",
         draft-ietf-simple-message-sessions-06 (work in progress),
         May 2004.

   [6]   Rosenberg, J., "A Framework for Application Interaction in the
         Session Initiation Protocol (SIP)",
         draft-ietf-sipping-app-interaction-framework-01 (work in
         progress), February 2004.

   [7]   Johnston, A. and O. Levin, "Session Initiation Protocol Call
         Control - Conferencing for User Agents",
         draft-ietf-sipping-cc-conferencing-03 (work in progress),
         February 2004.

   [8]   Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
         Resource Identifiers (URI): Generic Syntax", RFC 2396,
         August 1998.

   [9]   Rosenberg, J., "Indicating User Agent Capabilities in the
         Session Initiation Protocol (SIP)",
         draft-ietf-sip-callee-caps-03 (work in progress), January 2004.

   [10]  Mahy, R. and D. Petrie, "The Session Initiation Protocol (SIP)
         'Join' Header", draft-ietf-sip-join-03 (work in progress),
         February 2004.

   [11]  Rosenberg, J. and H. Schulzrinne, "An INVITE Initiated Dialog
         Event Package for the Session Initiation Protocol (SIP)",
         draft-ietf-sipping-dialog-package-04 (work in progress),
         February 2004.

   [12]  Sparks, R., "The Session Initiation Protocol (SIP) Refer
         Method", RFC 3515, April 2003.

   [13]  Campbell, B., Rosenberg, J., Schulzrinne, H., Huitema, C. and
         D. Gurle, "Session Initiation Protocol (SIP) Extension for
         Instant Messaging", RFC 3428, December 2002.

   [14]  Khartabil, H., Leppanen, E. and T. Moran, "Requirements for
         Presence Specific Event Notification Filtering",
         draft-ietf-simple-pres-filter-reqs-03 (work in progress),
         January 2004.

   [15]  Rosenberg, J., "A Presence Event Package for the Session
         Initiation Protocol (SIP)", draft-ietf-simple-presence-10
         (work in progress), January 2003.

   [16]  Rosenberg, J., "A Watcher Information Event Template-Package
         for the Session Initiation Protocol (SIP)",
         draft-ietf-simple-winfo-package-05 (work in progress),
         January 2003.

   [17]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
         Session Description Protocol (SDP)", RFC 3264, June 2002.

   [18]  Rosenberg, J., Peterson, J., Schulzrinne, H. and G. Camarillo,
         "Best Current Practices for Third Party Call Control in the
         Session Initiation Protocol", draft-ietf-sipping-3pcc-06 (work
         in progress), January 2004.
Author's Address

   Jonathan Rosenberg
   dynamicsoft
   600 Lanidex Plaza
   Parsippany, NJ  07054
   US

   Phone: +1 973 952-5000
   EMail: jdrosen@dynamicsoft.com
   URI:   http://www.jdrosen.net

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.
Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2004). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.