| draft-rosenberg-sipping-conferencing-framework-00.txt | draft-rosenberg-sipping-conferencing-framework-01.txt | |||
|---|---|---|---|---|
| Internet Engineering Task Force SIPPING WG | Internet Engineering Task Force SIPPING WG | |||
| Internet Draft J. Rosenberg | Internet Draft J. Rosenberg | |||
| dynamicsoft | dynamicsoft | |||
| draft-rosenberg-sipping-conferencing-framework-00.txt | draft-rosenberg-sipping-conferencing-framework-01.txt | |||
| October 28, 2002 | February 12, 2003 | |||
| Expires: April 2003 | Expires: August 2003 | |||
| A Framework for Conferencing with the Session Initiation Protocol | A Framework for Conferencing with the Session Initiation Protocol | |||
| STATUS OF THIS MEMO | STATUS OF THIS MEMO | |||
| This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
| all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 42 ¶ | skipping to change at page 1, line 42 ¶ | |||
| Abstract | Abstract | |||
| The Session Initiation Protocol (SIP) supports the initiation, | The Session Initiation Protocol (SIP) supports the initiation, | |||
| modification, and termination of media sessions between user agents. | modification, and termination of media sessions between user agents. | |||
| These sessions are managed by SIP dialogs, which represent a SIP | These sessions are managed by SIP dialogs, which represent a SIP | |||
| relationship between a pair of user agents. Because dialogs are | relationship between a pair of user agents. Because dialogs are | |||
| between pairs of user agents, SIP's usage for two-party | between pairs of user agents, SIP's usage for two-party | |||
| communications (such as a phone call) is obvious. Communications | communications (such as a phone call) is obvious. Communications | |||
| sessions with multiple participants, generally known as conferencing, | sessions with multiple participants, generally known as conferencing, | |||
| is more complicated. This document defines a framework for how such | are more complicated. This document defines a framework for how such | |||
| conferencing can occur. This framework describes the overall | conferencing can occur. This framework describes the overall | |||
| architecture, terminology, and protocol components needed for multi- | architecture, terminology, and protocol components needed for multi- | |||
| party conferencing. | party conferencing. | |||
| Table of Contents | Table of Contents | |||
| 1 Introduction ........................................ 3 | 1 Introduction ........................................ 4 | |||
| 2 Terminology ......................................... 3 | 2 Terminology ......................................... 4 | |||
| 3 Basic Architecture .................................. 7 | 3 Overview of Conferencing Architecture ............... 7 | |||
| 4 Usage of URIs ....................................... 11 | 3.1 Usage of URIs ....................................... 10 | |||
| 5 Functions of the Elements ........................... 12 | 4 Functions of the Elements ........................... 12 | |||
| 5.1 Focus ............................................... 12 | 4.1 Focus ............................................... 12 | |||
| 5.2 Conference Policy Server ............................ 13 | 4.2 Conference Policy Server ............................ 13 | |||
| 5.3 Mixers .............................................. 14 | 4.3 Mixers .............................................. 14 | |||
| 5.4 Media Policy Server ................................. 14 | 4.4 Conference Notification Service ..................... 15 | |||
| 5.5 Conference Notification Service ..................... 15 | 4.5 Participants ........................................ 15 | |||
| 5.6 Participants ........................................ 16 | 4.6 Conference Policy ................................... 15 | |||
| 5.7 Conference Policy ................................... 16 | 5 Common Operations ................................... 16 | |||
| 5.8 Media Policy ........................................ 17 | 5.1 Creating Conferences ................................ 16 | |||
| 6 Physical Realization ................................ 17 | 5.1.1 SIP Mechanisms ...................................... 17 | |||
| 6.1 Centralized Server .................................. 17 | 5.1.2 CPCP Mechanisms ..................................... 18 | |||
| 6.2 Endpoint Server ..................................... 17 | 5.1.3 Non-Automated Mechanisms ............................ 18 | |||
| 6.3 Media Server Component .............................. 18 | 5.2 Adding Participants ................................. 18 | |||
| 6.4 Distributed Mixing .................................. 21 | 5.2.1 SIP Mechanisms ...................................... 18 | |||
| 6.5 Cascaded Mixers ..................................... 22 | 5.2.2 CPCP Mechanisms ..................................... 18 | |||
| 7 Common Operations ................................... 22 | 5.2.3 Non-Automated Mechanisms ............................ 19 | |||
| 7.1 Creating Conferences ................................ 22 | 5.3 Conditional Joins ................................... 19 | |||
| 7.2 Adding Participants ................................. 25 | 5.4 Removing Participants ............................... 19 | |||
| 7.3 Removing Participants ............................... 27 | 5.4.1 SIP Mechanisms ...................................... 19 | |||
| 7.4 Approving Policy Changes ............................ 27 | 5.4.2 CPCP Mechanisms ..................................... 20 | |||
| 7.5 Creating Sidebars ................................... 28 | 5.4.3 Non-Automated Mechanisms ............................ 20 | |||
| 8 Security Considerations ............................. 28 | 5.5 Approving Policy Changes ............................ 20 | |||
| 9 Contributors ........................................ 29 | 5.6 Creating Sidebars ................................... 22 | |||
| 10 Authors Addresses ................................... 29 | 5.7 Destroying Conferences .............................. 23 | |||
| 11 Normative References ................................ 29 | 5.7.1 SIP Mechanisms ...................................... 23 | |||
| 12 Informative References .............................. 29 | 5.7.2 CPCP Mechanisms ..................................... 23 | |||
| 5.7.3 Non-Automated Mechanisms ............................ 23 | ||||
| 5.8 Obtaining Membership ................................ 24 | ||||
| 5.8.1 SIP Mechanisms ...................................... 24 | ||||
| 5.8.2 CPCP Mechanisms ..................................... 24 | ||||
| 5.8.3 Non-Automated Mechanisms ............................ 24 | ||||
| 5.9 Adding and Removing Media ........................... 24 | ||||
| 5.9.1 SIP Mechanisms ...................................... 25 | ||||
| 5.9.2 CPCP Mechanisms ..................................... 25 | ||||
| 5.9.3 Non-Automated Mechanisms ............................ 25 | ||||
| 5.10 Conference Announcements and Recordings ............. 25 | ||||
| 5.11 Floor Control ....................................... 27 | ||||
| 5.12 Camera and Video Controls ........................... 27 | ||||
| 6 Physical Realization ................................ 28 | ||||
| 6.1 Centralized Server .................................. 28 | ||||
| 6.2 Endpoint Server ..................................... 28 | ||||
| 6.3 Media Server Component .............................. 28 | ||||
| 6.4 Distributed Mixing .................................. 31 | ||||
| 6.5 Cascaded Mixers ..................................... 33 | ||||
| 7 Security Considerations ............................. 33 | ||||
| 8 Contributors ........................................ 33 | ||||
| 9 Changes since draft-rosenberg-sipping- | ||||
| conferencing-framework-00 ...................................... 35 | ||||
| 10 Authors Addresses ................................... 35 | ||||
| 11 Normative References ................................ 35 | ||||
| 12 Informative References .............................. 35 | ||||
| 1 Introduction | 1 Introduction | |||
| The Session Initiation Protocol (SIP) [1] supports the initiation, | The Session Initiation Protocol (SIP) [1] supports the initiation, | |||
| modification, and termination of media sessions between user agents. | modification, and termination of media sessions between user agents. | |||
| These sessions are managed by SIP dialogs, which represent a SIP | These sessions are managed by SIP dialogs, which represent a SIP | |||
| relationship between a pair of user agents. Because dialogs are | relationship between a pair of user agents. Because dialogs are | |||
| between pairs of user agents, SIP's usage for two-party | between pairs of user agents, SIP's usage for two-party | |||
| communications (such as a phone call) is obvious. Communications | communications (such as a phone call) is obvious. Communications | |||
| sessions with multiple participants, however, are more complicated. | sessions with multiple participants, however, are more complicated. | |||
| skipping to change at page 3, line 27 ¶ | skipping to change at page 4, line 27 ¶ | |||
| relationship between participants in the conference. There is no | relationship between participants in the conference. There is no | |||
| central point of control or conference server. Participation is | central point of control or conference server. Participation is | |||
| gradually learned through control information that is passed as part | gradually learned through control information that is passed as part | |||
| of the conference (using the RTP Control Protocol (RTCP) [2], | of the conference (using the RTP Control Protocol (RTCP) [2], | |||
| for example). Loosely coupled conferences are easily supported in SIP | for example). Loosely coupled conferences are easily supported in SIP | |||
| by using multicast addresses within its session descriptions. | by using multicast addresses within its session descriptions. | |||
| In another model, referred to as fully distributed multiparty | In another model, referred to as fully distributed multiparty | |||
| conferencing, each participant maintains a signaling relationship | conferencing, each participant maintains a signaling relationship | |||
| with each other participant, using SIP. There is no central point of | with each other participant, using SIP. There is no central point of | |||
| control; it is completely distributed amongst the participants. SIP | control; it is completely distributed amongst the participants. This | |||
| does not yet support this model. | model is outside the scope of this document. | |||
| In another model, sometimes referrred to as the tightly coupled | In another model, sometimes referred to as the tightly coupled | |||
| conference, there is a central point of control. Each participant | conference, there is a central point of control. Each participant | |||
| connects to this central point. It provides a variety of conference | connects to this central point. It provides a variety of conference | |||
| functions, and may possibly perform media mixing functions as well. | functions, and may possibly perform media mixing functions as well. | |||
| Tightly coupled conferences are not directly addressed by the SIP | Tightly coupled conferences are not directly addressed by RFC 3261, | |||
| specification, although basic ones are possible without any | although basic participation is possible without any additional | |||
| additional protocol support. | protocol support. | |||
| This document is one of a series of specifications that discusses | This document is one of a series of specifications that discusses | |||
| tightly coupled conferences. Here, we present the overall framework | tightly coupled conferences. Here, we present the overall framework | |||
| for tightly coupled conferencing, referred to simply as | for tightly coupled conferencing, referred to simply as | |||
| "conferencing" from this point forward. This framework presents a | "conferencing" from this point forward. This framework presents a | |||
| general architectural model for these conferences, presents | general architectural model for these conferences, presents | |||
| terminology used to discuss such conferences, and describes the sets | terminology used to discuss such conferences, and describes the sets | |||
| of protocols involved in a conference. The aim of the framework is to | of protocols involved in a conference. The aim of the framework is to | |||
| meet the general requirements for conferencing that are outlined in | meet the general requirements for conferencing that are outlined in | |||
| [3]. | [3]. | |||
| 2 Terminology | 2 Terminology | |||
| Conference: Sadly, conference is an overused term which has | Conference: Conference is an overused term which has different | |||
| different meanings in different contexts. In SIP, a | meanings in different contexts. In SIP, a conference is an | |||
| conference is an instance of a multi-party conversation. | instance of a multi-party conversation. Within the context | |||
| of this specification, a conference is always a tightly | ||||
| Within the context of this specification, a conference is | coupled conference. | |||
| always a tightly coupled conference. | ||||
| Loosely Coupled Conference: A loosely coupled conference is a | Loosely Coupled Conference: A loosely coupled conference is a | |||
| conference without coordinated signaling relationships | conference without coordinated signaling relationships | |||
| amongst participants. Loosely coupled conferences use | amongst participants. Loosely coupled conferences | |||
| multicast for distribution of conference memberships. | frequently use multicast for distribution of conference | |||
| memberships. | ||||
| Tightly Coupled Conference: A tightly coupled conference is a | Tightly Coupled Conference: A tightly coupled conference is a | |||
| conference in which a single user agent, referred to as a | conference in which a single user agent, referred to as a | |||
| focus, maintains a dialog with each participant. The focus | focus, maintains a dialog with each participant. The focus | |||
| plays the role of the centralized manager of the | plays the role of the centralized manager of the | |||
| conference, and is addressed by a conference URI. | conference, and is addressed by a conference URI. | |||
| Focus: The focus is a SIP user agent that is addressed by a | Focus: The focus is a SIP user agent that is addressed by a | |||
| conference URI. The focus maintains a SIP signaling | conference URI and identifies a conference (recall that a | |||
| conference is a unique instance of a multi-party | ||||
| conversation). The focus maintains a SIP signaling | ||||
| relationship with each participant in the conference. The | relationship with each participant in the conference. The | |||
| focus is responsible for insuring, in some way, that each | focus is responsible for ensuring, in some way, that each | |||
| participant receives the media that make up the conference. | participant receives the media that make up the conference. | |||
| The focus also implements conference policies. The focus is | The focus also implements conference policies. The focus is | |||
| a logical role. | a logical role. | |||
| Conference URI: A URI, usually a SIP URI, which identifies the | Conference URI: A URI, usually a SIP URI, which identifies the | |||
| focus of a conference. | focus of a conference. | |||
| Participants: The set of user agents, each identified by a URI, | Participant: The software element that connects a user or | |||
| which are connected to the focus for a particular | automata to a conference. It implements, at a minimum, a | |||
| conference. | SIP user agent, but may also include a conference policy | |||
| control protocol client, for example. | ||||
| Conference Notification Service: A conference notification | Conference Notification Service: A conference notification | |||
| service is a logical function provided by the focus. The | service is a logical function provided by the focus. The | |||
| focus can act as a notifier [4], accepting subscriptions to | focus can act as a notifier [4], accepting subscriptions to | |||
| the conference state, and notifying subscribers about | the conference state, and notifying subscribers about | |||
| changes to that state. The state includes the state | changes to that state. The state includes the state | |||
| maintained by the focus itself, the conference policy, and | maintained by the focus itself, the conference policy, and | |||
| the media policy. | the media policy. | |||
| Conference Policy Server: A conference policy server is a | Conference Policy Server: A conference policy server is a | |||
| logical function which can store and manipulate rules | logical function which can store and manipulate the | |||
| associated with participation in a conference. These rules | conference policy. The conference policy is the overall set | |||
| include directives on the lifespan of the conference, who | of rules governing operation of the conference. It is | |||
| can and cannot join the conference, definitions of roles | broken into membership policy and media policy. Unlike the | |||
| available in the conference and the responsibilities | focus, there is not an instance of the conference policy | |||
| associated with those roles, and policies on who is allowed | server for each conference. Rather, there is an instance of | |||
| to request which roles. The conference policy server is a | the membership and media policies for each conference. | |||
| logical role. | ||||
| Media Policy Server: A media policy server is a logical function | ||||
| which can store and manipulate rules associated with the | ||||
| media distribution of the conference. These rules can | ||||
| specify which participants receive media from which other | ||||
| participants, and the ways in which that media is combined | ||||
| for each participant. In the case of audio, these rules can | ||||
| include the relative volumes at which each participant is | ||||
| mixed. In the case of video, these rules can indicate | ||||
| whether the video is tiled, whether the video indicates the | ||||
| loudest speaker, and so on. | ||||
| Conference Policy: The set of rules manipulated by the | ||||
| conference policy server. | ||||
| Conference Policy Control Protocol: The client-server protocol | Conference Policy: The complete set of rules manipulated by the | |||
| used by clients to manipulate the conference policy. | conference policy server. It includes the membership policy | |||
| and the media policy. | ||||
| Media Policy: The set of rules manipulated by the media policy | Membership Policy: A set of rules manipulated by the conference | |||
| server. The media policy is used by the focus to determine | policy server regarding participation in the conference. | |||
| the mixing characteristics for the conference. | These rules include directives on the lifespan of the | |||
| conference, who can and cannot join the conference, | ||||
| definitions of roles available in the conference and the | ||||
| responsibilities associated with those roles, and policies | ||||
| on who is allowed to request which roles. | ||||
| Media Policy Control Protocol: The client-server protocol used | Media Policy: A set of rules manipulated by the conference | |||
| by clients to manipulate the media policy. | policy server regarding the media composition of the | |||
| conference. The media policy is used by the focus to | ||||
| determine the mixing characteristics for the conference. | ||||
| The media policy includes rules about which participants | ||||
| receive media from which other participants, and the ways | ||||
| in which that media is combined for each participant. In | ||||
| the case of audio, these rules can include the relative | ||||
| volumes at which each participant is mixed. In the case of | ||||
| video, these rules can indicate whether the video is tiled, | ||||
| whether the video indicates the loudest speaker, and so on. | ||||
| Mixer: As defined in the Real Time Transport Protocol [2], a | Conference Policy Control Protocol (CPCP): The protocol used by | |||
| mixer receives a set of media streams, and combines their | clients to manipulate the conference policy. | |||
| media in a type-specific manner, redistributing the result | ||||
| to each participant. We use the term here to include | ||||
| combining of non-RTP media streams as well, such as instant | ||||
| messaging sessions [5]. | ||||
| Basic Conference: A basic conference is one where there is no | Mixer: A mixer receives a set of media streams of the same type, | |||
| conference policy server, media policy server, or | and combines their media in a type-specific manner, | |||
| conference subscription server - only a focus. | redistributing the result to each participant. This | |||
| includes media transported using RTP [2]. As a result, the | ||||
| term defined here is a superset of the mixer concept | ||||
| defined in RFC 1889, since it allows for non-RTP-based | ||||
| media such as instant messaging sessions [5]. | ||||
| Basic Participant: A basic participant is a participant in a | Conference-Unaware Participant: A conference-unaware participant | |||
| conference that is not aware that it is actually in a | is a participant in a conference that is not aware that it | |||
| conference. As far as the UA is concerned, it is a point- | is actually in a conference. As far as the UA is concerned, | |||
| to-point call. | it is a point-to-point call. | |||
| Cascaded Conference: A conference in which a participant is the | Cascaded Conferencing: A mechanism for group communications in | |||
| focus of another conference. | which a set of conferences are linked by having their | |||
| focuses interact in some fashion. | ||||
| Complex Conference: A complex conference includes at least one | Simplex Cascaded Conferences: A group of conferences which are | |||
| of a conference policy server, media policy server, or | linked such that the user agent which represents the focus | |||
| conference subscription server, in addition to the focus. | of one conference is a conference-unaware participant in | |||
| another conference. | ||||
| Complex Participant: A complex participant is a participant in a | Conference-Aware Participant: A conference-aware participant is | |||
| conference that has learned, through automated means, that | a participant in a conference that has learned, through | |||
| it is in a conference, and that can use a conference policy | automated means, that it is in a conference, and that can | |||
| control protocol, media policy control protocol, or | use a conference policy control protocol, media policy | |||
| conference subscription, to implement advanced | control protocol, or conference subscription, to implement | |||
| functionality. | advanced functionality. | |||
| Conference Server: A conference server is a physical server | Conference Server: A conference server is a physical server | |||
| which contains, at a minimum, the focus. It may also | which contains, at a minimum, the focus. It may also | |||
| include a media policy server, a conference policy server, | include a conference policy server and mixers. | |||
| and a mixer. | ||||
| Singleton: In this context, a singleton is a conference | ||||
| participant that is not a focus. A singleton represents a | ||||
| single user in a conference. | ||||
| Conference Topology: The conference topology is a graph that | ||||
| defines the connectivity amongst participants connected | ||||
| through conferences. Each node in the graph represents a | ||||
| user agent, whether it is a focus or a singleton. Each leaf | ||||
| node in the tree represents a singleton, and an internal | ||||
| node represents a focus. An edge between two nodes implies | ||||
| that there is a SIP dialog between them. Ideally, | ||||
| conference topologies are trees, not arbitrary graphs. | ||||
| Conversation Space: For each conference URI, there is a unique | ||||
| conversation space. The conversation space is defined as | ||||
| the set of singletons in the conference topology associated | ||||
| with that URI. The conference topology associated with a | ||||
| conference URI is the one that is constructed by starting | ||||
| with the focus for that URI. Under normal circumstances, | ||||
| the set of singletons in a conversation space will all | ||||
| receive each other's media. | ||||
| Instant Conference: A conference in which the focus is | ||||
| constructed the instant the first INVITE for a URI is | ||||
| received, and then destroyed when the last participant | ||||
| has left. | ||||
| Mass Invitation: A conference policy control protocol request to | Mass Invitation: A conference policy control protocol request to | |||
| invite a large number of users into the conference. | invite a large number of users into the conference. | |||
| Mass Ejection: A conference policy control protocol request to | Mass Ejection: A conference policy control protocol request to | |||
| remove a large number of users from the conference. | remove a large number of users from the conference. | |||
| Sidebar: A sidebar appears to the users as a "conference within | Sidebar: A sidebar appears to the users within the sidebar as a | |||
| the conference". It is a discussion amongst a subset of the | "conference within the conference". It is a conversation | |||
| participants, not heard by the remaining participants in | amongst a subset of the participants to which the remaining | |||
| the conference. | participants are not privy. | |||
| Anonymous Participant: An anonymous participant is one that is | Anonymous Participant: An anonymous participant is one that is | |||
| known to other participants (through the conference | known to other participants through the conference | |||
| notification service), but whose identity is being | notification service, but whose identity is being withheld. | |||
| withheld. | ||||
| Invisible Participant: An invisible participant is one that is | ||||
| not known to other participants in the conference. They may | ||||
| be known to the moderator, depending on conference policy. | ||||
| 3 Basic Architecture | ||||
| A SIP conference is represented by a URI. This URI identifies the | ||||
| focus, which is the user agent at the center of the conference. Any | ||||
| participant that is involved in the conference is connected to the | ||||
| focus by a SIP dialog. The result is a star topology, shown in Figure | ||||
| 1. | ||||
| The focus has access to a conference policy and media policy, an | Hidden Participant: A hidden participant is one that is not | |||
| instance of which exists for each focus. In a basic SIP conference, | known to other participants in the conference. They may be | |||
| these policies are administratively defined. | known to the moderator, depending on conference policy. | |||
| Users join the conference by sending an INVITE to the conference URI. | 3 Overview of Conferencing Architecture | |||
| As long as the conference policy allows, the INVITE is accepted by | ||||
| the focus and the user is brought into the conference. Users can | ||||
| leave the conference by sending a BYE, as they would in a normal | ||||
| call. Indeed, a participant in a basic conference does not need to | ||||
| know that the focus is anything other than a normal SIP user agent. | ||||
| Similarly, the focus can terminate a dialog with a participant, | The central component (literally) in a SIP conference is the focus. | |||
| should the conference policy change to indicate that the participant | The focus maintains a SIP signaling relationship with each | |||
| is no longer allowed in the conference. A focus can also initiate an | participant in the conference. The result is a star topology, shown | |||
| INVITE, should the conference policy indicate that the focus needs to | in Figure 1. | |||
| bring a participant into the conference. | ||||
| The focus is responsible for making sure that the media streams which | The focus is responsible for making sure that the media streams which | |||
| constitute the conference are available to the participants in the | constitute the conference are available to the participants in the | |||
| conference. It does that through the use of one or more mixers, each | conference. It does that through the use of one or more mixers, each | |||
| of which combines a number of input media streams to produce one or | of which combines a number of input media streams to produce one or | |||
| more output media streams. The focus uses the media policy to | more output media streams. The focus uses the media policy to | |||
| determine the proper configuration of the mixers. | determine the proper configuration of the mixers. | |||
| With these basic capabilities, a large number of common conferencing | ||||
| applications can be built. None of them require any extensions to | ||||
| SIP; they merely require that the focus is aware of its role and | ||||
| responsibilities in maintaining the conference. However, basic | ||||
| conferences do not allow for the participants to control the way in | ||||
| which the conference operates. | ||||
| +-----------+ | +-----------+ | |||
| | | | | | | |||
| | | | | | | |||
| |Participant| | |Participant| | |||
| | | | | 4 | | |||
| | | | | | | |||
| +-----------+ | +-----------+ | |||
| | | | | |||
| |SIP | |SIP | |||
| |Dialog | |Dialog | |||
| | | |4 | |||
| | | | | |||
| +-----------+ +-----------+ +-----------+ | +-----------+ +-----------+ +-----------+ | |||
| | | | | | | | | | | | | | | |||
| | | | | | | | | | | | | | | |||
| |Participant|-----------| Focus |------------|Participant| | |Participant|-----------| Focus |------------|Participant| | |||
| | | SIP | | SIP | | | | 1 | SIP | | SIP | 3 | | |||
| | | Dialog | | Dialog | | | | | Dialog | | Dialog | | | |||
| +-----------+ +-----------+ +-----------+ | +-----------+ 1 +-----------+ 3 +-----------+ | |||
| | | | | |||
| | | | | |||
| |SIP | |SIP | |||
| |Dialog | |Dialog | |||
| | | |2 | |||
| | | | | |||
| +-----------+ | +-----------+ | |||
| | | | | | | |||
| | | | | | | |||
| |Participant| | |Participant| | |||
| | | | | 2 | | |||
| | | | | | | |||
| +-----------+ | +-----------+ | |||
| Figure 1: Basic SIP Conference | Figure 1: SIP Conference Architecture | |||
| A complex SIP conference is one in which additional interfaces are | The focus has access to the conference policy (composed of the | |||
| exposed, allowing for a richer set of controls and information on the | membership and media policies), an instance of which exists for each | |||
| conference. In particular, a complex SIP conference can include a | conference. Effectively, the conference policy can be thought of as a | |||
| conference policy server and a media policy server, and the focus can | database which describes the way that the conference should operate. | |||
| expose a conference notification service. The model for these | It is the responsibility of the focus to enforce those policies. Not | |||
| conferences is shown in Figure 2. This figure shows the view from one | only does the focus need read access to the database, but it needs to | |||
| participant. The conference now encompasses an additional set of | know when it has changed. Such changes might result in SIP signaling | |||
| functions. In addition to maintaining the dialog with the focus, the | (for example, the ejection of a user from the conference using BYE), | |||
| participant now has access to these other functions. It can, using a | and most changes will require a notification to be sent to | |||
| conference event package [6], SUBSCRIBE to the conference URI, and be | subscribers using the conference notification service. | |||
| connected to the conference notification service provided by the | ||||
| focus. Through this package, it can learn about changes in | ||||
| participants (effectively, the state of the dialogs), the media | ||||
| policy, and the conference policy. | ||||
| The participant can also communicate with the conference policy | The conference is represented by a URI, which identifies the focus. | |||
| server, using a conference policy control protocol. This is a | Each conference has a unique focus and a unique URI identifying that | |||
| strictly client-server transactional protocol. This protocol might | focus. Requests to the conference URI are routed to the focus for | |||
| not be a protocol at all; it can be performed using a web interface. | that specific conference. | |||
| In this case, no standardized protocols or policies are needed. | ||||
| However, the web interface can only be manipulated by humans, not | ||||
| automata. For this reason, the participant can use a protocol | ||||
| designed specifically for this purpose. | ||||
| The participant can also communicate with the media policy server, | Users usually join the conference by sending an INVITE to the | |||
| using a media policy control protocol. This is a strictly client- | conference URI. As long as the conference policy allows, the INVITE | |||
| server transactional operation. This can also be through a web | is accepted by the focus and the user is brought into the conference. | |||
| interface, or through an explicit protocol. | Users can leave the conference by sending a BYE, as they would in a | |||
| normal call. | ||||
| The focus will access the media and conference policies. There is a | Similarly, the focus can terminate a dialog with a participant, | |||
| tight coupling between these policies and the focus. Not only does it | should the conference policy change to indicate that the participant | |||
| need read access to these policies, but it needs to know when they | is no longer allowed in the conference. A focus can also initiate an | |||
| have changed. Such changes might result in SIP signaling (for | INVITE, should the conference policy indicate that the focus needs to | |||
| example, the ejection of a user from the conference using BYE), and | bring a participant into the conference. | |||
| most changes will require a notification to be sent to subscribers to | ||||
| the conference notification service. | ||||
| The conference policy and media policy servers need not be available | The notion of a conference-unaware participant is important in this | |||
| in any particular conference. Even when available, they need not be | framework. A conference-unaware participant does not even know that | |||
| used by all participants. A participant in a conference that does not | the UA it is communicating with happens to be a focus. As far as it is | |||
| access any of these functions, and which doesn't even know that the | concerned, it is a UA just like any other. The focus, of course, knows | |||
| focus is a focus, is called a basic participant. A conference | that it is a focus, and it performs the tasks needed for the conference | |||
| participant that can discover and access these additional functions is | to operate. | |||
| a complex participant. Any conference can include basic and complex | ||||
| participants. | ||||
| The interfaces between (1) the focus and the media policy, (2) the | Conference-unaware participants have access to a good deal of | |||
| focus and the conference policy, (3) the conference policy server and | functionality. They can join and leave conferences using SIP, and | |||
| the conference policy, and (4) the media policy server and the media | obtain more advanced features through stimulus signaling, as | |||
| policy are not subject to standardization at the time of this | discussed in [6]. However, if the participant wishes to explicitly | |||
| writing. They are intended primarily to show the logical roles | control aspects of the conference using functional signaling | |||
| Conference ..................................... | protocols, the participant must be conference-aware. | |||
| Policy . +-----------+ . | ||||
| Control . | | . | A conference-aware participant is one that has access to advanced | |||
| Protocol . |Participant| . | functionality through additional protocol interfaces. The client uses | |||
| +------------------->| Policy | . | these protocols to interact with the conference policy server and the | |||
| | . | Server | . | focus. A model for this interaction is shown in Figure 2. The | |||
| | . | | \ . | participant can interact with the focus using extensions, such as | |||
| | Media . +-----------+ \ . | REFER, in order to access enhanced call control functions [7]. The | |||
| | Policy . +-----------+ \ //-----\\ . | participant can SUBSCRIBE to the conference URI, and be connected to | |||
| | Control . | | > || || . | the conference notification service provided by the focus. Through | |||
| | Protocol . | Media | \\-----// . | this mechanism, it can learn about changes in participants | |||
| | +------------->| Policy | | | . | (effectively, the state of the dialogs), the media policy, and the | |||
| | | . | Server |----> |Conference . | membership policy. | |||
| | | . | | | | . | ||||
| | | . +-----------+ | & | . | The participant can communicate with the conference policy server | |||
| | | . | | . | using a conference policy control protocol. Through this protocol, it | |||
| | | . | Media | . | can affect the conference policy. The conference policy server need | |||
| not be available in any particular conference, although there is | ||||
| always a conference policy. | ||||
| The interfaces between the focus and the conference policy, and the | ||||
| conference policy server and the conference policy, are not subject | ||||
| to standardization at the time of this writing. They are intended | ||||
| primarily to show the logical roles involved in a conference, as | ||||
| opposed to suggesting a physical decomposition. The separation of | ||||
| these functions is documented here to encourage clarity in the | ||||
| requirements and to allow individual implementations the flexibility | ||||
| to compose a conferencing system in a scalable and robust manner. | ||||
| 3.1 Usage of URIs | ||||
| It is fundamental to this framework that a conference is uniquely | ||||
| identified by a URI, and that this URI identifies the focus which is | ||||
| responsible for the conference. The conference URI is unique, such | ||||
| that no two conferences have the same conference URI. A conference | ||||
| URI is always a SIP or SIPS URI. | ||||
| The conference URI is opaque to any participants which might use it. | ||||
| There is no way to look at the URI, and know for certain whether it | ||||
| identifies a focus, as opposed to a user or an interface on a PSTN | ||||
| gateway. This is in line with the general philosophy of URI usage | ||||
| [8]. However, contextual information surrounding the URI (for | ||||
| example, SIP header parameters) may indicate that the URI represents | ||||
| a conference. | ||||
| When a SIP request is sent to the conference URI, that request is | ||||
| routed to the focus, and only to the focus. The element or system | ||||
| that creates the conference URI is responsible for guaranteeing this | ||||
| property. | ||||
| The conference URI can represent a long-lived conference or interest | ||||
| group, such as "sip:discussion-on-dogs@example.com". The focus | ||||
| identified by this URI would always exist, and always be managing the | ||||
| conference for whatever participants are currently joined. Other | ||||
| conference URIs can represent short-lived conferences, such as an | ||||
| ad-hoc conference. | ||||
| Ideally, a conference URI is never constructed or guessed by a user. | ||||
| ..................................... | ||||
| . . | ||||
| . . | ||||
| . . | ||||
| . . | ||||
| . Conference . | ||||
| . Policy . | ||||
| Conference . . | ||||
| Policy . +-----------+ //-----\\ . | ||||
| Control . | | || || . | ||||
| Protocol . | Conference| \\-----// . | ||||
| +---------------->| Policy | | | . | ||||
| | . | Server |----> |Membership . | ||||
| | . | | | | . | ||||
| | . +-----------+ | & | . | ||||
| | . | | . | ||||
| | . | Media | . | ||||
| +-----------+ . +-----------+ | Policy| . | +-----------+ . +-----------+ | Policy| . | |||
| | | . | | \ // . | | | . | | \ // . | |||
| | | . | | \-----/ . | | | . | | \-----/ . | |||
| |Participant|<--------->| Focus | | . | |Participant|<--------->| Focus | | . | |||
| | | SIP . | | | . | | | SIP . | | | . | |||
| | | Dialog . | |<-----------+ . | | | Dialog . | |<-----------+ . | |||
| +-----------+ . |...........| . | +-----------+ . |...........| . | |||
| ^ . | Conference| . | ^ . | Conference| . | |||
| | . |Notification . | | . |Notification . | |||
| +------------>| Service | . | +------------>| Service | . | |||
| Subscription. +-----------+ . | Subscription. +-----------+ . | |||
| . . | . . | |||
| . . | . . | |||
| . . | . . | |||
| . . | . . | |||
| ..................................... | ..................................... | |||
| Conference | Conference | |||
| Functions | Functions | |||
| Figure 2: Complex SIP Conference | Figure 2: Conference-Aware Participant | |||
| to encourage clarity in the requirements and to allow individual | ||||
| implementations the flexibility to compose a conferencing system in a | ||||
| scalable and robust manner. | ||||
| 4 Usage of URIs | ||||
| It is fundamental to this framework that a conference is uniquely | ||||
| identified by a URI, and that this URI identify the focus which is | ||||
| responsible for the conference. This URI is always a SIP or SIPS URI. | ||||
| The conference URI is opaque to any participants which might use it. | ||||
| There is no way to look at the URI, and know for certain whether it | ||||
| identifies a focus, as opposed to a user or an interface on a PSTN | ||||
| gateway. This is in line with the general philosophy of URI usage | ||||
| [7]. However, contextual information surrounding the URI (for | ||||
| example, SIP header parameters) may indicate that the URI represents | ||||
| a conference. | ||||
| The conference URI can represent a long-lived conference or interest | ||||
| group, such as "sip:discussion-on-dogs@example.com". The focus | ||||
| identified by this URI would always exist, and always be managing the | ||||
| conference for whatever participants are currently joined. The | ||||
| conference URI can also represent an "instant" conference, for | ||||
| example, "sip:a8sd9998as-9s8daa@example.com". An instant conference | ||||
| is one where the focus is instantiated when the first URI for it | ||||
| arrives, and then destroyed when the last participant leaves. Both of | ||||
| these represent variations in the policies implemented by the focus, | ||||
| and cannot be determined from inspection of the URI. | ||||
| Ideally, a conference URI is never constructed or guessed by a user. | ||||
| Rather, conference URIs are learned through many mechanisms. A | Rather, conference URIs are learned through many mechanisms. A | |||
| conference URI can be emailed or sent in an instant message. A | conference URI can be emailed or sent in an instant message. A | |||
| conference URI can be linked on a web page. A conference URI can be | conference URI can be linked on a web page. A conference URI can be | |||
| obtained from a conference policy control protocol, which can be used | obtained from a conference policy control protocol, which can be used | |||
| to create conferences and the policies associated with them. | to create conferences and the policies associated with them. | |||
| To determine that a SIP URI does represent a focus, standard | To determine that a SIP URI does represent a focus, standard | |||
| techniques for URI capability discovery can be used. First, a | techniques for URI capability discovery can be used. Specifically, | |||
| participant can send an OPTIONS to a SIP URI, and if it represents a | the caller preferences specification [9] provides the "isfocus" | |||
| focus, the response will indicate such [TBD]. The response will also | feature tag to indicate that the URI is a focus. Caller preferences | |||
| indicate whether or not the focus has implemented the subscription | parameters are also used to indicate that a focus supports the | |||
| notification service. This is known by the presence of an Allow | conference notification service. This is done by declaring support | |||
| header in the response, indicating support for the SUBSCRIBE method, | for the SUBSCRIBE method and the relevant package(s) in the caller | |||
| along with an Allow-Events header, indicating support for the | preferences feature parameters associated with the conference URI. | |||
| conferencing package. A second method for determining that a URI | ||||
| represents a focus is through a refresh request. The Allow and | ||||
| Allow-Events headers, along with the caller preferences specification | ||||
| [8] can indicate the same information that would be learned through | ||||
| an OPTIONS query. | ||||
| The other functions in a conference are also represented by URIs. If | The other functions in a conference are also represented by URIs. If | |||
| the conference policy and media policy servers are implemented | the conference policy server is implemented through web pages, this | |||
| through web pages, these servers are regular HTTP URIs. If they are | server is identified by HTTP URIs. If it is accessed using an | |||
| accessed using an explicit protocol, they are the URIs defined for | explicit protocol, it is a URI defined for that protocol. | |||
| those protocols. | ||||
| Starting with the conference URI, the URIs for the other logical | Starting with the conference URI, the URIs for the other logical | |||
| entities in the conference can be learned using [TBD]. | entities in the conference can be learned using the conference | |||
| notification service. | ||||
| OPEN ISSUE: I suppose we cannot say more until the protocol | ||||
| work is done. But, we have a requirement here - that there | ||||
| be a way to learn these URIs starting only with the | ||||
| conference URI. | ||||
| 5 Functions of the Elements | 4 Functions of the Elements | |||
| This section gives a more detailed description of the functions | This section gives a more detailed description of the functions | |||
| typically implemented in each of the elements. | typically implemented in each of the elements. | |||
| 5.1 Focus | 4.1 Focus | |||
| As its name implies, the focus is the center of the conference. All | As its name implies, the focus is the center of the conference. All | |||
| participants in the conference are connected to it using a SIP | participants in the conference are connected to it using a SIP | |||
| dialog. The focus is responsible for maintaining the dialogs | dialog. The focus is responsible for maintaining the dialogs | |||
| connected to it. It insures that the dialogs are connected to a set | connected to it. It ensures that the dialogs are connected to a set | |||
| of participants who are allowed to participate in the conference, as | of participants who are allowed to participate in the conference, as | |||
| defined by the conference policy. The focus also uses SIP to | defined by the membership policy. The focus also uses SIP to | |||
| manipulate the media sessions, in order to make sure each participant | manipulate the media sessions, in order to make sure each participant | |||
| obtains all the media for the conference. To do that, the focus makes | obtains all the media for the conference. To do that, the focus makes | |||
| use of the services of a mixer. | use of mixers. | |||
| When a focus receives an INVITE, it checks the conference policy. The | When a focus receives an INVITE, it checks the membership policy. The | |||
| conference policy might indicate that this participant is not allowed | membership policy might indicate that this participant is not allowed | |||
| to join, in which case the call can be rejected. It might indicate | to join, in which case the call can be rejected. It might indicate | |||
| that another participant, acting as a moderator, needs to approve | that another participant, acting as a moderator, needs to approve | |||
| this new participant. In that case, the INVITE might be parked on a | this new participant. In that case, the INVITE might be parked on a | |||
| music-on-hold server, or a 183 response might be sent to indicate | music-on-hold server, or a 183 response might be sent to indicate | |||
| progress. A notification, using the conference notification service, | progress. A notification, using the conference notification service, | |||
| would be sent to the moderator. The moderator then has the ability to | would be sent to the moderator. The moderator then has the ability to | |||
| manipulate the policies using the conference policy control protocol. | manipulate the policies using the conference policy control protocol. | |||
| If the policies are changed to allow this new participant, the focus | If the policies are changed to allow this new participant, the focus | |||
| can accept the INVITE (or unpark it from the music-on-hold server). | can accept the INVITE (or unpark it from the music-on-hold server). | |||
| The interpretation of the conference policy by the focus is, itself, | The interpretation of the membership policy by the focus is, itself, | |||
| a matter of local policy, and not subject to standardization. | a matter of local policy, and not subject to standardization. | |||
| If a participant manipulated the conference policy to indicate that a | If a participant manipulated the membership policy to indicate that a | |||
| certain other participant was no longer allowed in the conference, | certain other participant was no longer allowed in the conference, | |||
| the focus would send a BYE to that other participant to remove them. | the focus would send a BYE to that other participant to remove them. | |||
| This is often referred to as "ejecting" a user from the conference. | This is often referred to as "ejecting" a user from the conference. | |||
| The process of ejecting fundamentally constitutes these two steps - | The process of ejecting fundamentally constitutes these two steps - | |||
| the establishment of the policy through the conference policy | the establishment of the policy through the conference policy | |||
| protocol, and the implementation of that policy (using a BYE) by the | protocol, and the implementation of that policy (using a BYE) by the | |||
| focus. | focus. | |||
| Similarly, if a participant manipulated the conference policy to | Similarly, if a user manipulated the membership policy to indicate | |||
| indicate that a number of users need to be added to the conference, | that a number of users need to be added to the conference, the focus | |||
| the focus would send an INVITE to those participants. This is often | would send an INVITE to those participants. This is often referred to | |||
| referred to as the "mass invitation" function. As with ejection, it | as the "mass invitation" function. As with ejection, it is | |||
| is fundamentally composed of the policy functions that specify the | fundamentally composed of the policy functions that specify the | |||
| participants which should be present, and the implementation of those | participants which should be present, and the implementation of those | |||
| functions using SIP. A policy request to add a set of users might not | functions. A policy request to add a set of users might not require | |||
| require an INVITE to execute it; those users might already be | an INVITE to execute it; those users might already be participants in | |||
| participants in the conference. | the conference. | |||
| A similar model exists for media policy. If the media policy | A similar model exists for media policy. If the media policy | |||
| indicates that a participant should not receive any video, the focus | indicates that a participant should not receive any video, the focus | |||
| might implement that policy by sending a re-INVITE, removing the | might implement that policy by sending a re-INVITE, removing the | |||
| media stream to that participant. Alternatively, if the video is | media stream to that participant. Alternatively, if the video is | |||
| being centrally mixed, it could inform the mixer to send a black | being centrally mixed, it could inform the mixer to send a black | |||
| screen to that participant. The means by which the policy is | screen to that participant. The means by which the policy is | |||
| implemented are not subject to specification. | implemented are not subject to specification. | |||
| 5.2 Conference Policy Server | 4.2 Conference Policy Server | |||
| The conference policy server allows clients to manipulate and | The conference policy server allows clients to manipulate and | |||
| interact with the conference policy. The conference policy is used by | interact with the conference policy. The conference policy is used by | |||
| the focus to make authorization decisions and guide its overall | the focus to make authorization decisions and guide its overall | |||
| behavior. Logically speaking, there is a one-to-one mapping between a | behavior. Logically speaking, there is a one-to-one mapping between a | |||
| conference policy and a focus. | conference policy and a focus. | |||
| The conference policy is represented by a URI. There is a unique | The conference policy is represented by a URI. There is a unique | |||
| conference policy for each focus. The conference policy URI points to | conference policy for each conference. The conference policy URI | |||
| a conference policy server which can manipulate that conference | points to a conference policy server which can manipulate that | |||
| policy. A conference policy server also has a "top level" URI which | conference policy. A conference policy server also has a "top level" | |||
| can be used to access functions that are independent of any | URI which can be used to access functions that are independent of any | |||
| conference. Perhaps the most important of these functions is the | conference. Perhaps the most important of these functions is the | |||
| creation of a new conference. This will result in the construction of | creation of a new conference. Creation of a new conference will | |||
| a new conference URI, which can then be used to join the conference | result in the construction of a new focus and a corresponding | |||
| itself. | conference URI, which can then be used to join the conference itself, | |||
| along with a media policy and conference policy. | ||||
| The conference policy server is accessed using a client-server | The conference policy server is accessed using a client-server | |||
| transactional protocol. The client can be a participant in the | transactional protocol. The client can be a participant in the | |||
| conference, or it can be a third party. Access control lists for who | conference, or it can be a third party. Access control lists for who | |||
| can modify a conference policy are themselves part of the conference | can modify a conference policy are themselves part of the conference | |||
| policy. The conference policy server also allows clients to create | policy. | |||
| new conferences. This would result in the instantiation of a focus | ||||
| (and therefore, a conference URI associated with that focus), a | ||||
| conference policy, and a media policy. The conference policy server | ||||
| will also have rules about who can create conferences. | ||||
| The conference policy also includes per-participant policies that | The conference policy server is responsible for reconciliation of | |||
| specify how the focus is to handle a particular participant. These | potentially conflicting requests regarding the policy for the | |||
| include whether or not the participant is anonymous, for example. | conference. | |||
| 5.3 Mixers | The client of the conference policy control protocol can be any | |||
| entity interested in manipulating the conference policy. Clearly, | ||||
| participants might be interested in manipulating them. A participant | ||||
| might want to raise or lower the volume for one of the other | ||||
| participants it is hearing. Or, a participant might want to add a | ||||
| user to the conference. | ||||
| A client of the conference policy protocol could also be another | ||||
| server whose job is to determine the conference policy. As an | ||||
| example, a floor control server is responsible for determining which | ||||
| participant(s) in a conference are allowed to speak at any given | ||||
| time, based on participant requests and access rules. The floor | ||||
| control server would act as a client of the conference policy server, | ||||
| and change the media policy based on who is allowed to speak. | ||||
| The client of the conference policy control protocol could also be | ||||
| another conference policy server. | ||||
| 4.3 Mixers | ||||
| A mixer is responsible for combining the media streams that make up | A mixer is responsible for combining the media streams that make up | |||
| the conference, and generating one or more output streams that are | the conference, and generating one or more output streams that are | |||
| distributed to recipients (which could be participants or other | distributed to recipients (which could be participants or other | |||
| mixers). The combination process is specific to the media type, and | mixers). The process of combining media is specific to the media | |||
| is directed by the focus, under the guidance of the rules described | type, and is directed by the focus, under the guidance of the rules | |||
| in the media policy. | described in the media policy. | |||
| A mixer is not aware of a "conference" as an entity, per se. A mixer | A mixer is not aware of a "conference" as an entity, per se. A mixer | |||
| receives media streams as inputs, and based on directions provided by | receives media streams as inputs, and based on directions provided by | |||
| the focus, generates media streams as outputs. There is no grouping | the focus, generates media streams as outputs. There is no grouping | |||
| of media streams beyond the policies that describe the ways in which | of media streams beyond the policies that describe the ways in which | |||
| the streams are mixed. | the streams are mixed. | |||
| A mixer is always under the control of a focus. The focus is | A mixer is always under the control of a focus. The focus is | |||
| responsible for interpreting the media policy, and then installing | responsible for interpreting the media policy, and then installing | |||
| the appropriate rules in the mixer. If the focus is directly | the appropriate rules in the mixer. If the focus is directly | |||
| controlling a mixer, the mixer can either be co-resident with the | controlling a mixer, the mixer can either be co-resident with the | |||
| focus, or can be controlled through a protocol like Megaco [9]. | focus, or can be controlled through some kind of protocol. | |||
| However, a focus need not directly control a mixer. Rather, a focus | However, a focus need not directly control a mixer. Rather, a focus | |||
| can delegate the mixing to the participants, each of which has their | can delegate the mixing to the participants, each of which has their | |||
| own mixer. This is described in Section 6.4. | own mixer. This is described in Section 6.4. | |||
| 5.4 Media Policy Server | 4.4 Conference Notification Service | |||
| The media policy server is similar to the conference policy server. | The focus can provide a conference notification service. In this | |||
| It is accessed using a transactional client-server protocol. It | role, it acts as a notifier, as defined in RFC 3265 [4]. It accepts | |||
| manipulates a media policy, identified by a URI. The focus has the | subscriptions from clients for the conference URI, and generates | |||
| responsibility of acting on that media policy, implementing it | notifications to them as the state of the conference changes. | |||
| through direct or indirect control of mixers. | ||||
| The media policy describes the way in which the set of inputs to the | This state is composed of two separate pieces. The first is the state | |||
| of the focus and the second is the conference policy. | ||||
| The state of the focus includes the participants connected to the | ||||
| focus, and information about the dialogs associated with them. As new | ||||
| participants join, this state changes, and is reported through the | ||||
| notification service. Similarly, when someone leaves, this state also | ||||
| changes, allowing subscribers to learn about this fact. | ||||
| As described previously, the conference policy includes the | ||||
| membership policy and the media policy. As those policies change, due | ||||
| to usage of the conference policy control protocol (CPCP), direct | ||||
| change by the focus, or through an application, the conference | ||||
| notification service informs subscribers of these changes. | ||||
| 4.5 Participants | ||||
| A participant in a conference is any SIP user agent that has a dialog | ||||
| with the focus. This SIP user agent can be a PC application, a SIP | ||||
| hardphone, or a PSTN gateway. It can also be another focus. A | ||||
| conference which has a participant that is the focus of another | ||||
| conference is called a simplex cascaded conference. Cascaded | ||||
| conferences can also be used to provide scalable conferences where | ||||
| there are regional sub-conferences, each of which is connected to the | ||||
| main conference. | ||||
| 4.6 Conference Policy | ||||
| The conference policy contains the rules that guide the operation of | ||||
| the focus. The rules can be simple, such as an access list that | ||||
| defines the set of allowed participants in a conference. The rules | ||||
| can also be incredibly complex, specifying time-of-day based rules on | ||||
| participation conditional on the presence of other participants. It | ||||
| is important to understand that there is no restriction on the type | ||||
| of rules that can be encapsulated in a conference policy. | ||||
| The conference policy can be manipulated using web applications or | ||||
| voice applications. It can also be manipulated with proprietary | ||||
| protocols. However, the conference policy control protocol can be | ||||
| used as a standardized means of manipulating the conference policy. | ||||
| By the nature of conference policies, not all aspects of the policy | ||||
| can be manipulated with the conference policy control protocol. | ||||
| The conference policy includes the membership policy and the media | ||||
| policy. The membership policy includes per-participant policies that | ||||
| specify how the focus is to handle a particular participant. These | ||||
| include whether or not the participant is anonymous, for example. | ||||
| The media policy describes the way in which the set of inputs to a | ||||
| mixer are combined to generate the set of outputs. Media policies can | mixer are combined to generate the set of outputs. Media policies can | |||
| span media types. In other words, the policy on how one media stream | span media types. In other words, the policy on how one media stream | |||
| is mixed can be based on characteristics of other media streams. | is mixed can be based on characteristics of other media streams. | |||
| Media policies can be based on any quantifiable characteristic of the | Media policies can be based on any quantifiable characteristic of the | |||
| media stream (its source, volume, codecs, speaking/silence, etc.), | media stream (its source, volume, codecs, speaking/silence, etc.), | |||
| and they can be based on internal or external variables accessible by | and they can be based on internal or external variables accessible by | |||
| the media policy. | the media policy. | |||
| The media policy server is responsible for reconciliation of | ||||
| potentially conflicting requests regarding the media policy for the | ||||
| conference. | ||||
| The client of the media policy protocol can be any entity interested | ||||
| in manipulating media policies. Clearly, participants might be | ||||
| interested in manipulating them. A participant might want to raise or | ||||
| lower the volume for one of the other participants it is hearing. Or, | ||||
| a participant might want to switch from a tiled video view, to just | ||||
| viewing the active speaker. A client of the media policy protocol | ||||
| could also be another server whose job is to determine the media | ||||
| policy. As an example, a floor control server is responsible for | ||||
| determining which participant(s) in a conference are allowed to speak | ||||
| at any given time, based on participant requests and access rules. | ||||
| The floor control server would act as a client of the media policy | ||||
| server, and inform the media policy server about who is allowed to | ||||
| speak. | ||||
| The client of the media policy protocol could also be another media | ||||
| policy server, as described in Section 6.4. | ||||
| Some examples of media policies include: | Some examples of media policies include: | |||
| o The video output is the picture of the loudest speaker (video | o The video output is the picture of the loudest speaker (video | |||
| follows audio). | follows audio). | |||
| o The audio from each participant will be mixed with equal | o The audio from each participant will be mixed with equal | |||
| weight, and distributed to all other participants. | weight, and distributed to all other participants. | |||
| o The audio and video that is distributed is the one selected by | o The audio and video that is distributed is the one selected by | |||
| the floor control server. | the floor control server. | |||
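The first two example policies above can be sketched in a few lines.
This is only an illustration, not part of the framework: the Stream
type, the audio_level field, and both function names are hypothetical,
and a real mixer would work on live media rather than fixed numbers.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    participant: str
    audio_level: float  # hypothetical loudness measure, 0.0 to 1.0


def video_follows_audio(streams):
    """Pick the participant whose video is shown: the loudest speaker."""
    if not streams:
        return None
    return max(streams, key=lambda s: s.audio_level).participant


def equal_weight_mix(streams, listener):
    """Mix weights for one listener: every other participant, equally."""
    others = [s.participant for s in streams if s.participant != listener]
    if not others:
        return {}
    weight = 1.0 / len(others)
    return {p: weight for p in others}


streams = [Stream("alice", 0.2), Stream("bob", 0.7), Stream("carol", 0.1)]
selected_video = video_follows_audio(streams)
alice_mix = equal_weight_mix(streams, "alice")
```

Both functions take the full set of input streams and derive an
output, which mirrors the description of a media policy as a rule
that combines mixer inputs into outputs.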
| 5.5 Conference Notification Service | 5 Common Operations | |||
| The focus can provide a conference notification service. In this | There are a large number of ways in which users can interact with a | |||
| role, it acts as a notifier, as defined in RFC 3265 [4]. It accepts | conference. They can join, leave, set policies, approve members, and | |||
| subscriptions from clients for the conference URI, and generates | so on. This section is meant as an overview of the major conferencing | |||
| notifications to them as the state of the conference changes. | operations, summarizing how they operate. More detailed examples of | |||
| the SIP mechanisms can be found in [7]. | ||||
| This state is composed of three separate pieces. The first is the | 5.1 Creating Conferences | |||
| state of the focus, the second is the conference policy, and the | ||||
| third is the media policy. | ||||
| The state of the focus includes the participants connected to the | There are many ways in which a conference can be created. The | |||
| focus, and information about the dialogs associated with them. As new | creation of a conference actually constructs several elements all at | |||
| participants join, this state would change, allowing subscribers to | the same time. It results in the creation of a focus and a conference | |||
| learn about them. Similarly, when someone leaves, this state also | policy. It also results in the construction of a conference URI, | |||
| changes, allowing subscribers to learn about this fact. | which uniquely identifies the focus. Since the conference URI needs | |||
| to be unique, the element which creates conferences is responsible | ||||
| for guaranteeing that uniqueness. This can be accomplished | ||||
| deterministically, by keeping records of conference URIs, or | ||||
| probabilistically, by creating random URIs with sufficiently low | ||||
| probabilities of collision. | ||||
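The two ways of guaranteeing a unique conference URI can be sketched
as follows. This is a minimal sketch under stated assumptions: the
ConferenceFactory class, its method names, and the domain
conferences.example.com are hypothetical, and the token lengths are
arbitrary choices rather than anything the framework specifies.

```python
import secrets


class ConferenceFactory:
    """Hypothetical element that creates conferences for one domain."""

    def __init__(self, domain):
        self.domain = domain
        self.allocated = set()  # record of every conference URI handed out

    def new_uri_deterministic(self):
        # Deterministic uniqueness: consult the record and retry on a clash.
        while True:
            uri = f"sip:{secrets.token_hex(4)}@{self.domain}"
            if uri not in self.allocated:
                self.allocated.add(uri)
                return uri


def new_uri_probabilistic(domain):
    # Probabilistic uniqueness: 128 random bits and no record keeping.
    # A collision is possible in principle but extremely unlikely.
    return f"sip:{secrets.token_hex(16)}@{domain}"


factory = ConferenceFactory("conferences.example.com")
first = factory.new_uri_deterministic()
second = factory.new_uri_deterministic()
random_uri = new_uri_probabilistic("conferences.example.com")
```

The deterministic variant needs shared state across every element
that creates conferences in the domain; the probabilistic variant
trades that state for a longer user part in the URI.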
| The state of the conference policy includes the set of participants | When a media and conference policy are created, they are established | |||
| that are allowed, or not allowed, to join the conference, and the set | with default rules that are implementation dependent. If the creator | |||
| of participants who are to be explicitly added to the conference. It | of the conference wishes to change those rules, they would do so | |||
| includes the roles which are assigned to each participant, such as | using the conference policy control protocol (CPCP), for example. | |||
| whether they are a moderator. If there was a change in role, for | ||||
| example, a new moderator was selected, the focus would inform | ||||
| subscribers. | ||||
| The state of the media policy includes the media streams being | Of course, using the CPCP requires that an element know the URI for | |||
| received by each participant, the audio or video modalities, and so | manipulating the policy. That requires a means to learn the | |||
| on. | conference policy URI from the conference URI, since the conference | |||
| URI is frequently the sole result returned to the client as a result | ||||
| of conference creation. Any other URIs associated with the conference | ||||
| are learned through the conference notification service. They are | ||||
| carried as elements in the notifications. | ||||
| 5.6 Participants | 5.1.1 SIP Mechanisms | |||
| A participant in a conference is any SIP user agent that has a dialog | One way to create a conference is through a conferencing application. | |||
| with the focus. This SIP user agent can be a PC application, a SIP | As an example, a user can send an INVITE request to | |||
| hardphone, or a PSTN gateway. It can also be another focus. A | sip:conferences@service.com. This URI identifies an IVR application | |||
| conference which has a participant that is the focus of another | which interacts with the user, collects information about the desired | |||
| conference is called a cascaded conference. They can also be used to | conference, and creates it. The user can then be placed into their | |||
| provide scalable conferences where there are regional sub- | newly created conference. | |||
| conferences, each of which is connected to the main conference. A | ||||
| conference topology refers to a graph which shows each focus and each | ||||
| participant as a vertex, with a connection between each participant | ||||
| and its focus. | ||||
| 5.7 Conference Policy | Creation of conferences where the focus resides in an endpoint | |||
| operates differently. There, the endpoint itself creates the | ||||
| conference URI, and hands it out to other endpoints which are to be | ||||
| the participants. What differs from case to case is how the endpoint | ||||
| decides to create a conference. | ||||
| The conference policy contains the rules that guide the operation of | One important case is the ad-hoc conference described in Section 6.2. | |||
| the focus. These rules can be simple, such as an access list that | There, an endpoint unilaterally decides to create the conference | |||
| defines the set of allowed participants in a conference. The rules | based on local policy. The dialogs that were connected to the UA are | |||
| can also be incredibly complex, specifying time-of-day based rules on | migrated to the endpoint-hosted focus, using a re-INVITE to pass the | |||
| participation conditional on the presence of other participants. It | conference URI to the newly joined participants. | |||
| is important to understand that there is no restriction on the type | ||||
| of rules that can be encapsulated in a conference policy. | ||||
| However, there does exist a protocol means by which a client can | Alternatively, one UA can ask another UA to create an endpoint-hosted | |||
| request a change in the conference policy. This is done by | conference. This is accomplished with the SIP Join header [10]. The | |||
| communicating with the conference policy server, which manipulates | UA which receives the Join header in an invitation may need to create | |||
| the conference policy. By the nature of conference policies, not all | a new conference URI (a new one is not needed if the dialog that is | |||
| aspects of the policy can be manipulated with the conference policy | being joined is already part of a conference). The conference URI is | |||
| control protocol. It is the responsibility of the conference policy | then handed to the recently joined participants through a re-INVITE. | |||
| server to reconcile the various requests with the conference policy. | ||||
| 5.8 Media Policy | 5.1.2 CPCP Mechanisms | |||
| The media policy contains the rules that guide the operation of the | Another way to create a conference is through interaction with the | |||
| mixer. The focus uses these rules to interact with the mixer to | conference policy server. Using the conference policy control | |||
| implement them. These rules can be simple (mix all media from all | protocol, a client can instruct the conference policy server to | |||
| participants), or they can be incredibly complex. It is important to | create a new conference and return the conference URI and conference | |||
| understand that there is no restriction on the type of rules that can | policy URI. | |||
| be encapsulated in a media policy. | ||||
| However, there does exist a protocol means by which a client can | 5.1.3 Non-Automated Mechanisms | |||
| request a change in the media policy. This is done by communicating | ||||
| with the media policy server, which manipulates the media policy. By | Of course, a user can also create conferences by interacting with a | |||
| the nature of media policies, not all aspects of the policy can be | web server. The web server would prompt the user for the necessary | |||
| manipulated with the media policy control protocol. It is the | information (start and stop times of the conference, participants, | |||
| responsibility of the media policy server to reconcile the various | etc.) and return the conference URI to the user. The user would copy | |||
| requests with the media policy. | this URI into their SIP phone, and send it an INVITE in order to join | |||
| the newly-created conference. | ||||
| 5.2 Adding Participants | ||||
| There are many mechanisms for adding participants to a conference. | ||||
| These include SIP, the conference policy control protocol, and non- | ||||
| automated means. In all cases, participant additions can be first | ||||
| party (a user adds themself) or third party (a user adds another | ||||
| user). | ||||
| 5.2.1 SIP Mechanisms | ||||
| First person additions using SIP are trivially accomplished with a | ||||
| standard INVITE. A participant can send an INVITE request to the | ||||
| conference URI, and if the conference policy allows them to join, | ||||
| they are added to the conference. | ||||
| If a UA does not know the conference URI, but has learned about a | ||||
| dialog which is connected to a conference (by using the dialog event | ||||
| package, for example [11]), the UA can join the conference by using | ||||
| the Join header to join the dialog. | ||||
| Third party additions with SIP are done using REFER [12]. The client | ||||
| can send a REFER request to the participant, asking them to send an | ||||
| INVITE request to the conference URI. Additionally, the client can | ||||
| send a REFER request to the focus, asking it to send an INVITE to the | ||||
| participant. The latter technique has the benefit of allowing a | ||||
| client to add a conference-unaware participant that does not support | ||||
| the REFER method. | ||||
| 5.2.2 CPCP Mechanisms | ||||
| A basic function of the conference policy control protocol is to add | ||||
| participants. A client of the protocol can specify any SIP URI (which | ||||
| may identify themself) that is to be added. If the URI does not | ||||
| identify a user that is already a participant in the conference, the | ||||
| focus will send an INVITE to that URI in order to add them in. | ||||
| 5.2.3 Non-Automated Mechanisms | ||||
| There are countless non-automated means for asking a participant to | ||||
| join the conference. Generally, they involve conveying the conference | ||||
| URI to the desired participant, so that they can send an INVITE to | ||||
| it. These mechanisms all require some kind of human interaction. | ||||
| As an example, a user can send an instant message [13] to the third | ||||
| party, containing an HTML document which requests the user to click | ||||
| on the hyperlink to join the conference: | ||||
| <html> | ||||
| Hey, would you like to <a href="sip:9sf88fk-99sd@conferences.com">join | ||||
| </a> the conference now? | ||||
| </html> | ||||
| 5.3 Conditional Joins | ||||
| In many cases, a new participant will not wish to join the conference | ||||
| unless they can join with a particular set of policies. As an | ||||
| example, a participant may want to join anonymously, so that other | ||||
| participants know that someone has joined, but not who. To accomplish | ||||
| this, the conference policy control protocol is used to establish | ||||
| these policies prior to the generation or acceptance of an invitation | ||||
| to the conference. For example, if a user wishes to join a conference | ||||
| with a known conference URI, the user would obtain the URI for the | ||||
| conference policy, manipulate the policy to set themself as an | ||||
| anonymous participant, and then actually join the conference by | ||||
| sending an INVITE request to the conference URI. | ||||
| 5.4 Removing Participants | ||||
| As with additions, there are several mechanisms for departures. These | ||||
| include SIP mechanisms and CPCP mechanisms. Removals can also be | ||||
| first person or third person. | ||||
| 5.4.1 SIP Mechanisms | ||||
| First person departures are trivially accomplished by sending a BYE | ||||
| request to the focus. This terminates the dialog with the focus and | ||||
| removes the participant from the conference. | ||||
| Third person departures can also be done using SIP, through the REFER | ||||
| method. | ||||
| 5.4.2 CPCP Mechanisms | ||||
| The CPCP can be used by a client to remove any participant (including | ||||
| themself). When CPCP is used for this purpose, the focus will send a | ||||
| BYE request to the participant that is being removed. The focus will | ||||
| execute any other signaling that is needed to remove them (for | ||||
| example, manipulate other dialogs in order to manage the change in | ||||
| media streams). | ||||
| The conference policy control protocol can also be used to remove a | ||||
| large number of users. This is generally referred to as mass | ||||
| ejection. | ||||
| 5.4.3 Non-Automated Mechanisms | ||||
| As with the other common conferencing functions, there are many non- | ||||
| automated ways to remove a participant. The identity of the | ||||
| participant can be entered into a web form. When the user clicks | ||||
| submit, the focus sends a BYE to that participant, removing them from | ||||
| the conference. Alternatively, the conference can expose an IM | ||||
| interface, where the user can send an IM to the conference saying | ||||
| "remove Bob", causing the conference server to remove Bob. | ||||
| 5.5 Approving Policy Changes | ||||
| OPEN ISSUE: The basic mechanism described here depends on | ||||
| the actual protocols used for conference and media policy | ||||
| manipulation. If the protocol itself provides change | ||||
| notifications, sip-events may not be needed for that | ||||
| purpose. Thus, this description here is tentative. | ||||
| A conference policy for a particular conference may designate one or | ||||
| more users as moderators for some set of media policy or conference | ||||
| policy change requests. This means that those moderators need to | ||||
| approve the specific policy change. Typically, moderators are used to | ||||
| approve member additions and removals. However, the framework allows | ||||
| for moderators to be associated with any policy change that can be | ||||
| made. | ||||
| Moderating a policy request is done using a combination of the | ||||
| conference notification service and the CPCP. | ||||
| First, a client makes a policy change. This can be directly, using | ||||
| the CPCP, or indirectly. An indirect policy change request is any | ||||
| non-CPCP action that requires approval. The simplest example is an | ||||
| INVITE to the focus from a new participant. That represents a request | ||||
| to change the membership of the conference. From a moderation | ||||
| perspective, it is handled identically to the case where a client | ||||
| used the CPCP to request that the same user be added to the | ||||
| conference. | ||||
| Part of the conference policy itself may designate any policy change | ||||
| as moderated. This means that the change cannot be performed by the | ||||
| client directly. As a result, any CPCP request will fail, and the | ||||
| failure response informs the client that their request failed due to | ||||
| insufficient authorization. That completes the CPCP transaction. In | ||||
| the case of a policy change requested indirectly through some other | ||||
| means, the behavior depends on the mechanism. For example, if a user | ||||
| sends a SIP INVITE request to the conference in order to join, and | ||||
| that join request is moderated, the focus can reject the INVITE, or | ||||
| it can accept it and play music-on-hold until the request is | ||||
| approved. | ||||
| Even though the CPCP transaction failed, it does result in a change | ||||
| in internal state. Specifically, the requested change shows up as a | ||||
| "pending" state within the media and conference policies. This means | ||||
| that the change has been requested, but has not taken effect. It is | ||||
| almost a form of change request history. However, because it is a | ||||
| state change, it is something that can result in notifications | ||||
| through the conference notification service. | ||||
| Therefore, in order to moderate requests, the moderator subscribes to | ||||
| the conference notification service. Normally, the | ||||
| notifications from the focus do not reflect pending state changes. | ||||
| That is, the service will not normally send a notification informing | ||||
| a subscriber that a policy change request was made and failed due to | ||||
| lack of authorization. However, notifications to the moderator do | ||||
| reflect these changes. That is because the policy of the focus is to | ||||
| inform moderators, and only moderators, of these changes. Indeed, | ||||
| different users can be moderators for different parts of the | ||||
| conference and media policies. For example, one user can be a | ||||
| moderator for membership changes, and another, a moderator for | ||||
| whether users can be anonymously joined or not. | ||||
| There are two ways that the focus knows whether a subscriber to the | ||||
| conference notification service is a moderator. The first is | ||||
| configured policy (once again through CPCP). That policy can specify | ||||
| that a particular user is the moderator for a particular piece of | ||||
| policy. Therefore, if that user subscribes to the conference | ||||
| notification service, any notification sent to that user will include | ||||
| pending changes to that piece of policy. As an alternative, a | ||||
| SUBSCRIBE request from a user can include a filter [14] that requests | ||||
| receipt of these pending state changes. If the conference policy | ||||
| allows, that request is honored, and the subscriber will receive | ||||
| notifications about pending state changes. | ||||
| Once the moderator receives a notification about the pending state | ||||
| change, they use the CPCP to implement their decision. If the | ||||
| moderator decides to approve the change, they use the CPCP or MPCP to | ||||
| actually perform the change themselves. Since the moderator for a | ||||
| piece of policy is allowed to change that piece of policy, by | ||||
| definition, their change is accepted and performed. If the moderator | ||||
| decides to reject the change, they use the CPCP to remove the pending | ||||
| state from the database. | ||||
| The pending state persists in the database for a period of time which | ||||
| is, itself, part of the conference policy. If the moderator does not | ||||
| either approve or reject the change, the pending state eventually | ||||
| disappears, as if the change was explicitly rejected. | ||||
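The moderation flow described above (a failed request that leaves
pending state, notifications that show that state only to moderators,
and approval, rejection, or expiry) can be sketched as a small state
machine. Everything named here is an assumption made for illustration:
the PolicyStore class, its methods, the string results, and the use of
plain numbers for time. It models membership changes only and is not
the CPCP itself.

```python
class PolicyStore:
    """Hypothetical store for a moderated membership policy."""

    def __init__(self, moderators, pending_lifetime):
        self.moderators = set(moderators)
        self.pending_lifetime = pending_lifetime  # itself part of the policy
        self.members = set()
        self.pending = {}  # requested user -> time at which the request expires

    def request_add(self, requester, user, now):
        if requester in self.moderators:
            # A moderator may change this piece of policy directly.
            self.members.add(user)
            self.pending.pop(user, None)
            return "ok"
        # The request fails, but it is remembered as pending state.
        self.pending[user] = now + self.pending_lifetime
        return "unauthorized"

    def reject(self, moderator, user):
        if moderator not in self.moderators:
            return "unauthorized"
        self.pending.pop(user, None)
        return "ok"

    def expire(self, now):
        # Unanswered requests disappear as if explicitly rejected.
        self.pending = {u: t for u, t in self.pending.items() if t > now}

    def notification_for(self, subscriber):
        body = {"members": sorted(self.members)}
        if subscriber in self.moderators:
            body["pending"] = sorted(self.pending)
        return body


store = PolicyStore(moderators={"mod"}, pending_lifetime=60)
result = store.request_add("alice", "alice", now=0)
seen_by_moderator = store.notification_for("mod")
seen_by_other = store.notification_for("bob")
```

Approval is modeled as the moderator repeating the request, for
example store.request_add("mod", "alice", now=10), which succeeds
because the moderator is authorized and clears the pending entry.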
| If the pending state is approved, a real change to the conference or | ||||
| media policy takes place, and this change will be reflected in the | ||||
| conference notification service. In this way, if a client makes a | ||||
| policy change, and their request is rejected because they are not | ||||
| authorized, the client can subscribe to the conference notification | ||||
| service to learn if their change is eventually approved or rejected. | ||||
| This general mechanism for moderating policy requests is consistent | ||||
| with the moderation of presence subscriptions [15] [16]. | ||||
| 5.6 Creating Sidebars | ||||
| A sidebar is a "conference within a conference", allowing a subset of | ||||
| the participants to converse amongst themselves. Frequently, | ||||
| participants in a sidebar will still receive media from the main | ||||
| conference, but "in the background". For audio, this may mean that | ||||
| the volume of the media is reduced, for example. | ||||
| A sidebar is represented by a separate conference URI. This URI is a | ||||
| type of "alias" for the main conference URI. Both route to the same | ||||
| focus. Like any other conference, the sidebar conference URI has a | ||||
| conference policy and a media policy associated with it. Like any | ||||
| other conference, one can join it by sending an INVITE to this URI, | ||||
| or ask others to join by referring them to it. However, it differs | ||||
| from a normal conference URI in several ways. First, users in the | ||||
| main conference do not need to establish a separate dialog to the | ||||
| sidebar conference. The focus recognizes the sidebar as a special | ||||
| URI, and knows to use the existing dialog to the main conference as a | ||||
| "virtual" connection to the sidebar URI. | ||||
| The second difference is the way in which conference and media | ||||
| policies are implemented. If the conference policy control protocol | ||||
| is used to add a user to a normal conference, the focus will | ||||
| typically send an INVITE to the participant to ask them to join. For | ||||
| a sidebar conference, it is done differently. If the conference | ||||
| policy control protocol is used to add a user to it, and that user is | ||||
| already part of the main conference, the focus will use the | ||||
| conference notification service to alert the existing participant | ||||
| that they have been asked to join the sidebar. The invited user can | ||||
| then make use of the CPCP to formally add themselves to the sidebar. | ||||
| 5.7 Destroying Conferences | ||||
| Conferences can be destroyed in several ways. Generally, whether | ||||
| those means are applicable for any particular conference is a | ||||
| component of the conference policy. | ||||
| When a conference is destroyed, the conference and media policies | ||||
| associated with it are destroyed. Any attempts to read or write those | ||||
| policies results in a protocol error. Furthermore, the conference URI | ||||
| becomes invalid. Any attempts to send an INVITE to it, or SUBSCRIBE | ||||
| to it, would result in a SIP error response. | ||||
| Typically, if a conference is destroyed while there are still | ||||
| participants, the focus would send a BYE to those participants before | ||||
| actually destroying the conference. Similarly, if there were any | ||||
| users subscribed to the conference notification service, those | ||||
| subscriptions would be terminated by the server before the actual | ||||
| destruction. | ||||
| 5.7.1 SIP Mechanisms | ||||
| There is no explicit means in SIP to destroy a conference. However, a | ||||
| conference may be destroyed as a by-product of a user leaving the | ||||
| conference, which can be done with BYE. In particular, if the | ||||
| conference policy states that the conference is destroyed once the | ||||
| last user leaves, when that user does leave (using a SIP BYE | ||||
| request), the conference is destroyed. | ||||
| 5.7.2 CPCP Mechanisms | ||||
| The CPCP contains mechanisms for explicitly destroying a conference. | ||||
| 5.7.3 Non-Automated Mechanisms | ||||
| As with conference creation, a conference can be destroyed by | ||||
| interacting with a web application or voice application that prompts | ||||
| the user for the conference to be destroyed. | ||||
| 5.8 Obtaining Membership | ||||
| A participant in a conference will frequently wish to know the set of | ||||
| other users in the conference. This information can be obtained in many | ||||
| ways. | ||||
| 5.8.1 SIP Mechanisms | ||||
| The conference notification service allows a conference aware | ||||
| participant to subscribe to it, and receive notifications that | ||||
| contain the list of participants. When a new participant joins or | ||||
| leaves, subscribers are notified. The conference notification service | ||||
| also allows a user to do a "fetch" [4] to obtain the current listing. | ||||
| 5.8.2 CPCP Mechanisms | ||||
| The CPCP contains mechanisms for querying for the current set of | ||||
| conference participants. | ||||
| 5.8.3 Non-Automated Mechanisms | ||||
| Users can also interact with applications to obtain conference | ||||
| membership. There may be a conference web page associated with the | ||||
| conference, which has a link that will fetch the current list of | ||||
| participants and display them in the browser. Similarly, an | ||||
| interactive voice response application connected to the focus can be | ||||
| used to obtain the current membership. A user in the conference could | ||||
| press the pound key on their phone, and hear a listing of the current | ||||
| participants. | ||||
| 5.9 Adding and Removing Media | ||||
| Each conference is composed of a particular set of media that the | ||||
| focus is managing. For example, a conference might contain a video | ||||
| stream and an audio stream. The set of media streams that constitute | ||||
| the conference can be changed by participants. When the set of media | ||||
| in the conference changes, the focus will need to generate a re-INVITE | ||||
| to each participant in order to add or remove the media stream to | ||||
| each participant. When a media stream is being added, a participant | ||||
| can reject the offered media stream, in which case it will not | ||||
| receive or contribute to that stream. Rejection of a stream by a | ||||
| participant does not imply that the stream is no longer part of | ||||
| the conference - just that the participant is not involved in it. | ||||
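The distinction between the media in the conference and the media a
given participant is involved in can be sketched as below. The
Conference class and the accepts callback are hypothetical; the
callback merely stands in for the offer/answer exchange that a
re-INVITE from the focus would carry.

```python
class Conference:
    """Hypothetical bookkeeping a focus might keep for media streams."""

    def __init__(self, participants):
        self.streams = set()  # media that constitute the conference
        self.involved = {p: set() for p in participants}

    def add_stream(self, stream, accepts):
        """Add a stream; accepts(participant, stream) models each answer."""
        self.streams.add(stream)
        for participant, joined in self.involved.items():
            if accepts(participant, stream):
                joined.add(stream)
            # A rejection leaves the stream in the conference; this
            # participant simply does not send or receive it.

    def remove_stream(self, stream):
        self.streams.discard(stream)
        for joined in self.involved.values():
            joined.discard(stream)


conf = Conference(["alice", "bob"])
conf.add_stream("audio", lambda participant, stream: True)
conf.add_stream("video", lambda participant, stream: participant != "bob")
```

Here bob rejects the offered video stream, yet video remains part of
the conference and alice still takes part in it.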
| There are several ways in which a media stream can be added or | ||||
| removed from a conference. | ||||
| 5.9.1 SIP Mechanisms | ||||
| A SIP re-INVITE can be used by a participant to add or remove a media | ||||
| stream. This is accomplished using the standard offer/answer | ||||
| techniques for adding media streams to a session [17]. This will | ||||
| trigger the focus to generate its own re-INVITEs. | ||||
| 5.9.2 CPCP Mechanisms | ||||
| The CPCP can be used to add or remove a media stream. This too will | ||||
| trigger the focus to generate a re-INVITE to each participant in | ||||
| order to effect the change. | ||||
| 5.9.3 Non-Automated Mechanisms | ||||
| As with most of the other common functions, addition and removal of | ||||
| media streams can be accomplished with a web application or | ||||
| interactive voice application. | ||||
| 5.10 Conference Announcements and Recordings | ||||
| Conference announcements and recordings play a key role in many real | ||||
| conferencing systems. Examples of such features include: | ||||
| 1. Asking a user to state their name before joining the | ||||
| conference, in order to support a roll call | ||||
| 2. Allowing a user to request a roll call, so they can hear | ||||
| who else is in the conference | ||||
| 3. Allowing a user to press some keys on their keypad in order | ||||
| to record the conference | ||||
| 4. Allowing a user to press some keys on their keypad in order | ||||
| to be connected with a human operator | ||||
| 5. Allowing a user to press some keys on their keypad to mute | ||||
| or unmute their line | ||||
| In this framework, these capabilities are modeled as an application | ||||
| which acts as a participant in the conference. This is shown | ||||
| pictorially in Figure 3. The conference has four participants. Three | ||||
| of these participants are end users, and the fourth is the | ||||
| announcement application. | ||||
| User 1 | ||||
| +-----------+ | ||||
| | | | ||||
| | | | ||||
| |Participant| | ||||
| | 4 | | ||||
| | | | ||||
| +-----------+ | ||||
| |SIP | ||||
| |Dialog | ||||
| Conference |1 | ||||
| Policy +---|--------+ | ||||
| User 2 Server | | | Application | ||||
| +-----------+ +-----------+ | CPCP ************* | ||||
| | | | | |-------- * * | ||||
| | | | | | * * | ||||
| |Participant|-----------| Focus |------------*Participant* | ||||
| | 1 | SIP | | | SIP * 3 * | ||||
| | | Dialog | |--+ Dialog * * | ||||
| +-----------+ 2 +-----------+ 4 ************* | ||||
| | | ||||
| | | ||||
| |SIP | ||||
| |Dialog | ||||
| |3 | ||||
| | | ||||
| +-----------+ | ||||
| | | | ||||
| | | | ||||
| |Participant| | ||||
| | 2 | | ||||
| | | | ||||
| +-----------+ | ||||
| User 3 | ||||
| Figure 3: Conference announcement application | ||||
| If the announcement application wishes to play an announcement to all | ||||
| the conference members (for example, to announce a join), it merely | ||||
| sends media to the mixer as would any other participant. The | ||||
| announcement is mixed in with the conversation and played to the | ||||
| participants. | ||||
| Similarly, the announcement application can play an announcement to a | ||||
| specific user by using the CPCP to configure its media policy so that | ||||
| the media it generates is only heard by the target user. The | ||||
| application then generates the desired announcement, and it will be | ||||
| heard only by the selected recipient. | ||||
| The announcement application can also receive input from a specific | ||||
| user through the conference. The announcement application would use | ||||
| the CPCP to cause in-band DTMF to be dropped from the mix, and sent | ||||
| only to itself. When a user wishes to invoke an operation, such as to | ||||
| obtain a roll call, the user would press the appropriate key | ||||
| sequence. That sequence would be heard only by the announcement | ||||
| application. Once the application determines that the user wishes to | ||||
| hear a roll call, it can use the CPCP to set the media policy so that | ||||
| media from that user is delivered only to the announcement | ||||
| application. This "disconnects" the user from the rest of the | ||||
| conference so they can interact with the application. Once the | ||||
| interaction is done, the announcement application uses the CPCP to | ||||
| "reconnect" the user to the conference. | ||||
| 5.11 Floor Control | ||||
| Floor control is similar to a conference announcement application. | ||||
| Within this framework, floor control is managed by an application | ||||
| (possibly one that is not a participant) that uses the CPCP to | ||||
| enforce the resulting floor control decisions. | ||||
| [[Need more work here]] | ||||
| 5.12 Camera and Video Controls | ||||
| OPEN ISSUE: Originally, I was just going to say that this | ||||
| is outside the scope of conferencing. But, it does impact | ||||
| conferencing. Effectively, camera control is treated like a | ||||
| media stream. The mixer would combine the various requests | ||||
| across participants and direct them to the appropriate | ||||
| device. How does that work though? In a video conference | ||||
| with 4 participants, the camera control needs to identify | ||||
| the specific user whose camera is to be controlled. That is | ||||
| something unique to conferencing. | ||||
| 6 Physical Realization | 6 Physical Realization | |||
| In this section, we present several physical instantiations of these | In this section, we present several physical instantiations of these | |||
| components, to show how these basic functions can be combined to | components, to show how these basic functions can be combined to | |||
| solve a variety of problems. | solve a variety of problems. | |||
| 6.1 Centralized Server | 6.1 Centralized Server | |||
| In the simplest realization of this framework, there is a | In the simplest realization of this framework, there is a | |||
| single physical server in the network which implements the focus, the | single physical server in the network which implements the focus, the | |||
| conference policy server, the media policy server, and the mixer. | conference policy server, and the mixers. This is the classic "one | |||
| This is the classic "one box" solution, shown in Figure 3. | box" solution, shown in Figure 4. | |||
| 6.2 Endpoint Server | 6.2 Endpoint Server | |||
| Another important model is that of a locally-mixed ad-hoc conference. | Another important model is that of a locally-mixed ad-hoc conference. | |||
| In this scenario, two users (A and B) are in a regular point-to-point | In this scenario, two users (A and B) are in a regular point-to-point | |||
| call. One of the participants (A) decides to conference in a third | call. One of the participants (A) decides to conference in a third | |||
| participant, C. To do this, A begins acting as a focus. Its existing | participant, C. To do this, A begins acting as a focus. Its existing | |||
| dialog with B becomes the first dialog attached to the focus. B would | dialog with B becomes the first dialog attached to the focus. A would | |||
| re-INVITE A on that dialog, changing its Contact URI to a new value | re-INVITE B on that dialog, changing its Contact URI to a new value | |||
| which identifies the focus. In essence, A "mutates" from a single- | which identifies the focus. In essence, A "mutates" from a single- | |||
| user UA to a focus plus a single user UA, and in the process of such | user UA to a focus plus a single user UA, and in the process of such | |||
| a mutation, its URI changes. Then, the focus makes an outbound INVITE | a mutation, its URI changes. Then, the focus makes an outbound INVITE | |||
| to C. When C accepts, it mixes the media from A and C together, | to C. When C accepts, it mixes the media from B and C together, | |||
| redistributing the results. The mixed media is also played locally. | redistributing the results. The mixed media is also played locally. | |||
| Figure 4 shows a diagram of this transition. | Figure 5 shows a diagram of this transition. | |||
| It is important to note that the external interfaces in this model, | It is important to note that the external interfaces in this model, | |||
| between A and B, and between B and C, are exactly the same as those | ||||
| that would be used in a centralized server model. B could also | ||||
| include a conference policy server and conference notification | ||||
| service, allowing the participants to have access to them if they so | ||||
| desired. Just because the focus is co-resident with a participant | ||||
| does not mean any aspect of the behaviors and external interfaces | ||||
| will change. | ||||
| 6.3 Media Server Component | ||||
| In this model, shown in Figure 6, each conference involves two | ||||
| centralized servers. One of these servers, referred to as the | ||||
| "application server" owns and manages the membership and media | ||||
| policies, and maintains a dialog with each participant. As a result, | ||||
| it represents the focus seen by all participants in a conference. | ||||
| However, this server doesn't provide any media support. To perform | ||||
| Conference Server | Conference Server | |||
| ................................... | ................................... | |||
| . . | . . | |||
| . +------+ +------------+ . | . +------------+ . | |||
| . |Media | | Conference | . | . | Conference | . | |||
| . |Policy| |Notification| . | . |Notification| . | |||
| . |Server| | Server | . | . | Server | . | |||
| . +------+ +------------+ . | . +------------+ . | |||
| . +----------+ . | . +----------+ . | |||
| . |Conference| . | . |Conference| +-----+ . | |||
| . | Policy | +-------+ +-----+ . | . | Policy | +-------+ +-----+| . | |||
| . | Server | | Focus | |Mixer| . | . | Server | | Focus | |Mixer|+ . | |||
| . +----------+ +-------+ +-----+ . | . +----------+ +-------+ +-----+ . | |||
| ................//.\.......--./.... | ................//.\.....***....... | |||
| // \ ---- / | // \ *** * | |||
| // -\- /RTP | // *** * RTP | |||
| SIP // ---- \ / | SIP // *** \ * | |||
| // --- \SIP / | // *** \SIP * | |||
| // ---- RTP \ / | // *** RTP \ * | |||
| / -- \ / | / ** \ * | |||
| +-----------+ +-----------+ | +-----------+ +-----------+ | |||
| |Participant| |Participant| | |Participant| |Participant| | |||
| +-----------+ +-----------+ | +-----------+ +-----------+ | |||
| Figure 3: Centralized server architecture | Figure 4: Centralized server architecture | |||
| between A and B, and between B and C, are exactly the same as those | ||||
| that would be used in a centralized server model. B could also | ||||
| include a media policy server and conference subscription server too, | ||||
| allowing the participants to have access to them if they so desired. | ||||
| Just because the focus is co-resident with a participant does not | ||||
| mean any aspect of the behaviors and external interfaces will change. | ||||
| 6.3 Media Server Component | the actual media mixing function, it makes use of a second server, | |||
| called the "mixing server". This server includes a focus, and a | ||||
| conference policy server, but has no conference notification service. | ||||
| It has a default membership policy, which accepts all invitations | ||||
| from the top-level focus. Its conference policy server accepts any | ||||
| controls made by the application server. The focus in the application | ||||
| B B | B B | |||
| +------+ +------+ | +------+ +------+ | |||
| | | | | | | | | | | |||
| | UA | | UA | | | UA | | UA | | |||
| | | | | | | | | | | |||
| +------+ +------+ | +------+ +------+ | |||
| | . | . | | . | . | |||
| | . | . | | . | . | |||
| | . | . | | . | . | |||
| | . Transition | . | | . Transition | . | |||
| | . ------------> | . | | . ------------> | . | |||
| SIP| .RTP SIP| .RTP | SIP| .RTP SIP| .RTP | |||
| | . | . | | . | . | |||
| | . | . | | . | . | |||
| | . | . | | . | . | |||
| | . | . | | . | . | |||
| | . +----------+ | | . +----------+ | |||
| +------+ | +------+ | SIP +------+ | +------+ | +------+ | SIP +------+ | |||
| | | | |Focus | |----------| | | | | | |Focus | |----------| | | |||
| | UA | | |M.Pol.| | | UA | | | UA | | |C.Pol.| | | UA | | |||
| | | | |C.Pol.| |..........| | | | | | |Mixers| |..........| | | |||
| +------+ | |Mixer | | RTP +------+ | +------+ | | | | RTP +------+ | |||
| | +------+ | | | +------+ | | |||
| A | + | C | A | + | C | |||
| | + <..|....... | | + <..|....... | |||
| | + | . | | + | . | |||
| | +------+ | . | | +------+ | . | |||
| | |Parti-| | . | | |Parti-| | . | |||
| | |cipant| | . | | |cipant| | . | |||
| | | | | . | | | | | . | |||
| | +------+ | . | | +------+ | . | |||
| +----------+ . | +----------+ . | |||
| B . | A . | |||
| . | . | |||
| Internal | Internal | |||
| Interface | Interface | |||
| Figure 4: Transition from two-party call to conference | Figure 5: Transition from two-party call to conference | |||
| server uses third party call control to connect the media streams of | ||||
| each user to the mixing server, as needed. If the focus in the | ||||
| application server receives a conference policy control command from | ||||
| +------------+ +------------+ | +------------+ +------------+ | |||
| | App Server| SIP |Conf. Cmpnt.| | | App Server| SIP |Conf. Cmpnt.| | |||
| | |-------------| | | | |-------------| | | |||
| | Focus | Conf. Proto | Focus | | | Focus | Conf. Proto | Focus | | |||
| | C.Pol |-------------| M.Pol | | | C.Pol |-------------| C.Pol | | |||
| | M.Pol | Media Proto | Mixer | | | | Media Proto | Mixers | | |||
| |Notification|-------------| | | |Notification|-------------| | | |||
| | | | | | | | | | | |||
| +------------+ +------------+ | +------------+ +------------+ | |||
| | \ .. . | | \ .. . | |||
| | \\ RTP... . | | \\ RTP... . | |||
| | \\ .. . | | \\ .. . | |||
| | SIP \\ ... . | | SIP \\ ... . | |||
| SIP | \\ ... .RTP | SIP | \\ ... .RTP | |||
| | ..\ . | | ..\ . | |||
| | ... \\ . | | ... \\ . | |||
| | ... \\ . | | ... \\ . | |||
| | .. \\ . | | .. \\ . | |||
| | ... \\ . | | ... \\ . | |||
| | .. \ . | | .. \ . | |||
| +-----------+ +-----------+ | +-----------+ +-----------+ | |||
| |Participant| |Participant| | |Participant| |Participant| | |||
| +-----------+ +-----------+ | +-----------+ +-----------+ | |||
| Figure 5: Media server component model | Figure 6: Media server component model | |||
| In this model, shown in Figure 5, each conference involves two | ||||
| centralized servers. One of these servers, referred to as the | ||||
| "application server" owns and manages the conference and media | ||||
| policies, and maintains a dialog with each participant. As a result, | ||||
| it represents the focus seen by all participants in a conference. | ||||
| However, this server doesn't provide any media support. To perform | ||||
| the actual media mixing function, it makes use of a second server, | ||||
| called the "mixing server". This server includes a focus, but has no | ||||
| conference policy server or conference notification service. It has a | ||||
| default conference policy, which accepts all invitations from the | ||||
| top-level focus. Its media policy server accepts any controls made by | ||||
| the application server. The focus in the application server uses | ||||
| third party call control to connect the media streams of each user to | ||||
| the mixing server, as needed. If the focus in the application server | ||||
| receives a media policy control command from a client, it delegates | ||||
| that to the media server by making the same media policy control | ||||
| command to it. | ||||
| This model allows for the mixing server to be used as a resource for | This model allows for the mixing server to be used as a resource for | |||
| a variety of different conferencing applications. This is because it | a variety of different conferencing applications. This is because it | |||
| is unaware of any conference or media policies; it is merely a | is unaware of any conference or media policies; it is merely a | |||
| "slave" to the top-level server, doing whatever it asks. This is | "slave" to the top-level server, doing whatever it asks. This is | |||
| consistent with the SIP Application Server Component Model [10]. | consistent with the SIP Application Server Component Model [18]. | |||
| 6.4 Distributed Mixing | 6.4 Distributed Mixing | |||
| In a distributed mixed conference, there is still a centralized | In a distributed mixed conference, there is still a centralized | |||
| server which implements the focus, conference policy server, and | server which implements the focus, conference policy server, and | |||
| media policy server. However, there is no centralized mixer. Rather, | media policy server. However, there are no centralized mixers. | |||
| there is a mixer in each endpoint, along with a media policy server. | Rather, there are mixers in each endpoint, along with a conference | |||
| The focus distributes the media by using third party call control | policy server. The focus distributes the media by using third party | |||
| [11] to move a media stream between each participant and each other | call control [19] to move a media stream between each participant and | |||
| participant. As a result, if there are N participants in the | each other participant. As a result, if there are N participants in | |||
| conference, there will be a single dialog between each participant | the conference, there will be a single dialog between each | |||
| and the focus, but the session description associated with that | participant and the focus, but the session description associated | |||
| dialog will be constructed to allow media to be distributed amongst | with that dialog will be constructed to allow media to be distributed | |||
| the participants. This is shown in Figure 6. | amongst the participants. This is shown in Figure 7. | |||
| There are several ways in which the media can be distributed to each | There are several ways in which the media can be distributed to each | |||
| participant for mixing. In a multi-unicast model, each participant | participant for mixing. In a multi-unicast model, each participant | |||
| sends a copy of its media to each other participant. In this case, | sends a copy of its media to each other participant. In this case, | |||
| the session description manages N-1 media streams. In a multicast | the session description manages N-1 media streams. In a multicast | |||
| model, each participant joins a common multicast group, and each | model, each participant joins a common multicast group, and each | |||
| participant sends a single copy of its media stream to that group. | participant sends a single copy of its media stream to that group. | |||
| The underlying multicast infrastructure then distributes the media, | The underlying multicast infrastructure then distributes the media, | |||
| so that each participant gets a copy. In a single-source multicast | so that each participant gets a copy. In a single-source multicast | |||
| model (SSM), each participant sends its media stream to a central | model (SSM), each participant sends its media stream to a central | |||
| point, using unicast. The central point then redistributes the media | point, using unicast. The central point then redistributes the media | |||
| to all participants using multicast. The focus is responsible for | to all participants using multicast. The focus is responsible for | |||
| selecting the modality of media distribution, and for handling any | selecting the modality of media distribution, and for handling any | |||
| hybrids that would be necessitated from clients with mixed | hybrids that would be necessitated from clients with mixed | |||
| capabilities. | capabilities. | |||
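The three distribution models above differ in how many copies of its own media each participant has to transmit. The following is a minimal sketch of that arithmetic, not part of the framework itself; the function names and the model labels ("multi-unicast", "multicast", "ssm") are illustrative assumptions.

```python
def copies_sent_per_participant(model: str, n: int) -> int:
    """Number of copies of its own media that one participant transmits.

    `model` labels are hypothetical names for the three models in the text.
    """
    if n < 2:
        raise ValueError("a conference needs at least two participants")
    if model == "multi-unicast":
        # One copy to each of the other N-1 participants.
        return n - 1
    if model in ("multicast", "ssm"):
        # One copy to the multicast group, or one unicast copy to the
        # central point that redistributes it.
        return 1
    raise ValueError(f"unknown distribution model: {model}")


def total_copies_sent(model: str, n: int) -> int:
    """Copies transmitted by all participants together.

    For "ssm" this counts only the unicast leg toward the central point,
    not the multicast redistribution performed by that point.
    """
    return n * copies_sent_per_participant(model, n)


if __name__ == "__main__":
    for model in ("multi-unicast", "multicast", "ssm"):
        print(model, copies_sent_per_participant(model, 4),
              total_copies_sent(model, 4))
```

In the multi-unicast case the per-participant figure matches the N-1 media streams that the session description has to manage.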
| When a new participant joins or is added, the focus will perform the | When a new participant joins or is added, the focus will perform the | |||
| necessary third party call control to distribute the media from the | necessary third party call control to distribute the media from the | |||
| new participant to all the other participants, and vice versa. | new participant to all the other participants, and vice versa. | |||
| The central conference server also includes a media policy server. Of | The central conference server also includes a conference policy | |||
| course, the central conference server cannot implement any of the | server. Of course, the central conference server cannot implement any | |||
| media policies directly. Rather, it would delegate the implementation | of the media policies directly. Rather, it would delegate the | |||
| to the media policy servers co-resident with a participant. As an | implementation to the conference policy servers co-resident with a | |||
| example, if a participant decides to switch the overall conference | participant. As an example, if a participant decides to switch the | |||
| mode from "video follows audio" to "tiled video", they would | overall conference mode from "voice activated" to "continuous | |||
| communicate with the central media policy server. This media policy | presence", they would communicate with the central conference policy | |||
| server, in turn, would communicate with the media policy servers co- | server. The conference policy server, in turn, would communicate with | |||
| resident with each participant, using the same media policy control | the conference policy servers co-resident with each participant, | |||
| protocol, and instruct them to use "tiled video". | using the same conference policy control protocol, and instruct them | |||
| to use "continuous presence". | ||||
| This model requires additional functionality in user agents, which | This model requires additional functionality in user agents, which | |||
| may or may not be present. The participants, therefore, must be able | may or may not be present. The participants, therefore, must be able | |||
| to advertise this capability to the focus. | to advertise this capability to the focus. | |||
| 6.5 Cascaded Mixers | 6.5 Cascaded Mixers | |||
| In very large conferences, it may not be possible to have a single | In very large conferences, it may not be possible to have a single | |||
| mixer that can handle all of the media. A solution to this is to use | mixer that can handle all of the media. A solution to this is to use | |||
| cascaded mixers. In this architecture, there is a centralized focus, | cascaded mixers. In this architecture, there is a centralized focus, | |||
| but the mixing function is implemented by a multiplicity of mixers, | but the mixing function is implemented by a multiplicity of mixers, | |||
| scattered throughout the network. Each participant is connected to | scattered throughout the network. Each participant is connected to | |||
| one, and only one of the mixers. The focus uses some kind of control | one, and only one of the mixers. The focus uses some kind of control | |||
| protocol (such as MEGACO [9]) to connect the mixers together, so that | protocol to connect the mixers together, so that all of the | |||
| all of the participants can hear each other. | participants can hear each other. | |||
| This architecture is shown in Figure 7. | This architecture is shown in Figure 8. | |||
| 7 Common Operations | 7 Security Considerations | |||
| There are a large number of ways in which users can interact with a | Conferences frequently require security features in order to properly | |||
| conference. They can join, leave, set policies, approve members, and | operate. The conference policy may dictate that only certain | |||
| so on. This section is meant as an overview of the basic primitives, | participants can join, or that certain participants can create new | |||
| summarizing how they operate. More detailed examples with complete | policies. Generally speaking, conference applications are very | |||
| call flows can be found in [12]. | concerned about authorization decisions. Mechanisms for establishing | |||
| and enforcing such authorization rules are a central concept | ||||
| throughout this document. | ||||
| 7.1 Creating Conferences | Of course, authorization rules require authentication. Normal SIP | |||
| authentication mechanisms should suffice for the conference | ||||
| authorization mechanisms described here. | ||||
| There are many ways in which a conference can be created. Ultimately, | 8 Contributors | |||
| all of them result in the establishment of a conference URI which | ||||
| identifies a focus. In all cases, a conference URI must be created by | This document is the result of discussions amongst the conferencing | |||
| the focus itself, or an element which is responsible for managing | design team. The members of this team include: | |||
| URIs that are used by the focus. Otherwise, the uniqueness of | ||||
| conference URIs could not be guaranteed. | ||||
| Alan Johnston | ||||
| Brian Rosen | ||||
| Rohan Mahy | ||||
| Henning Schulzrinne | ||||
| Orit Levin | ||||
| Roni Even | ||||
| Tom Taylor | ||||
| Petri Koskelainen | ||||
| Nermeen Ismail | ||||
| Andy Zmolek | ||||
| Joerg Ott | ||||
| Dan Petrie | ||||
| +---------+ | +---------+ | |||
| |Partcpnt | | |Partcpnt | | |||
| media | | media | media | | media | |||
| ...............| |.................. | ...............| |.................. | |||
| . | Mixer | . | . | Mixers | . | |||
| . |M.Pol.Srv| . | . |C.Pol.Srv| . | |||
| . +---------+ . | . +---------+ . | |||
| . | . | . | . | |||
| . | . | . | . | |||
| . | . | . | . | |||
| . dialog | . | . dialog | . | |||
| . | . | . | . | |||
| . | . | . | . | |||
| . | . | . | . | |||
| . +---------+ . | . +---------+ . | |||
| . |Cnf.Srvr.| . | . |Cnf.Srvr.| . | |||
| . | | . | . | | . | |||
| . | Focus | . | . | Focus | . | |||
| . |M.Pol.Srv| . | . |C.Pol.Srv| . | |||
| . / |C.Pol.Srv| \ . | . / | | \ . | |||
| . / +---------+ \ . | . / +---------+ \ . | |||
| . / \ . | . / \ . | |||
| . / \ . | . / \ . | |||
| . / dialog \ . | . / dialog \ . | |||
| . / \ . | . / \ . | |||
| . /dialog \ . | . /dialog \ . | |||
| . / \ . | . / \ . | |||
| . / \ . | . / \ . | |||
| . / \ . | . / \ . | |||
| . . | . . | |||
| +---------+ +---------+ | +---------+ +---------+ | |||
| |Partcpnt | |Partcpnt | | |Partcpnt | |Partcpnt | | |||
| | | | | | | | | | | |||
| | | ......................... | | | | | ......................... | | | |||
| | Mixer | | Mixer | | | Mixers | | Mixers | | |||
| |M.Pol.Srv| media |M.Pol.Srv| | |C.Pol.Srv| media |C.Pol.Srv| | |||
| +---------+ +---------+ | +---------+ +---------+ | |||
| Figure 6: Dialog and media streams in a distributed mixed conference | Figure 7: Dialog and media streams in a distributed mixed conference | |||
| +---------+ | ||||
| +-----------------------| |------------------------+ | ||||
| | ++++++++++++++++++++| |++++++++++++++++++ | | ||||
| | + +------| Focus |---------+ + | | ||||
| | + | | | | + | | ||||
| | + | +-| |--+ | + | | ||||
| | + | | +---------+ | | + | | ||||
| | + | | + | | + | | ||||
| | + | | + | | + | | ||||
| | + | | + | | + | | ||||
| | + | | +---------+ | | + | | ||||
| | + | | | | | | + | | ||||
| | + | | | Mixer 2 | | | + | | ||||
| | + | | | | | | + | | ||||
| | + | | +---------+ | | + | | ||||
| | + | |... . .... | | + | | ||||
| | + .|....| . .|.... | + | | ||||
| | + ...... | | . | ..|... + | | ||||
| | + ... | | . | | ....+ | | ||||
| | +---------+ | | +---------+ | | +---------+ | | ||||
| | | | | | | | | | | | | | ||||
| | | Mixer 2 | | | | Mixer 3 | | | | Mixer 4 | | | ||||
| | | | | | | | | | | | | | ||||
| | +---------+ | | +---------+ | | +---------+ | | ||||
| | . . | | . . | | . . | | ||||
| | . . | | .. . | | .. . | | ||||
| | . . | | . . | | . . | | ||||
| +---------+ . | +---------+ . | +---------+ . | | ||||
| | Prtcpnt | . | | Prtcpnt | . | | Prtcpnt | . | | ||||
| | 1 | . | | 1 | . | | 1 | . | | ||||
| +---------+ . | +---------+ . | +---------+ . | | ||||
| . | . | . | | ||||
| +---------+ +---------+ +---------+ | ||||
| | Prtcpnt | | Prtcpnt | | Prtcpnt | | ||||
| | 1 | | 1 | | 1 | | ||||
| +---------+ +---------+ +---------+ | ||||
| ------- SIP Dialog | ||||
| ....... Media Flow | ||||
| +++++++ Control Protocol | ||||
| Figure 7: Cascaded Mixers | ||||
| protocol, a client can instruct the conference policy server to | ||||
| create a new conference. The result of this operation is a conference | ||||
| URI, which is returned to the client. | ||||
| Another way to obtain a conference URI is simply to guess one. In an | ||||
| instant conferencing server, there is an effectively unlimited number | ||||
| of conference URIs which can be used. Each of them is a valid | ||||
| conference URI, since it identifies a focus, and when an INVITE is | ||||
| sent to it, will join the user into that conference. As a result, a | ||||
| client can simply choose one of them at random, so long as it is | ||||
| configured with the domain portion of the URI and any naming | ||||
| conventions in use by the instant conferencing server. | ||||
| OPEN ISSUE: Do we need to specify standards for this? | ||||
| The previous two approaches are used to obtain conference URIs for | ||||
| focuses that are hosted within centralized servers. Creation of | ||||
| conferences where the focus resides in an endpoint operates | ||||
| differently. There, the endpoint itself creates the conference URI, | ||||
| and hands it out to other endpoints which are to be the participants. | ||||
| What differs from case to case is how the endpoint decides to create | ||||
| a conference. | ||||
| One important case is the ad-hoc conference described in Section 6.2. | ||||
| There, an endpoint unilaterally decides to create the conference | ||||
| based on local policy. The dialogs that were connected to the UA are | ||||
| migrated to the endpoint-hosted focus, using a re-INVITE to pass the | ||||
| conference URI to the newly joined participants. | ||||
| Alternatively, one UA can ask another UA to create an endpoint-hosted | ||||
| conference. This is accomplished with the SIP Join header [13]. The | ||||
| UA which receives the Join header in an invitation may need to create | ||||
| a new conference URI (a new one is not needed if the dialog that is | ||||
| being joined is already part of a conference). The conference URI is | ||||
| then handed to the recently joined participants through a re-INVITE. | ||||
| 7.2 Adding Participants | ||||
| There are two modes for adding participants to a conference - first | ||||
| party additions, and third party additions. In a first party | ||||
| addition, the participant that wishes to join makes a direct attempt | ||||
| to join. In a third party addition, some other participant takes | ||||
| action with the aim of causing a third party to be added to the | ||||
| conference. | ||||
| First party additions are trivially accomplished with a standard | ||||
| INVITE. A participant can send an INVITE request to the conference | ||||
| URI, and if the conference policy allows them to join, they are added | ||||
| to the conference. | ||||
| If a UA does not know the conference URI, but has learned about a | ||||
| dialog which is connected to a conference (by using the dialog event | ||||
| package, for example [14]), the UA can join the conference by using | ||||
| the Join header to join the dialog. | ||||
| Third party invitations can be done in one of several ways. The first | ||||
| approach is for the user to ask the third party to send an INVITE to | ||||
| the conference URI. This can be done automatically through the usage | ||||
| of REFER [15]. The participant would send a REFER request to the | ||||
| third party. The Refer-To header field in that request would contain | ||||
| the conference URI. There are countless non-automated means for | ||||
| asking a participant to send an INVITE to the conference URI. A user | ||||
| can send an instant message [16] to the third party, containing an | ||||
| HTML document which requests the user to click on the hyperlink to | ||||
| join the conference: | ||||
| <html> | ||||
| Hey, would you like to <a href="sip:9sf88fk-99sd@conferences.com">join | ||||
| </a> the conference now? | ||||
| </html> | ||||
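The automated variant described above, in which a participant sends a REFER whose Refer-To header field carries the conference URI, can be sketched as follows. This is an illustrative message builder only: the function name, the example.com addresses, and the Via, tag, and Call-ID values are placeholders assumed for the sketch, and a real user agent would generate them itself.

```python
def build_refer(referrer: str, third_party: str, conference_uri: str) -> str:
    """Build the text of a REFER asking `third_party` to INVITE the conference.

    All header values other than Refer-To are placeholder assumptions.
    """
    lines = [
        f"REFER {third_party} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"To: <{third_party}>",
        f"From: <{referrer}>;tag=example-tag",
        "Call-ID: example-call-id@client.example.com",
        "CSeq: 1 REFER",
        f"Contact: <{referrer}>",
        # The Refer-To header field carries the conference URI.
        f"Refer-To: <{conference_uri}>",
        "Content-Length: 0",
    ]
    # SIP header lines end with CRLF; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_refer("sip:alice@example.com",
                      "sip:carol@example.com",
                      "sip:9sf88fk-99sd@conferences.com"))
```

On accepting the REFER, the third party would send an ordinary INVITE to the conference URI, which is the same first party join described earlier in this section.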
| The second approach for third party additions is for the participant | ||||
| to ask the focus to add the third party to the conference. In this | ||||
| case, however, a REFER cannot be used. REFER would have the effect of | ||||
| telling the focus to send an INVITE to the new potential participant. | ||||
| However, just sending this INVITE is not sufficient for adding the | ||||
| new member. In more complex realizations, such as the distributed | ||||
| mixing scenario of Section 6.4, a multiplicity of invitations will | ||||
| need to be sent. This would require the focus to attach additional | ||||
| meaning to REFER; it would have to be interpreted as a request to add | ||||
| a participant to the conference. However, it is fundamental to the | ||||
| concept of REFER that the recipient not attach specific application | ||||
| semantics to it. Therefore, it cannot be used. Rather, the user would | ||||
| use the conference policy control protocol to request that the focus | ||||
| add the new participant. The conference policy control protocol can | ||||
| also be used to add a multiplicity of new users. This is referred to | ||||
| as mass invitation. | ||||
| In many cases, a new participant will not wish to join the conference | ||||
| unless they can join with a particular set of policies. As an | ||||
| example, a participant may want to join anonymously, so that other | ||||
| participants know that someone has joined, but not who. To accomplish | ||||
| this, the conference policy control protocol is used to establish | ||||
| these policies prior to the generation or acceptance of an invitation | ||||
| to the conference. For example, if a user wishes to join a conference | ||||
| with a known conference URI, the user would obtain the URI for the | ||||
| conference policy, manipulate the policy to set themselves as an | ||||
| anonymous participant, and then actually join the conference by | ||||
| sending an INVITE request to the conference URI. | ||||
| OPEN ISSUE: Will this always work? Are there cases where | ||||
| the conference policy cannot be manipulated until the | ||||
| INVITE has been sent? This would require a preconditions- | ||||
| style solution. | ||||
| 7.3 Removing Participants | ||||
| As with additions, there are two modalities for departures - first | ||||
| person (in which a user explicitly leaves), and third person, where | ||||
| they are removed by a different user. | ||||
| First person departures are trivially accomplished by terminating the | ||||
| dialog that the participant is using to connect to the focus. | ||||
| Third person departures can be done in one of two ways. First, a user | ||||
| can make use of the REFER method to instruct the third party to send | ||||
| a BYE to the conference server on the dialog that connects them to | ||||
| the focus. This requires the user to have knowledge of the dialog | ||||
| identifiers used by that participant. The second mechanism, which is | ||||
| much cleaner, is to use the conference policy control protocol to | ||||
| inform the focus that the participant is explicitly barred from the | ||||
| conference. This will cause the focus to eject the user, sending them | ||||
| a BYE in addition to whatever other signaling is needed to remove | ||||
| them. The conference policy control protocol can also be used to | ||||
| remove a large number of users. This is generally referred to as mass | ||||
| ejection. | ||||
| 7.4 Approving Policy Changes | ||||
| A conference policy for a particular conference may designate one or | ||||
| more users as moderators for some set of media policy or conference | ||||
| policy change requests. This means that those moderators need to | ||||
| approve the specific policy change. Typically, moderators are used to | ||||
| approve member additions and removals. However, the framework allows | ||||
| for moderators to be associated with any policy change that can be | ||||
| made. | ||||
| The general model to support moderator approval is through the | ||||
| conference notification service. The moderator subscribes to the | ||||
| notification service. They are authenticated by the focus, which | ||||
| determines that they are a moderator for the conference. Whenever a | ||||
| policy change request is made by a client that requires moderator | ||||
| approval, the policy change is not actually committed. Rather, it is | ||||
| marked as pending by the conference policy server. Any moderators for | ||||
| that specific policy request who are subscribed to the conference | ||||
| notification service will receive a notification of the pending | ||||
| change. The moderators, using the conference policy control protocol, | ||||
| can approve the specific change. This commits the new policy. All | ||||
| participants are then notified of the new policy through the | ||||
| notification service. | ||||
| 7.5 Creating Sidebars | ||||
| A sidebar is a "conference within a conference", allowing a subset of | ||||
| the participants to converse amongst themselves. Frequently, | ||||
| participants in a sidebar will still receive media from the main | ||||
| conference, but "in the background". For audio, this may mean that | ||||
| the volume of the media is reduced, for example. | ||||
| There are two ways to represent a sidebar in this framework. The | ||||
| first is to treat it as a specific kind of media policy. It is a | ||||
| media policy which would request that sidebar participants be "in the | ||||
| foreground", and others "in the background". There are no additional | ||||
| dialogs or conferences established. The media policy control protocol | ||||
| would allow a user to explicitly request sidebars. The server would | ||||
| alert users (through the notification service) that they have been | ||||
| invited to the sidebar. They would use the media policy control | ||||
| protocol to approve their participation in it. | ||||
| An alternative view is that a sidebar truly is a conference within a | ||||
| conference, and would be implemented that way. There would be a new | ||||
| conference URI associated with the sidebar. Standard techniques would | ||||
| be used to add users to the sidebar, approve their membership, and so | ||||
| on. The sidebar would itself be a participant in the main conference. | ||||
| Users would continue to receive their media stream only through the | ||||
| main conference. They would have a dialog with the sidebar focus, but | ||||
| no media would be exchanged on this dialog. | ||||
| OPEN ISSUE: It is still unclear as to which model is | 9 Changes since draft-rosenberg-sipping-conferencing-framework-00 | |||
| preferable. We should pick one. | ||||
| 8 Security Considerations | o Rework of terminology. | |||
| Conferences frequently require security features in order to properly | o More details on moderating policy changes. | |||
| operate. The conference policy may dictate that only certain | ||||
| participants can join, or that certain participants can create new | ||||
| policies. Generally speaking, conference applications are very | ||||
| concerned about authorization decisions. Mechanisms for establishing | ||||
| and enforcing such authorization rules are a central concept | ||||
| throughout this document. | ||||
| Of course, authorization rules require authentication. Normal SIP | o Rework of the overview, and in particular, a shift of focus | |||
| authentication mechanisms should suffice for the conference | from basic/complex conferences (a term which has been removed) | |||
| authorization mechanisms described here. | to conference aware/unaware participants. | |||
| 9 Contributors | o Removal of explicit reference to megaco for controlling a | |||
| mixer. | ||||
| This document is the result of discussions amongst the conferencing | o Discussion of a lot more conferencing operations. | |||
| design team. The members of this team include: | ||||
| Brian Rosen | o New sidebar mechanism. | |||
| Rohan Mahy | ||||
| Henning Schulzrinne | ||||
| Orit Levin | ||||
| Roni Even | ||||
| Tom Taylor | ||||
| Petri Koskelainen | ||||
| Nermeen Ismail | ||||
| Andy Zmolek | ||||
| Joerg Ott | ||||
| Dan Petrie | ||||
| 10 Authors Addresses | 10 Authors Addresses | |||
| Jonathan Rosenberg | Jonathan Rosenberg | |||
| dynamicsoft | dynamicsoft | |||
| 72 Eagle Rock Avenue | 72 Eagle Rock Avenue | |||
| First Floor | First Floor | |||
| East Hanover, NJ 07936 | East Hanover, NJ 07936 | |||
| email: jdrosen@dynamicsoft.com | email: jdrosen@dynamicsoft.com | |||
| skipping to change at page 30, line 4 ¶ | skipping to change at page 35, line 36 ¶ | |||
| 72 Eagle Rock Avenue | 72 Eagle Rock Avenue | |||
| First Floor | First Floor | |||
| East Hanover, NJ 07936 | East Hanover, NJ 07936 | |||
| email: jdrosen@dynamicsoft.com | email: jdrosen@dynamicsoft.com | |||
| 11 Normative References | 11 Normative References | |||
| 12 Informative References | 12 Informative References | |||
| [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. | [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. | |||
| Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session | Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session | |||
| initiation protocol," RFC 3261, Internet Engineering Task Force, June | initiation protocol," RFC 3261, Internet Engineering Task Force, June | |||
| 2002. | 2002. | |||
| [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a | [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a | |||
| transport protocol for real-time applications," RFC 1889, Internet | transport protocol for real-time applications," RFC 1889, Internet | |||
| Engineering Task Force, Jan. 1996. | Engineering Task Force, Jan. 1996. | |||
| [3] O. Levin et al., "Requirements for tightly coupled SIP | [3] O. Levin et al., "Requirements for tightly coupled SIP | |||
| conferencing," Internet Draft, Internet Engineering Task Force, July | conferencing," internet draft, Internet Engineering Task Force, Nov. | |||
| 2002. Work in progress. | 2002. Work in progress. | |||
| [4] A. B. Roach, "Session initiation protocol (sip)-specific event | [4] A. B. Roach, "Session initiation protocol (sip)-specific event | |||
| notification," RFC 3265, Internet Engineering Task Force, June 2002. | notification," RFC 3265, Internet Engineering Task Force, June 2002. | |||
| [5] B. Campbell and J. Rosenberg, "Instant message sessions in | [5] B. Campbell and J. Rosenberg, "Instant message sessions in | |||
| simple," Internet Draft, Internet Engineering Task Force, Oct. 2002. | SIMPLE," internet draft, Internet Engineering Task Force, Oct. 2002. | |||
| Work in progress. | Work in progress. | |||
| [6] J. Rosenberg and H. Schulzrinne, "A session initiation protocol | [6] J. Rosenberg, "A framework and requirements for application | |||
| (SIP) event package for conference state," Internet Draft, Internet | interaction in sip," Internet Draft, Internet Engineering Task Force, | |||
| Engineering Task Force, June 2002. Work in progress. | Oct. 2002. Work in progress. | |||
| [7] T. Berners-Lee, R. Fielding, and L. Masinter, "Uniform resource | [7] A. Johnston and O. Levin, "Session initiation protocol call | |||
| control - conferencing for user agents," internet draft, Internet | ||||
| Engineering Task Force, Feb. 2003. Work in progress. | ||||
| [8] T. Berners-Lee, R. Fielding, and L. Masinter, "Uniform resource | ||||
| identifiers (URI): generic syntax," RFC 2396, Internet Engineering | identifiers (URI): generic syntax," RFC 2396, Internet Engineering | |||
| Task Force, Aug. 1998. | Task Force, Aug. 1998. | |||
| [8] H. Schulzrinne and J. Rosenberg, "Session initiation protocol | [9] H. Schulzrinne and J. Rosenberg, "Session initiation protocol | |||
| (SIP) caller preferences and callee capabilities," Internet Draft, | (SIP) caller preferences and callee capabilities," internet draft, | |||
| Internet Engineering Task Force, July 2002. Work in progress. | Internet Engineering Task Force, Nov. 2002. Work in progress. | |||
| [9] F. Cuervo, N. Greene, A. Rayhan, C. Huitema, B. Rosen, and J. | [10] R. Mahy and D. Petrie, "The session initiation protocol (SIP) | |||
| Segers, "Megaco protocol version 1.0," RFC 3015, Internet Engineering | 'join' header," internet draft, Internet Engineering Task Force, Oct. | |||
| Task Force, Nov. 2000. | 2002. Work in progress. | |||
| [10] J. Rosenberg, P. Mataga, and H. Schulzrinne, "An application | [11] J. Rosenberg and H. Schulzrinne, "A session initiation protocol | |||
| server component architecture for SIP," Internet Draft, Internet | (SIP) event package for dialog state," internet draft, Internet | |||
| Engineering Task Force, June 2002. Work in progress. | ||||
| [12] R. Sparks, "The SIP refer method," internet draft, Internet | ||||
| Engineering Task Force, Dec. 2002. Work in progress. | ||||
| [13] "Session initiation protocol (SIP) extension for instant | ||||
| messaging," RFC 3428, Internet Engineering Task Force, Dec. 2002. | ||||
| [14] T. Moran and S. Addagatla, "Architecture for event notification | ||||
| filters," internet draft, Internet Engineering Task Force, Oct. 2002. | ||||
| Work in progress. | ||||
| [15] J. Rosenberg, "A presence event package for the session | ||||
| initiation protocol (SIP)," internet draft, Internet Engineering Task | ||||
| Force, Jan. 2003. Work in progress. | ||||
| [16] J. Rosenberg, "A watcher information event template-package for | ||||
| the session initiation protocol (SIP)," internet draft, Internet | ||||
| Engineering Task Force, Jan. 2003. Work in progress. | ||||
| +---------+ | ||||
| +-----------------------| |------------------------+ | ||||
| | ++++++++++++++++++++| |++++++++++++++++++ | | ||||
| | + +------| Focus |---------+ + | | ||||
| | + | | | | + | | ||||
| | + | +-| |--+ | + | | ||||
| | + | | +---------+ | | + | | ||||
| | + | | + | | + | | ||||
| | + | | + | | + | | ||||
| | + | | + | | + | | ||||
| | + | | +---------+ | | + | | ||||
| | + | | | | | | + | | ||||
| | + | | | Mixer 2 | | | + | | ||||
| | + | | | | | | + | | ||||
| | + | | +---------+ | | + | | ||||
| | + | |... . .... | | + | | ||||
| | + .|....| . .|.... | + | | ||||
| | + ...... | | . | ..|... + | | ||||
| | + ... | | . | | ....+ | | ||||
| | +---------+ | | +---------+ | | +---------+ | | ||||
| | | | | | | | | | | | | | ||||
| | | Mixer 2 | | | | Mixer 3 | | | | Mixer 4 | | | ||||
| | | | | | | | | | | | | | ||||
| | +---------+ | | +---------+ | | +---------+ | | ||||
| | . . | | . . | | . . | | ||||
| | . . | | .. . | | .. . | | ||||
| | . . | | . . | | . . | | ||||
| +---------+ . | +---------+ . | +---------+ . | | ||||
| | Prtcpnt | . | | Prtcpnt | . | | Prtcpnt | . | | ||||
| | 1 | . | | 1 | . | | 1 | . | | ||||
| +---------+ . | +---------+ . | +---------+ . | | ||||
| . | . | . | | ||||
| +---------+ +---------+ +---------+ | ||||
| | Prtcpnt | | Prtcpnt | | Prtcpnt | | ||||
| | 1 | | 1 | | 1 | | ||||
| +---------+ +---------+ +---------+ | ||||
| ------- SIP Dialog | ||||
| ....... Media Flow | ||||
| +++++++ Control Protocol | ||||
| Figure 8: Cascaded Mixers | ||||
| [17] J. Rosenberg and H. Schulzrinne, "An offer/answer model with | ||||
| session description protocol (SDP)," RFC 3264, Internet Engineering | ||||
| Task Force, June 2002. | ||||
| [18] J. Rosenberg, P. Mataga, and H. Schulzrinne, "An application | ||||
| server component architecture for SIP," internet draft, Internet | ||||
| Engineering Task Force, Mar. 2001. Work in progress. | Engineering Task Force, Mar. 2001. Work in progress. | |||
| [11] J. Rosenberg, J. Peterson, H. Schulzrinne, and G. Camarillo, | [19] J. Rosenberg, J. Peterson, H. Schulzrinne, and G. Camarillo, | |||
| "Best current practices for third party call control in the session | "Best current practices for third party call control in the session | |||
| initiation protocol," Internet Draft, Internet Engineering Task | initiation protocol," internet draft, Internet Engineering Task | |||
| Force, June 2002. Work in progress. | Force, June 2002. Work in progress. | |||
| [12] A. Johnston and O. Levin, "Session initiation call control - | Intellectual Property Statement | |||
| conferencing for user agents," Internet Draft, Internet Engineering | ||||
| Task Force, Oct. 2002. Work in progress. | ||||
| [13] R. Mahy and D. Petrie, "The session initiation protocol (sip) | ||||
| join header," Internet Draft, Internet Engineering Task Force, Oct. | ||||
| 2002. Work in progress. | ||||
| [14] J. Rosenberg and H. Schulzrinne, "A session initiation protocol | ||||
| (SIP) event package for dialog state," Internet Draft, Internet | ||||
| Engineering Task Force, June 2002. Work in progress. | ||||
| [15] R. Sparks, "The SIP refer method," Internet Draft, Internet | The IETF takes no position regarding the validity or scope of any | |||
| Engineering Task Force, July 2002. Work in progress. | intellectual property or other rights that might be claimed to | |||
| pertain to the implementation or use of the technology described in | ||||
| this document or the extent to which any license under such rights | ||||
| might or might not be available; neither does it represent that it | ||||
| has made any effort to identify any such rights. Information on the | ||||
| IETF's procedures with respect to rights in standards-track and | ||||
| standards-related documentation can be found in BCP-11. Copies of | ||||
| claims of rights made available for publication and any assurances of | ||||
| licenses to be made available, or the result of an attempt made to | ||||
| obtain a general license or permission for the use of such | ||||
| proprietary rights by implementors or users of this specification can | ||||
| be obtained from the IETF Secretariat. | ||||
| [16] B. Campbell and J. Rosenberg, "Session initiation protocol | The IETF invites any interested party to bring to its attention any | |||
| extension for instant messaging," Internet Draft, Internet | copyrights, patents or patent applications, or other proprietary | |||
| Engineering Task Force, Sept. 2002. Work in progress. | rights which may cover technology that may be required to practice | |||
| this standard. Please address the information to the IETF Executive | ||||
| Director. | ||||
| Full Copyright Statement | Full Copyright Statement | |||
| Copyright (c) The Internet Society (2002). All Rights Reserved. | Copyright (c) The Internet Society (2003). All Rights Reserved. | |||
| This document and translations of it may be copied and furnished to | This document and translations of it may be copied and furnished to | |||
| others, and derivative works that comment on or otherwise explain it | others, and derivative works that comment on or otherwise explain it | |||
| or assist in its implementation may be prepared, copied, published | or assist in its implementation may be prepared, copied, published | |||
| and distributed, in whole or in part, without restriction of any | and distributed, in whole or in part, without restriction of any | |||
| kind, provided that the above copyright notice and this paragraph are | kind, provided that the above copyright notice and this paragraph are | |||
| included on all such copies and derivative works. However, this | included on all such copies and derivative works. However, this | |||
| document itself may not be modified in any way, such as by removing | document itself may not be modified in any way, such as by removing | |||
| the copyright notice or references to the Internet Society or other | the copyright notice or references to the Internet Society or other | |||
| Internet organizations, except as needed for the purpose of | Internet organizations, except as needed for the purpose of | |||
| End of changes. 133 change blocks. | ||||
| 809 lines changed or deleted | 1129 lines changed or added | |||
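The moderator approval model of Section 7.4 can be illustrated with a small in-memory sketch. This is not part of either draft: the framework does not define the conference policy control protocol or the notification payloads, so every name below (`ConferencePolicyServer`, `request_change`, `approve`, the message strings) is hypothetical. The sketch also assumes that the set of subscribers to the notification service stands in for "all participants", and that moderators have already been authenticated.

```python
from dataclasses import dataclass, field
from enum import Enum


class ChangeState(Enum):
    PENDING = "pending"
    COMMITTED = "committed"


@dataclass
class PolicyChange:
    change_id: int
    requester: str
    description: str
    state: ChangeState = ChangeState.PENDING


@dataclass
class ConferencePolicyServer:
    """Hypothetical stand-in for the conference policy server of Section 7.4."""
    moderators: set[str]
    subscribers: set[str] = field(default_factory=set)
    changes: dict[int, PolicyChange] = field(default_factory=dict)
    # (recipient, message) pairs standing in for notification service traffic.
    notifications: list[tuple[str, str]] = field(default_factory=list)
    _next_id: int = 1

    def subscribe(self, user: str) -> None:
        """Subscribe a user to the (modelled) conference notification service."""
        self.subscribers.add(user)

    def _notify(self, recipients: set[str], message: str) -> None:
        for user in sorted(recipients):
            self.notifications.append((user, message))

    def request_change(self, requester: str, description: str,
                       needs_approval: bool = True) -> PolicyChange:
        """Record a policy change; hold it as pending if a moderator must approve."""
        change = PolicyChange(self._next_id, requester, description)
        self._next_id += 1
        self.changes[change.change_id] = change
        if needs_approval:
            # Only moderators who are subscribed learn of the pending change.
            self._notify(self.moderators & self.subscribers,
                         f"pending: {description}")
        else:
            self._commit(change)
        return change

    def approve(self, moderator: str, change_id: int) -> None:
        """A moderator approves a pending change, which commits it."""
        if moderator not in self.moderators:
            raise PermissionError(f"{moderator} is not a moderator")
        change = self.changes[change_id]
        if change.state is ChangeState.PENDING:
            self._commit(change)

    def _commit(self, change: PolicyChange) -> None:
        change.state = ChangeState.COMMITTED
        # Assumption: every subscriber is told about the new policy.
        self._notify(self.subscribers, f"committed: {change.description}")
```

In this sketch a request from a non-moderator stays pending until a moderator calls `approve`, at which point all subscribers receive the commit notification. A real deployment would carry these steps over the policy control protocol and SIP event notifications rather than direct method calls.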