idnits 2.17.1 

draft-rosenberg-sipping-conferencing-framework-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There is 1 instance of too long lines in the document, the longest one
     being 1 character in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The "Author's Address" (or "Authors' Addresses") section title is
     misspelled.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 28, 2002) is 7850 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'TBD' on line 501

  -- Obsolete informational reference (is this intentional?): RFC 1889 (ref.
     '2') (Obsoleted by RFC 3550)

  -- Obsolete informational reference (is this intentional?): RFC 3265 (ref.
     '4') (Obsoleted by RFC 6665)

  -- Obsolete informational reference (is this intentional?): RFC 2396 (ref.
     '7') (Obsoleted by RFC 3986)

  -- Obsolete informational reference (is this intentional?): RFC 3015 (ref.
     '9') (Obsoleted by RFC 3525)


     Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                               SIPPING WG
3	Internet Draft                                              J. Rosenberg
4	                                                             dynamicsoft
5	draft-rosenberg-sipping-conferencing-framework-00.txt
6	October 28, 2002
7	Expires: April 2003

9	   A Framework for Conferencing with the Session Initiation Protocol

11	STATUS OF THIS MEMO

13	   This document is an Internet-Draft and is in full conformance with
14	   all provisions of Section 10 of RFC2026.

16	   Internet-Drafts are working documents of the Internet Engineering
17	   Task Force (IETF), its areas, and its working groups.  Note that
18	   other groups may also distribute working documents as Internet-
19	   Drafts.

21	   Internet-Drafts are draft documents valid for a maximum of six months
22	   and may be updated, replaced, or obsoleted by other documents at any
23	   time.  It is inappropriate to use Internet-Drafts as reference
24	   material or to cite them other than as "work in progress".

26	   The list of current Internet-Drafts can be accessed at
27	   http://www.ietf.org/ietf/1id-abstracts.txt

29	   To view the list of Internet-Draft Shadow Directories, see
30	   http://www.ietf.org/shadow.html.

32	Abstract

34	   The Session Initiation Protocol (SIP) supports the initiation,
35	   modification, and termination of media sessions between user agents.
36	   These sessions are managed by SIP dialogs, which represent a SIP
37	   relationship between a pair of user agents. Because dialogs are
38	   between pairs of user agents, SIP's usage for two-party
39	   communications (such as a phone call) is obvious. Communications
40	   sessions with multiple participants, generally known as conferencing,
41	   are more complicated. This document defines a framework for how such
42	   conferencing can occur. This framework describes the overall
43	   architecture, terminology, and protocol components needed for multi-
44	   party conferencing.

46	                           Table of Contents

48	   1          Introduction
49	   2          Terminology
50	   3          Basic Architecture
51	   4          Usage of URIs
52	   5          Functions of the Elements
53	   5.1        Focus
54	   5.2        Conference Policy Server
55	   5.3        Mixers
56	   5.4        Media Policy Server
57	   5.5        Conference Notification Service
58	   5.6        Participants
59	   5.7        Conference Policy
60	   5.8        Media Policy
61	   6          Physical Realization
62	   6.1        Centralized Server
63	   6.2        Endpoint Server
64	   6.3        Media Server Component
65	   6.4        Distributed Mixing
66	   6.5        Cascaded Mixers
67	   7          Common Operations
68	   7.1        Creating Conferences
69	   7.2        Adding Participants
70	   7.3        Removing Participants
71	   7.4        Approving Policy Changes
72	   7.5        Creating Sidebars
73	   8          Security Considerations
74	   9          Contributors
75	   10         Authors Addresses
76	   11         Normative References
77	   12         Informative References

79	1 Introduction

81	   The Session Initiation Protocol (SIP) [1] supports the initiation,
82	   modification, and termination of media sessions between user agents.
83	   These sessions are managed by SIP dialogs, which represent a SIP
84	   relationship between a pair of user agents. Because dialogs are
85	   between pairs of user agents, SIP's usage for two-party
86	   communications (such as a phone call) is obvious. Communications
87	   sessions with multiple participants, however, are more complicated.
88	   SIP can support many models of multi-party communications. One,
89	   referred to as loosely coupled conferences, makes use of multicast
90	   media groups. In the loosely coupled model, there is no signaling
91	   relationship between participants in the conference. There is no
92	   central point of control or conference server. Participation is
93	   gradually learned through control information that is passed as part
94	   of the conference (using the Real Time Control Protocol (RTCP) [2],
95	   for example). Loosely coupled conferences are easily supported in SIP
96	   by using multicast addresses within its session descriptions.

98	   In another model, referred to as fully distributed multiparty
99	   conferencing, each participant maintains a signaling relationship
100	   with each other participant, using SIP. There is no central point of
101	   control; it is completely distributed amongst the participants. SIP
102	   does not yet support this model.

104	   In another model, sometimes referred to as the tightly coupled
105	   conference, there is a central point of control. Each participant
106	   connects to this central point. It provides a variety of conference
107	   functions, and may possibly perform media mixing functions as well.
108	   Tightly coupled conferences are not directly addressed by the SIP
109	   specification, although basic ones are possible without any
110	   additional protocol support.

112	   This document is one of a series of specifications that discusses
113	   tightly coupled conferences. Here, we present the overall framework
114	   for tightly coupled conferencing, referred to simply as
115	   "conferencing" from this point forward. This framework presents a
116	   general architectural model for these conferences, presents
117	   terminology used to discuss such conferences, and describes the sets
118	   of protocols involved in a conference. The aim of the framework is to
119	   meet the general requirements for conferencing that are outlined in
120	   [3].

122	2 Terminology

124	        Conference: Sadly, conference is an overused term which has
125	             different meanings in different contexts. In SIP, a
126	             conference is an instance of a multi-party conversation.

128	             Within the context of this specification, a conference is
129	             always a tightly coupled conference.

131	        Loosely Coupled Conference: A loosely coupled conference is a
132	             conference without coordinated signaling relationships
133	             amongst participants. Loosely coupled conferences use
134	             multicast for distribution of conference memberships.

136	        Tightly Coupled Conference: A tightly coupled conference is a
137	             conference in which a single user agent, referred to as a
138	             focus, maintains a dialog with each participant. The focus
139	             plays the role of the centralized manager of the
140	             conference, and is addressed by a conference URI.

142	        Focus: The focus is a SIP user agent that is addressed by a
143	             conference URI. The focus maintains a SIP signaling
144	             relationship with each participant in the conference. The
145	             focus is responsible for ensuring, in some way, that each
146	             participant receives the media that make up the conference.
147	             The focus also implements conference policies. The focus is
148	             a logical role.

150	        Conference URI: A URI, usually a SIP URI, which identifies the
151	             focus of a conference.

153	        Participants: The set of user agents, each identified by a URI,
154	             which are connected to the focus for a particular
155	             conference.

157	        Conference Notification Service: A conference notification
158	             service is a logical function provided by the focus. The
159	             focus can act as a notifier [4], accepting subscriptions to
160	             the conference state, and notifying subscribers about
161	             changes to that state. The state includes the state
162	             maintained by the focus itself, the conference policy, and
163	             the media policy.

165	        Conference Policy Server: A conference policy server is a
166	             logical function which can store and manipulate rules
167	             associated with participation in a conference. These rules
168	             include directives on the lifespan of the conference, who
169	             can and cannot join the conference, definitions of roles
170	             available in the conference and the responsibilities
171	             associated with those roles, and policies on who is allowed
172	             to request which roles. The conference policy server is a
173	             logical role.

175	        Media Policy Server: A media policy server is a logical function
176	             which can store and manipulate rules associated with the
177	             media distribution of the conference. These rules can
178	             specify which participants receive media from which other
179	             participants, and the ways in which that media is combined
180	             for each participant. In the case of audio, these rules can
181	             include the relative volumes at which each participant is
182	             mixed. In the case of video, these rules can indicate
183	             whether the video is tiled, whether the video indicates the
184	             loudest speaker, and so on.

186	        Conference Policy: The set of rules manipulated by the
187	             conference policy server.

189	        Conference Policy Control Protocol: The client-server protocol
190	             used by clients to manipulate the conference policy.

192	        Media Policy: The set of rules manipulated by the media policy
193	             server. The media policy is used by the focus to determine
194	             the mixing characteristics for the conference.

196	        Media Policy Control Protocol: The client-server protocol used
197	             by clients to manipulate the media policy.

199	        Mixer: As defined in the Real Time Transport Protocol [2], a
200	             mixer receives a set of media streams, and combines their
201	             media in a type-specific manner, redistributing the result
202	             to each participant. We use the term here to include
203	             combining of non-RTP media streams as well, such as instant
204	             messaging sessions [5].

206	        Basic Conference: A basic conference is one where there is no
207	             conference policy server, media policy server, or
208	             conference notification service - only a focus.

210	        Basic Participant: A basic participant is a participant in a
211	             conference that is not aware that it is actually in a
212	             conference. As far as the UA is concerned, it is a point-
213	             to-point call.

215	        Cascaded Conference: A conference in which a participant is the
216	             focus of another conference.

218	        Complex Conference: A complex conference includes at least one
219	             of a conference policy server, media policy server, or
220	             conference notification service, in addition to the focus.

222	        Complex Participant: A complex participant is a participant in a
223	             conference that has learned, through automated means, that
224	             it is in a conference, and that can use a conference policy
225	             control protocol, media policy control protocol, or
226	             conference subscription, to implement advanced
227	             functionality.

229	        Conference Server: A conference server is a physical server
230	             which contains, at a minimum, the focus. It may also
231	             include a media policy server, a conference policy server,
232	             and a mixer.

234	        Singleton: In this context, a singleton is a conference
235	             participant that is not a focus. A singleton represents a
236	             single user in a conference.

238	        Conference Topology: The conference topology is a graph that
239	             defines the connectivity amongst participants connected
240	             through conferences. Each node in the graph represents a
241	             user agent, whether it is a focus or a singleton. Each leaf
242	             node in the tree represents a singleton, and an internal
243	             node represents a focus. An edge between two nodes implies
244	             that there is a SIP dialog between them. Ideally,
245	             conference topologies are trees, not arbitrary graphs.

247	        Conversation Space: For each conference URI, there is a unique
248	             conversation space. The conversation space is defined as
249	             the set of singletons in the conference topology associated
250	             with that URI. The conference topology associated with a
251	             conference URI is the one that is constructed by starting
252	             with the focus for that URI. Under normal circumstances,
253	             the set of singletons in a conversation space will all
254	             receive each other's media.

256	        Instant Conference: A conference in which the focus is
257	             constructed the instant the first INVITE for a URI is
258	             received, and then destroyed the instant the last
259	             participant has left.

261	        Mass Invitation: A conference policy control protocol request to
262	             invite a large number of users into the conference.

264	        Mass Ejection: A conference policy control protocol request to
265	             remove a large number of users from the conference.

267	        Sidebar: A sidebar appears to the users as a "conference within
268	             the conference". It is a discussion amongst a subset of the
269	             participants, not heard by the remaining participants in
270	             the conference.

272	        Anonymous Participant: An anonymous participant is one that is
273	             known to other participants (through the conference
274	             notification service), but whose identity is being
275	             withheld.

277	        Invisible Participant: An invisible participant is one that is
278	             not known to other participants in the conference. They may
279	             be known to the moderator, depending on conference policy.
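
   The Conference Topology and Conversation Space definitions above can
   be illustrated with a short sketch. It assumes the topology is given
   as a mapping from each focus URI to the URIs it holds dialogs with,
   and that any URI absent from the mapping's keys is a singleton. It
   also assumes that dialogs are followed only from a focus towards its
   participants. The function name and the example URIs are
   hypothetical and are not part of this framework.

```python
def conversation_space(topology, conference_uri):
    """Return the set of singletons reachable from the focus of conference_uri.

    topology maps each focus URI to the URIs it holds SIP dialogs with.
    Any URI that is not a key of topology is treated as a singleton
    (an assumption of this sketch).
    """
    singletons = set()
    visited = set()
    stack = [conference_uri]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # guards against topologies that are not trees
        visited.add(node)
        if node in topology:
            stack.extend(topology[node])  # focus: follow its dialogs
        else:
            singletons.add(node)  # leaf node: a single user
    return singletons


# Hypothetical cascaded conference: conf-b is a participant of conf-a.
topology = {
    "sip:conf-a@example.com": ["sip:alice@example.com",
                               "sip:conf-b@example.com"],
    "sip:conf-b@example.com": ["sip:bob@example.com",
                               "sip:carol@example.com"],
}
```

   With this topology, the conversation space of the conf-a URI
   contains alice, bob, and carol, since the focus of conf-b is an
   internal node rather than a singleton.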

281	3 Basic Architecture

283	   A SIP conference is represented by a URI. This URI identifies the
284	   focus, which is the user agent at the center of the conference. Any
285	   participant that is involved in the conference is connected to the
286	   focus by a SIP dialog. The result is a star topology, shown in Figure
287	   1.

289	   The focus has access to a conference policy and media policy, an
290	   instance of which exists for each focus. In a basic SIP conference,
291	   these policies are administratively defined.

293	   Users join the conference by sending an INVITE to the conference URI.
294	   As long as the conference policy allows, the INVITE is accepted by
295	   the focus and the user is brought into the conference. Users can
296	   leave the conference by sending a BYE, as they would in a normal
297	   call. Indeed, a participant in a basic conference does not need to
298	   know that the focus is anything other than a normal SIP user agent.

300	   Similarly, the focus can terminate a dialog with a participant,
301	   should the conference policy change to indicate that the participant
302	   is no longer allowed in the conference. A focus can also initiate an
303	   INVITE, should the conference policy indicate that the focus needs to
304	   bring a participant into the conference.

306	   The focus is responsible for making sure that the media streams which
307	   constitute the conference are available to the participants in the
308	   conference. It does that through the use of one or more mixers, each
309	   of which combines a number of input media streams to produce one or
310	   more output media streams. The focus uses the media policy to
311	   determine the proper configuration of the mixers.

313	   With these basic capabilities, a large number of common conferencing
314	   applications can be built. None of them require any extensions to
315	   SIP; they merely require that the focus is aware of its role and
316	   responsibilities in maintaining the conference. However, basic
317	   conferences do not allow for the participants to control the way in
318	   which the conference operates.

320	                           +-----------+
321	                           |           |
322	                           |           |
323	                           |Participant|
324	                           |           |
325	                           |           |
326	                           +-----------+
327	                                 |
328	                                 |SIP
329	                                 |Dialog
330	                                 |
331	                                 |
332	   +-----------+           +-----------+            +-----------+
333	   |           |           |           |            |           |
334	   |           |           |           |            |           |
335	   |Participant|-----------|   Focus   |------------|Participant|
336	   |           |  SIP      |           |   SIP      |           |
337	   |           |  Dialog   |           |   Dialog   |           |
338	   +-----------+           +-----------+            +-----------+
339	                                 |
340	                                 |
341	                                 |SIP
342	                                 |Dialog
343	                                 |
344	                                 |
345	                           +-----------+
346	                           |           |
347	                           |           |
348	                           |Participant|
349	                           |           |
350	                           |           |
351	                           +-----------+

353	   Figure 1: Basic SIP Conference

355	   A complex SIP conference is one in which additional interfaces are
356	   exposed, allowing for a richer set of controls and information on the
357	   conference. In particular, a complex SIP conference can include a
358	   conference policy server and a media policy server, and the focus can
359	   expose a conference notification service. The model for these
360	   conferences is shown in Figure 2. This figure shows the view from one
361	   participant. The conference now encompasses an additional set of
362	   functions. In addition to maintaining the dialog with the focus, the
363	   participant now has access to these other functions. It can, using a
364	   conference event package [6], SUBSCRIBE to the conference URI, and be
365	   connected to the conference notification service provided by the
366	   focus. Through this package, it can learn about changes in
367	   participants (effectively, the state of the dialogs), the media
368	   policy, and the conference policy.

370	   The participant can also communicate with the conference policy
371	   server, using a conference policy control protocol. This is a
372	   strictly client-server transactional protocol. The interaction need
373	   not use a dedicated protocol at all; it can take place through a web
374	   interface, in which case no standardized protocols or policies are
375	   needed. However, a web interface can only be manipulated by humans,
376	   not automata. To support automated clients, the participant can use
377	   a protocol designed specifically for this purpose.

379	   The participant can also communicate with the media policy server,
380	   using a media policy control protocol. This is a strictly client-
381	   server transactional operation. This can also be through a web
382	   interface, or through an explicit protocol.

384	   The focus will access the media and conference policies. There is a
385	   tight coupling between these policies and the focus. Not only does it
386	   need read access to these policies, but it needs to know when they
387	   have changed. Such changes might result in SIP signaling (for
388	   example, the ejection of a user from the conference using BYE), and
389	   most changes will require a notification to be sent to subscribers to
390	   the conference notification service.

392	   The conference policy and media policy servers need not be available
393	   in any particular conference. Even when available, they need not be
394	   used by all participants. A participant in a conference that does not
395	   access any of these functions, and which doesn't even know that the
396	   focus is a focus, is called a basic participant. A conference
397	   participant that can discover and access these additional
398	   functions is a complex participant. Any conference can include
399	   basic and complex participants.

401	   The interfaces between (1) the focus and the media policy, (2) the
402	   focus and the conference policy, (3) the conference policy server and
403	   the conference policy, and (4) the media policy server and the media
404	   policy are not subject to standardization at the time of this
405	   writing. They are intended primarily to show the logical roles,
444	   in order to encourage clarity in the requirements and to allow
445	   individual implementations the flexibility to compose a
446	   conferencing system in a scalable and robust manner.

406	           Conference    .....................................
407	           Policy        . +-----------+                     .
408	           Control       . |           |                     .
409	           Protocol      . |Conference |                     .
410	      +------------------->|   Policy  |                     .
411	      |                  . |  Server   |                     .
412	      |                  . |           | \                   .
413	      |     Media        . +-----------+  \                  .
414	      |     Policy       . +-----------+   \    //-----\\    .
415	      |     Control      . |           |    > ||         ||  .
416	      |     Protocol     . |   Media   |        \\-----//    .
417	      |     +------------->|  Policy   |       |         |   .
418	      |     |            . |  Server   |---->  |Conference   .
419	      |     |            . |           |       |         |   .
420	      |     |            . +-----------+       |    &    |   .
421	      |     |            .                     |         |   .
422	      |     |            .                     | Media   |   .
423	   +-----------+         . +-----------+       |   Policy|   .
424	   |           |         . |           |        \       //   .
425	   |           |         . |           |         \-----/     .
426	   |Participant|<--------->|   Focus   |            |        .
427	   |           |  SIP    . |           |            |        .
428	   |           |  Dialog . |           |<-----------+        .
429	   +-----------+         . |...........|                     .
430	             ^           . | Conference|                     .
431	             |           . |Notification                     .
432	             +------------>|  Service  |                     .
433	             Subscription. +-----------+                     .
434	                         .                                   .
435	                         .                                   .
436	                         .                                   .
437	                         .                                   .
438	                         .....................................

440	                                     Conference
441	                                      Functions

443	   Figure 2: Complex SIP Conference

448	4 Usage of URIs

450	   It is fundamental to this framework that a conference is uniquely
451	   identified by a URI, and that this URI identify the focus which is
452	   responsible for the conference. This URI is always a SIP or SIPS URI.

454	   The conference URI is opaque to any participants which might use it.
455	   There is no way to look at the URI, and know for certain whether it
456	   identifies a focus, as opposed to a user or an interface on a PSTN
457	   gateway. This is in line with the general philosophy of URI usage
458	   [7]. However, contextual information surrounding the URI (for
459	   example, SIP header parameters) may indicate that the URI represents
460	   a conference.

462	   The conference URI can represent a long-lived conference or interest
463	   group, such as "sip:discussion-on-dogs@example.com". The focus
464	   identified by this URI would always exist, and always be managing the
465	   conference for whatever participants are currently joined. The
466	   conference URI can also represent an "instant" conference, for
467	   example, "sip:a8sd9998as-9s8daa@example.com". An instant conference
468	   is one where the focus is instantiated when the first INVITE for it
469	   arrives, and then destroyed when the last participant leaves. Both of
470	   these represent variations in the policies implemented by the focus,
471	   and cannot be determined from inspection of the URI.
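
   The lifecycle of an instant conference can be sketched as follows.
   This is a minimal illustration, assuming the focus state is no more
   than a set of participant URIs per conference URI. The class and
   method names are hypothetical. A long-lived conference would differ
   only in that its entry is never deleted.

```python
class InstantConferenceRegistry:
    """Tracks focuses that exist only while they have participants."""

    def __init__(self):
        # conference URI -> set of participant URIs with a dialog
        self._participants = {}

    def on_invite(self, conference_uri, participant_uri):
        # The first INVITE for a URI instantiates the focus.
        members = self._participants.setdefault(conference_uri, set())
        members.add(participant_uri)

    def on_bye(self, conference_uri, participant_uri):
        members = self._participants.get(conference_uri)
        if members is None:
            return  # no focus exists for this URI
        members.discard(participant_uri)
        if not members:
            # The last participant has left, so the focus is destroyed.
            del self._participants[conference_uri]

    def focus_exists(self, conference_uri):
        return conference_uri in self._participants
```

   Nothing in the URI itself selects this behavior; it is purely a
   property of the policy that the registry implements.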

473	   Ideally, a conference URI is never constructed or guessed by a user.
474	   Rather, conference URIs are learned through many mechanisms. A
475	   conference URI can be emailed or sent in an instant message. A
476	   conference URI can be linked on a web page. A conference URI can be
477	   obtained from a conference policy control protocol, which can be used
478	   to create conferences and the policies associated with them.

480	   To determine that a SIP URI does represent a focus, standard
481	   techniques for URI capability discovery can be used. First, a
482	   participant can send an OPTIONS to a SIP URI, and if it represents a
483	   focus, the response will indicate such [TBD]. The response will also
484	   indicate whether or not the focus has implemented the conference
485	   notification service. This is known by the presence of an Allow
486	   header in the response, indicating support for the SUBSCRIBE method,
487	   along with an Allow-Events header, indicating support for the
488	   conferencing package. A second method for determining that a URI
489	   represents a focus is through a refresh request. The Allow and
490	   Allow-Events headers, along with the caller preferences specification
491	   [8] can indicate the same information that would be learned through
492	   an OPTIONS query.
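
   The header check described above can be sketched as follows. The
   sketch assumes the response headers have already been parsed into a
   mapping from header name to raw value, and that the conferencing
   event package is identified by the token "conference". Both are
   assumptions of this example, as is the function name. The indication
   that a URI is a focus is still [TBD] and is not covered here.

```python
def supports_conference_notification(headers, event_package="conference"):
    """Check response headers for SUBSCRIBE and the conferencing package.

    headers maps header names to raw comma-separated values.
    The event package token is an assumption of this sketch.
    """
    def tokens(name):
        # Header names are compared case-insensitively.
        for key, value in headers.items():
            if key.lower() == name:
                return {t.strip() for t in value.split(",") if t.strip()}
        return set()

    methods = {m.upper() for m in tokens("allow")}
    events = {e.lower() for e in tokens("allow-events")}
    return "SUBSCRIBE" in methods and event_package in events


# Hypothetical headers taken from a response to OPTIONS.
response_headers = {
    "Allow": "INVITE, ACK, BYE, OPTIONS, SUBSCRIBE",
    "Allow-Events": "conference",
}
```

   The same function applies to the Allow and Allow-Events headers
   learned from any other request or response.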

494	   The other functions in a conference are also represented by URIs. If
495	   the conference policy and media policy servers are implemented
496	   through web pages, these servers are regular HTTP URIs. If they are
497	   accessed using an explicit protocol, they are the URIs defined for
498	   those protocols.

500	   Starting with the conference URI, the URIs for the other logical
501	   entities in the conference can be learned using [TBD].

503	        OPEN ISSUE: I suppose we cannot say more until the protocol
504	        work is done. But, we have a requirement here - that there
505	        be a way to learn these URIs starting only with the
506	        conference URI.

508	5 Functions of the Elements

510	   This section gives a more detailed description of the functions
511	   typically implemented in each of the elements.

513	5.1 Focus

515	   As its name implies, the focus is the center of the conference. All
516	   participants in the conference are connected to it using a SIP
517	   dialog. The focus is responsible for maintaining the dialogs
518	   connected to it. It ensures that the dialogs are connected to a set
519	   of participants who are allowed to participate in the conference, as
520	   defined by the conference policy. The focus also uses SIP to
521	   manipulate the media sessions, in order to make sure each participant
522	   obtains all the media for the conference. To do that, the focus makes
523	   use of the services of a mixer.

525	   When a focus receives an INVITE, it checks the conference policy. The
526	   conference policy might indicate that this participant is not allowed
527	   to join, in which case the call can be rejected. It might indicate
528	   that another participant, acting as a moderator, needs to approve
529	   this new participant. In that case, the INVITE might be parked on a
530	   music-on-hold server, or a 183 response might be sent to indicate
531	   progress. A notification, using the conference notification service,
532	   would be sent to the moderator. The moderator then has the ability to
533	   manipulate the policies using the conference policy control protocol.
534	   If the policies are changed to allow this new participant, the focus
535	   can accept the INVITE (or unpark it from the music-on-hold server).
536	   The interpretation of the conference policy by the focus is, itself,
537	   a matter of local policy, and not subject to standardization.
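
The admission flow above can be sketched as a small decision function.
This is only one possible interpretation, since the way a focus
interprets the policy is a local matter. The ConferencePolicy class,
its fields, and the three outcome strings are hypothetical names
introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ConferencePolicy:
    """Hypothetical, minimal stand-in for a conference policy."""
    allowed: set = field(default_factory=set)
    denied: set = field(default_factory=set)
    moderated: bool = False  # unknown callers need moderator approval


def decide_invite(policy, caller):
    """Map an incoming INVITE to "accept", "reject", or "pending".

    In the pending case the focus would park the call or send a 183,
    and a notification would go to the moderator.
    """
    if caller in policy.denied:
        return "reject"
    if caller in policy.allowed:
        return "accept"
    return "pending" if policy.moderated else "reject"


policy = ConferencePolicy(allowed={"sip:alice@example.com"}, moderated=True)
print(decide_invite(policy, "sip:alice@example.com"))  # accept
print(decide_invite(policy, "sip:carol@example.com"))  # pending

# The moderator changes the policy; the parked INVITE can now be accepted.
policy.allowed.add("sip:carol@example.com")
print(decide_invite(policy, "sip:carol@example.com"))  # accept
```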

539	   If a participant manipulated the conference policy to indicate that a
540	   certain other participant was no longer allowed in the conference,
541	   the focus would send a BYE to that other participant to remove them.
542	   This is often referred to as "ejecting" a user from the conference.
543	   Ejection fundamentally consists of two steps: the establishment of
544	   the policy through the conference policy protocol, and the
545	   implementation of that policy (using a BYE) by the focus.

548	   Similarly, if a participant manipulated the conference policy to
549	   indicate that a number of users need to be added to the conference,
550	   the focus would send an INVITE to those participants. This is often
551	   referred to as the "mass invitation" function. As with ejection, it
552	   is fundamentally composed of the policy functions that specify the
553	   participants which should be present, and the implementation of those
554	   functions using SIP. A policy request to add a set of users might not
555	   require an INVITE to execute it; those users might already be
556	   participants in the conference.
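
Both ejection and mass invitation can be viewed as reconciling the
policy with the set of dialogs the focus currently holds. The sketch
below computes which participants would receive a BYE and which an
INVITE. The function name and the set-based model of the policy are
assumptions made for illustration.

```python
def reconcile(connected, required, denied):
    """Return (to_bye, to_invite) as sorted lists of participant URIs.

    connected: URIs that currently have a dialog with the focus
    required:  URIs the policy says should be present
    denied:    URIs the policy says are no longer allowed
    """
    # Connected users who are no longer allowed are ejected with a BYE.
    to_bye = sorted(connected & denied)
    # Required users who are already connected need no new INVITE.
    to_invite = sorted(required - connected - denied)
    return to_bye, to_invite


connected = {"sip:a@example.com", "sip:b@example.com"}
required = {"sip:b@example.com", "sip:c@example.com"}
denied = {"sip:a@example.com"}
print(reconcile(connected, required, denied))
# (['sip:a@example.com'], ['sip:c@example.com'])
```

Note that sip:b@example.com appears in neither list: the policy request
to add it is already satisfied by its existing dialog.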

558	   A similar model exists for media policy. If the media policy
559	   indicates that a participant should not receive any video, the focus
560	   might implement that policy by sending a re-INVITE, removing the
561	   media stream to that participant. Alternatively, if the video is
562	   being centrally mixed, it could inform the mixer to send a black
563	   screen to that participant. The means by which the policy is
564	   implemented are not subject to specification.

566	5.2 Conference Policy Server

568	   The conference policy server allows clients to manipulate and
569	   interact with the conference policy. The conference policy is used by
570	   the focus to make authorization decisions and guide its overall
571	   behavior. Logically speaking, there is a one-to-one mapping between a
572	   conference policy and a focus.

574	   The conference policy is represented by a URI. There is a unique
575	   conference policy for each focus. The conference policy URI points to
576	   a conference policy server which can manipulate that conference
577	   policy. A conference policy server also has a "top level" URI which
578	   can be used to access functions that are independent of any
579	   conference. Perhaps the most important of these functions is the
580	   creation of a new conference. This will result in the construction of
581	   a new conference URI, which can then be used to join the conference
582	   itself.

584	   The conference policy server is accessed using a client-server
585	   transactional protocol. The client can be a participant in the
586	   conference, or it can be a third party. Access control lists for who
587	   can modify a conference policy are themselves part of the conference
588	   policy. The conference policy server also allows clients to create
589	   new conferences. This would result in the instantiation of a focus
590	   (and therefore, a conference URI associated with that focus), a
591	   conference policy, and a media policy. The conference policy server
592	   will also have rules about who can create conferences.

594	   The conference policy also includes per-participant policies that
595	   specify how the focus is to handle a particular participant. These
596	   include whether or not the participant is anonymous, for example.

598	5.3 Mixers

600	   A mixer is responsible for combining the media streams that make up
601	   the conference, and generating one or more output streams that are
602	   distributed to recipients (which could be participants or other
603	   mixers). The combination process is specific to the media type, and
604	   is directed by the focus, under the guidance of the rules described
605	   in the media policy.

607	   A mixer is not aware of a "conference" as an entity, per se. A mixer
608	   receives media streams as inputs, and based on directions provided by
609	   the focus, generates media streams as outputs. There is no grouping
610	   of media streams beyond the policies that describe the ways in which
611	   the streams are mixed.

613	   A mixer is always under the control of a focus. The focus is
614	   responsible for interpreting the media policy, and then installing
615	   the appropriate rules in the mixer. If the focus is directly
616	   controlling a mixer, the mixer can either be co-resident with the
617	   focus, or can be controlled through a protocol like Megaco [9].

619	   However, a focus need not directly control a mixer. Rather, a focus
620	   can delegate the mixing to the participants, each of which has its
621	   own mixer. This is described in Section 6.4.

623	5.4 Media Policy Server

625	   The media policy server is similar to the conference policy server.
626	   It is accessed using a transactional client-server protocol. It
627	   manipulates a media policy, identified by a URI. The focus has the
628	   responsibility of acting on that media policy, implementing it
629	   through direct or indirect control of mixers.

631	   The media policy describes the way in which the set of inputs to the
632	   mixer are combined to generate the set of outputs. Media policies can
633	   span media types. In other words, the policy on how one media stream
634	   is mixed can be based on characteristics of other media streams.

636	   Media policies can be based on any quantifiable characteristic of the
637	   media stream (its source, volume, codecs, speaking/silence, etc.),
638	   and they can be based on internal or external variables accessible by
639	   the media policy.

641	   The media policy server is responsible for reconciliation of
642	   potentially conflicting requests regarding the media policy for the
643	   conference.

645	   The client of the media policy protocol can be any entity interested
646	   in manipulating media policies. Clearly, participants might be
647	   interested in manipulating them. A participant might want to raise or
648	   lower the volume for one of the other participants it is hearing. Or,
649	   a participant might want to switch from a tiled video view, to just
650	   viewing the active speaker. A client of the media policy protocol
651	   could also be another server whose job is to determine the media
652	   policy. As an example, a floor control server is responsible for
653	   determining which participant(s) in a conference are allowed to speak
654	   at any given time, based on participant requests and access rules.
655	   The floor control server would act as a client of the media policy
656	   server, and inform the media policy server about who is allowed to
657	   speak.

659	   The client of the media policy protocol could also be another media
660	   policy server, as described in Section 6.4.

662	   Some examples of media policies include:

664	        o The video output is the picture of the loudest speaker (video
665	          follows audio).

667	        o The audio from each participant will be mixed with equal
668	          weight, and distributed to all other participants.

670	        o The audio and video that is distributed is the one selected by
671	          the floor control server.
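
Two of the example policies above can be sketched in a few lines. The
function names and the representation of audio as one numeric level or
sample per participant are simplifying assumptions. Averaging is only
one way to read "equal weight".

```python
def video_follows_audio(audio_levels):
    """Return the participant whose video should be shown, or None.

    audio_levels maps a participant name to its current audio level.
    """
    if not audio_levels:
        return None
    # Iterating in sorted order makes ties deterministic.
    return max(sorted(audio_levels), key=audio_levels.get)


def mix_equal_weight(samples):
    """For each listener, average the samples of all other participants."""
    mixed = {}
    for listener in samples:
        others = [v for name, v in samples.items() if name != listener]
        mixed[listener] = sum(others) / len(others) if others else 0.0
    return mixed


print(video_follows_audio({"alice": 0.2, "bob": 0.7, "carol": 0.1}))  # bob
print(mix_equal_weight({"alice": 1.0, "bob": 3.0, "carol": 5.0}))
# {'alice': 4.0, 'bob': 3.0, 'carol': 2.0}
```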

673	5.5 Conference Notification Service

675	   The focus can provide a conference notification service. In this
676	   role, it acts as a notifier, as defined in RFC 3265 [4]. It accepts
677	   subscriptions from clients for the conference URI, and generates
678	   notifications to them as the state of the conference changes.

680	   This state is composed of three separate pieces. The first is the
681	   state of the focus, the second is the conference policy, and the
682	   third is the media policy.

684	   The state of the focus includes the participants connected to the
685	   focus, and information about the dialogs associated with them. As new
686	   participants join, this state would change, allowing subscribers to
687	   learn about them. Similarly, when someone leaves, this state also
688	   changes, allowing subscribers to learn about this fact.

690	   The state of the conference policy includes the set of participants
691	   that are allowed, or not allowed, to join the conference, and the set
692	   of participants who are to be explicitly added to the conference. It
693	   includes the roles which are assigned to each participant, such as
694	   whether they are a moderator. If there was a change in role, for
695	   example, a new moderator was selected, the focus would inform
696	   subscribers.

698	   The state of the media policy includes the media streams being
699	   received by each participant, the audio or video modalities, and so
700	   on.
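
A minimal sketch of a notifier that tracks these three pieces of state
is shown below. The class, its method names, and the use of callbacks
in place of SIP NOTIFY requests are assumptions made for this example;
the format of the notification body is not modeled.

```python
import copy


class ConferenceNotifier:
    """Toy notifier holding the three pieces of conference state."""

    PARTS = ("focus", "conference_policy", "media_policy")

    def __init__(self):
        self.state = {part: {} for part in self.PARTS}
        self.subscribers = []

    def subscribe(self, callback):
        """Register a subscriber and send it the current state."""
        self.subscribers.append(callback)
        callback(copy.deepcopy(self.state))

    def update(self, part, key, value):
        """Change one item of state; notify subscribers only on change."""
        if part not in self.state:
            raise KeyError(part)
        current = self.state[part]
        if key in current and current[key] == value:
            return False
        current[key] = value
        snapshot = copy.deepcopy(self.state)
        for callback in self.subscribers:
            callback(snapshot)
        return True


received = []
notifier = ConferenceNotifier()
notifier.subscribe(received.append)
notifier.update("focus", "sip:alice@example.com", "connected")
notifier.update("conference_policy", "moderator", "sip:alice@example.com")
print(len(received))  # 3: the initial state plus two changes
```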

702	5.6 Participants

704	   A participant in a conference is any SIP user agent that has a dialog
705	   with the focus. This SIP user agent can be a PC application, a SIP
706	   hardphone, or a PSTN gateway. It can also be another focus. A
707	   conference which has a participant that is the focus of another
708	   conference is called a cascaded conference. Cascaded conferences can
709	   be used to provide scalable conferences where there are regional
710	   sub-conferences, each of which is connected to the main conference. A
711	   conference topology refers to a graph which shows each focus and each
712	   participant as a vertex, with a connection between each participant
713	   and its focus.

715	5.7 Conference Policy

717	   The conference policy contains the rules that guide the operation of
718	   the focus. These rules can be simple, such as an access list that
719	   defines the set of allowed participants in a conference. The rules
720	   can also be incredibly complex, specifying time-of-day based rules on
721	   participation conditional on the presence of other participants. It
722	   is important to understand that there is no restriction on the type
723	   of rules that can be encapsulated in a conference policy.

725	   However, there does exist a protocol means by which a client can
726	   request a change in the conference policy. This is done by
727	   communicating with the conference policy server, which manipulates
728	   the conference policy. By the nature of conference policies, not all
729	   aspects of the policy can be manipulated with the conference policy
730	   control protocol. It is the responsibility of the conference policy
731	   server to reconcile the various requests with the conference policy.

733	5.8 Media Policy

735	   The media policy contains the rules that guide the operation of the
736	   mixer. The focus uses these rules to interact with the mixer to
737	   implement them. These rules can be simple (mix all media from all
738	   participants), or they can be incredibly complex. It is important to
739	   understand that there is no restriction on the type of rules that can
740	   be encapsulated in a media policy.

742	   However, there does exist a protocol means by which a client can
743	   request a change in the media policy. This is done by communicating
744	   with the media policy server, which manipulates the media policy. By
745	   the nature of media policies, not all aspects of the policy can be
746	   manipulated with the media policy control protocol. It is the
747	   responsibility of the media policy server to reconcile the various
748	   requests with the media policy.

750	6 Physical Realization

752	   In this section, we present several physical instantiations of these
753	   components, to show how these basic functions can be combined to
754	   solve a variety of problems.

756	6.1 Centralized Server

758	   In the simplest realization of this framework, there is a
759	   single physical server in the network which implements the focus, the
760	   conference policy server, the media policy server, and the mixer.
761	   This is the classic "one box" solution, shown in Figure 3.

763	6.2 Endpoint Server

765	   Another important model is that of a locally-mixed ad-hoc conference.
766	   In this scenario, two users (A and B) are in a regular point-to-point
767	   call. One of the participants (A) decides to conference in a third
768	   participant, C. To do this, A begins acting as a focus. Its existing
769	   dialog with B becomes the first dialog attached to the focus. A would
770	   re-INVITE B on that dialog, changing its Contact URI to a new value
771	   which identifies the focus. In essence, A "mutates" from a single-
772	   user UA to a focus plus a single user UA, and in the process of such
773	   a mutation, its URI changes. Then, the focus makes an outbound INVITE
774	   to C. When C accepts, it mixes the media from A and C together,
775	   redistributing the results. The mixed media is also played locally.
776	   Figure 4 shows a diagram of this transition.

778	   It is important to note that the external interfaces in this model,
779	                            Conference Server
780	                   ...................................
781	                   .                                 .
782	                   . +------+        +------------+  .
783	                   . |Media |        | Conference |  .
784	                   . |Policy|        |Notification|  .
785	                   . |Server|        |   Server   |  .
786	                   . +------+        +------------+  .
787	                   . +----------+                    .
788	                   . |Conference|                    .
789	                   . |  Policy  | +-------+ +-----+  .
790	                   . |  Server  | | Focus | |Mixer|  .
791	                   . +----------+ +-------+ +-----+  .
792	                   ................//.\.......--./....
793	                                 //    \  ----  /
794	                               //      -\-      /RTP
795	                       SIP   //    ---- \      /
796	                           //   ---      \SIP /
797	                         // ---- RTP      \   /
798	                        / --               \ /
799	                 +-----------+         +-----------+
800	                 |Participant|         |Participant|
801	                 +-----------+         +-----------+

803	   Figure 3: Centralized server architecture

805	   between A and B, and between B and C, are exactly the same as those
806	   that would be used in a centralized server model. B could also
807	   include a media policy server and conference subscription server,
808	   allowing the participants to have access to them if they so desired.
809	   Just because the focus is co-resident with a participant does not
810	   mean any aspect of the behaviors and external interfaces will change.

812	6.3 Media Server Component

813	         B                              B
814	      +------+                       +------+
815	      |      |                       |      |
816	      |  UA  |                       |  UA  |
817	      |      |                       |      |
818	      +------+                       +------+
819	        |  .                           |  .
820	        |  .                           |  .
821	        |  .                           |  .
822	        |  .         Transition        |  .
823	        |  .        ------------>      |  .
824	     SIP|  .RTP                     SIP|  .RTP
825	        |  .                           |  .
826	        |  .                           |  .
827	        |  .                           |  .
828	        |  .                           |  .
829	        |  .                       +----------+
830	      +------+                     | +------+ |   SIP    +------+
831	      |      |                     | |Focus | |----------|      |
832	      |  UA  |                     | |M.Pol.| |          |  UA  |
833	      |      |                     | |C.Pol.| |..........|      |
834	      +------+                     | |Mixer | |   RTP    +------+
835	                                   | +------+ |
836	         A                         |     +    |             C
837	                                   |     + <..|.......
838	                                   |     +    |      .
839	                                   | +------+ |      .
840	                                   | |Parti-| |      .
841	                                   | |cipant| |      .
842	                                   | |      | |      .
843	                                   | +------+ |      .
844	                                   +----------+      .
845	                                        B            .
846	                                                     .

848	                                                   Internal
849	                                                   Interface

851	   Figure 4: Transition from two-party call to conference

852	                   +------------+             +------------+
853	                   | App  Server|  SIP        |Conf. Cmpnt.|
854	                   |            |-------------|            |
855	                   |   Focus    | Conf. Proto |   Focus    |
856	                   |   C.Pol    |-------------|   M.Pol    |
857	                   |   M.Pol    | Media Proto |   Mixer    |
858	                   |Notification|-------------|            |
859	                   |            |             |            |
860	                   +------------+             +------------+
861	                       |      \                    .. .
862	                       |       \\            RTP...   .
863	                       |         \\           ..      .
864	                       |     SIP   \\      ...        .
865	                   SIP |             \\ ...           .RTP
866	                       |              ..\             .
867	                       |           ...   \\           .
868	                       |        ...        \\         .
869	                       |      ..             \\       .
870	                       |   ...                 \\     .
871	                       | ..                      \    .
872	                  +-----------+              +-----------+
873	                  |Participant|              |Participant|
874	                  +-----------+              +-----------+

876	   Figure 5: Media server component model

878	   In this model, shown in Figure 5, each conference involves two
879	   centralized servers. One of these servers, referred to as the
880	   "application server" owns and manages the conference and media
881	   policies, and maintains a dialog with each participant. As a result,
882	   it represents the focus seen by all participants in a conference.
883	   However, this server doesn't provide any media support. To perform
884	   the actual media mixing function, it makes use of a second server,
885	   called the "mixing server". This server includes a focus, but has no
886	   conference policy server or conference notification service. It has a
887	   default conference policy, which accepts all invitations from the
888	   top-level focus. Its media policy server accepts any controls made by
889	   the application server. The focus in the application server uses
890	   third party call control to connect the media streams of each user to
891	   the mixing server, as needed. If the focus in the application server
892	   receives a media policy control command from a client, it delegates
893	   that command to the mixing server by issuing the same media policy
894	   control command to it.

896	   This model allows for the mixing server to be used as a resource for
897	   a variety of different conferencing applications. This is because it
898	   is unaware of any conference or media policies; it is merely a
899	   "slave" to the top-level server, doing whatever it asks. This is
900	   consistent with the SIP Application Server Component Model [10].

902	6.4 Distributed Mixing

904	   In a distributed mixed conference, there is still a centralized
905	   server which implements the focus, conference policy server, and
906	   media policy server. However, there is no centralized mixer. Rather,
907	   there is a mixer in each endpoint, along with a media policy server.
908	   The focus distributes the media by using third party call control
909	   [11] to move a media stream between each participant and each other
910	   participant. As a result, if there are N participants in the
911	   conference, there will be a single dialog between each participant
912	   and the focus, but the session description associated with that
913	   dialog will be constructed to allow media to be distributed amongst
914	   the participants. This is shown in Figure 6.

916	   There are several ways in which the media can be distributed to each
917	   participant for mixing. In a multi-unicast model, each participant
918	   sends a copy of its media to each other participant. In this case,
919	   the session description manages N-1 media streams. In a multicast
920	   model, each participant joins a common multicast group, and each
921	   participant sends a single copy of its media stream to that group.
922	   The underlying multicast infrastructure then distributes the media,
923	   so that each participant gets a copy. In a single-source multicast
924	   model (SSM), each participant sends its media stream to a central
925	   point, using unicast. The central point then redistributes the media
926	   to all participants using multicast. The focus is responsible for
927	   selecting the modality of media distribution, and for handling any
928	   hybrids that would be necessitated from clients with mixed
929	   capabilities.
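
The difference between the three distribution models can be made
concrete by counting how many copies of its media each participant
sends. The function name and the model labels are assumptions made for
this sketch. Redistribution by the multicast infrastructure or by the
central point is not counted.

```python
def media_sends(n, model):
    """Return (copies sent per participant, total copies sent).

    n is the number of participants; model is one of "multi-unicast",
    "multicast", or "ssm".
    """
    if n < 2:
        raise ValueError("a conference needs at least two participants")
    if model == "multi-unicast":
        per_participant = n - 1  # one copy to each other participant
    elif model in ("multicast", "ssm"):
        per_participant = 1      # one copy to the group or central point
    else:
        raise ValueError(f"unknown model: {model}")
    return per_participant, per_participant * n


for model in ("multi-unicast", "multicast", "ssm"):
    print(model, media_sends(4, model))
# multi-unicast (3, 12)
# multicast (1, 4)
# ssm (1, 4)
```

With four participants, multi-unicast requires each session description
to manage three streams, which matches the N-1 figure given above.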

931	   When a new participant joins or is added, the focus will perform the
932	   necessary third party call control to distribute the media from the
933	   new participant to all the other participants, and vice versa.

935	   The central conference server also includes a media policy server. Of
936	   course, the central conference server cannot implement any of the
937	   media policies directly. Rather, it would delegate the implementation
938	   to the media policy servers co-resident with a participant. As an
939	   example, if a participant decides to switch the overall conference
940	   mode from "video follows audio" to "tiled video", they would
941	   communicate with the central media policy server. This media policy
942	   server, in turn, would communicate with the media policy servers co-
943	   resident with each participant, using the same media policy control
944	   protocol, and instruct them to use "tiled video".

946	   This model requires additional functionality in user agents, which
947	   may or may not be present. The participants, therefore, must be able
948	   to advertise this capability to the focus.

950	6.5 Cascaded Mixers

952	   In very large conferences, it may not be possible to have a single
953	   mixer that can handle all of the media. A solution to this is to use
954	   cascaded mixers. In this architecture, there is a centralized focus,
955	   but the mixing function is implemented by a multiplicity of mixers,
956	   scattered throughout the network. Each participant is connected to
957	   one, and only one of the mixers. The focus uses some kind of control
958	   protocol (such as MEGACO [9]) to connect the mixers together, so that
959	   all of the participants can hear each other.

961	   This architecture is shown in Figure 7.

963	7 Common Operations

965	   There are a large number of ways in which users can interact with a
966	   conference. They can join, leave, set policies, approve members, and
967	   so on. This section is meant as an overview of the basic primitives,
968	   summarizing how they operate. More detailed examples with complete
969	   call flows can be found in [12].

971	7.1 Creating Conferences

973	   There are many ways in which a conference can be created. Ultimately,
974	   all of them result in the establishment of a conference URI which
975	   identifies a focus. In all cases, a conference URI must be created by
976	   the focus itself, or an element which is responsible for managing
977	   URIs that are used by the focus. Otherwise, the uniqueness of
978	   conference URIs could not be guaranteed.

980	                             +---------+
981	                             |Partcpnt |
982	                 media       |         |      media
983	              ...............|         |..................
984	              .              |  Mixer  |                 .
985	              .              |M.Pol.Srv|                 .
986	              .              +---------+                 .
987	              .                   |                      .
988	              .                   |                      .
989	              .                   |                      .
990	              .            dialog |                      .
991	              .                   |                      .
992	              .                   |                      .
993	              .                   |                      .
994	              .              +---------+                 .
995	              .              |Cnf.Srvr.|                 .
996	             .               |         |                 .
997	             .               |  Focus  |                 .
998	             .               |M.Pol.Srv|                 .
999	             .             / |C.Pol.Srv|  \              .
1000	             .            /  +---------+   \             .
1001	             .           /                  \            .
1002	             .          /                    \           .
1003	             .         /               dialog \          .
1004	             .        /                        \         .
1005	             .       /dialog                    \        .
1006	             .      /                            \       .
1007	             .     /                              \      .
1008	             .    /                                \     .
1009	             .                                           .
1010	           +---------+                           +---------+
1011	           |Partcpnt |                           |Partcpnt |
1012	           |         |                           |         |
1013	           |         | ......................... |         |
1014	           |  Mixer  |                           |  Mixer  |
1015	           |M.Pol.Srv|          media            |M.Pol.Srv|
1016	           +---------+                           +---------+

1018	   Figure 6: Dialog and media streams in a distributed mixed conference

1019	                           +---------+
1020	   +-----------------------|         |------------------------+
1021	   |   ++++++++++++++++++++|         |++++++++++++++++++      |
1022	   |   +            +------|  Focus  |---------+       +      |
1023	   |   +            |      |         |         |       +      |
1024	   |   +            |    +-|         |--+      |       +      |
1025	   |   +            |    | +---------+  |      |       +      |
1026	   |   +            |    |      +       |      |       +      |
1027	   |   +            |    |      +       |      |       +      |
1028	   |   +            |    |      +       |      |       +      |
1029	   |   +            |    | +---------+  |      |       +      |
1030	   |   +            |    | |         |  |      |       +      |
1031	   |   +            |    | | Mixer 1 |  |      |       +      |
1032	   |   +            |    | |         |  |      |       +      |
1033	   |   +            |    | +---------+  |      |       +      |
1034	   |   +            |    |...   .  .... |      |       +      |
1035	   |   +           .|....|      .      .|....  |       +      |
1036	   |   +     ...... |    |      .       |    ..|...    +      |
1037	   |   +  ...       |    |      .       |      |   ....+      |
1038	   | +---------+    |    | +---------+  |      |  +---------+ |
1039	   | |         |    |    | |         |  |      |  |         | |
1040	   | | Mixer 2 |    |    | | Mixer 3 |  |      |  | Mixer 4 | |
1041	   | |         |    |    | |         |  |      |  |         | |
1042	   | +---------+    |    | +---------+  |      |  +---------+ |
1043	   |    .    .      |    |      .  .    |      |     .   .    |
1044	   |   .      .     |    |    ..   .    |      |   ..    .    |
1045	   |  .       .     |    |   .      .   |      |  .       .   |
1046	  +---------+  .    |  +---------+  .   |    +---------+  .   |
1047	  | Prtcpnt |   .   |  | Prtcpnt |   .  |    | Prtcpnt |  .   |
1048	  |    1    |    .  |  |    1    |   .  |    |    1    |  .   |
1049	  +---------+    .  |  +---------+    . |    +---------+   .  |
1050	                  . |                 . |                  .  |
1051	           +---------+         +---------+           +---------+
1052	           | Prtcpnt |         | Prtcpnt |           | Prtcpnt |
1053	           |    1    |         |    1    |           |    1    |
1054	           +---------+         +---------+           +---------+

1056	     -------  SIP Dialog
1057	     .......  Media Flow
1058	     +++++++  Control Protocol

1060	   Figure 7: Cascaded Mixers

1062	   Using the conference policy control protocol, a client can instruct
1063	   the conference policy server to create a new conference. The result
1064	   of this operation is a conference URI, which is returned to the client.

1066	   Another way to obtain a conference URI is simply to guess one. An
1067	   instant conferencing server supports an effectively unlimited number
1068	   of conference URIs. Each of them is a valid conference URI, since it
1069	   identifies a focus, and an INVITE sent to it will join the user to
1070	   that conference. As a result, a
1071	   client can simply choose one of them at random, so long as it is
1072	   configured with the domain portion of the URI and any naming
1073	   conventions in use by the instant conferencing server.

1075	        OPEN ISSUE: Do we need to specify standards for this?

1077	   The previous two approaches are used to obtain conference URIs for
1078	   focuses that are hosted within centralized servers. Creation of
1079	   conferences where the focus resides in an endpoint operates
1080	   differently. There, the endpoint itself creates the conference URI,
1081	   and hands it out to other endpoints which are to be the participants.
1082	   What differs from case to case is how the endpoint decides to create
1083	   a conference.

1085	   One important case is the ad-hoc conference described in Section 6.2.
1086	   There, an endpoint unilaterally decides to create the conference
1087	   based on local policy. The dialogs that were connected to the UA are
1088	   migrated to the endpoint-hosted focus, using a re-INVITE to pass the
1089	   conference URI to the newly joined participants.

1091	   Alternatively, one UA can ask another UA to create an endpoint-hosted
1092	   conference. This is accomplished with the SIP Join header [13]. The
1093	   UA which receives the Join header in an invitation may need to create
1094	   a new conference URI (a new one is not needed if the dialog that is
1095	   being joined is already part of a conference). The conference URI is
1096	   then handed to the recently joined participants through a re-INVITE.

1098	7.2 Adding Participants

1100	   There are two modes for adding participants to a conference - first
1101	   party additions, and third party additions. In a first party
1102	   addition, the participant that wishes to join makes a direct attempt
1103	   to join. In a third party addition, some other participant takes
1104	   action with the aim of causing a third party to be added to the
1105	   conference.

1107	   First party additions are trivially accomplished with a standard
1108	   INVITE. A participant can send an INVITE request to the conference
1109	   URI, and if the conference policy allows them to join, they are added
1110	   to the conference.

1112	   If a UA does not know the conference URI, but has learned about a
1113	   dialog which is connected to a conference (by using the dialog event
1114	   package, for example [14]), the UA can join the conference by using
1115	   the Join header to join the dialog.
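
   As a rough illustration, the Join header field identifies the dialog
   to be joined by its Call-ID and tags. The sketch below assumes the
   syntax later published for the Join header (Call-ID followed by
   to-tag and from-tag parameters); the helper name and the sample
   identifiers are hypothetical.

```python
def join_header(call_id: str, to_tag: str, from_tag: str) -> str:
    """Format a Join header field for the dialog learned from, for
    example, the dialog event package (assumed syntax, see lead-in)."""
    return f"Join: {call_id};to-tag={to_tag};from-tag={from_tag}"


header = join_header("98732@sip.example.com", "ff87ff", "r33th4x0r")
```

   The header would be carried in the INVITE that the joining UA sends
   toward the UA holding the dialog.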

1117	   Third party invitations can be done in one of several ways. The first
1118	   approach is for the user to ask the third party to send an INVITE to
1119	   the conference URI. This can be done automatically through the usage
1120	   of REFER [15]. The participant would send a REFER request to the
1121	   third party. The Refer-To header field in that request would contain
1122	   the conference URI. There are countless non-automated means for
1123	   asking a participant to send an INVITE to the conference URI. A user
1124	   can send an instant message [16] to the third party, containing an
1125	   HTML document which requests the user to click on the hyperlink to
1126	   join the conference:

1128	   <html>
1129	   Hey, would you like to <a href="sip:9sf88fk-99sd@conferences.com">join
1130	   </a> the conference now?
1131	   </html>
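
   The automated REFER-based approach described above can be sketched
   as follows. This is only an illustration of a REFER request whose
   Refer-To header field carries the conference URI; the function name,
   host names, and user URIs are hypothetical, and a real user agent
   would rely on its SIP stack to build and send the request.

```python
import secrets


def build_refer(referrer: str, third_party: str, conference_uri: str,
                local_host: str = "client.example.com") -> str:
    """Build a REFER asking `third_party` to INVITE `conference_uri`."""
    lines = [
        f"REFER {third_party} SIP/2.0",
        # Branch values start with the RFC 3261 magic cookie z9hG4bK.
        f"Via: SIP/2.0/UDP {local_host};branch=z9hG4bK{secrets.token_hex(6)}",
        "Max-Forwards: 70",
        f"To: <{third_party}>",
        f"From: <{referrer}>;tag={secrets.token_hex(4)}",
        f"Call-ID: {secrets.token_hex(8)}@{local_host}",
        "CSeq: 1 REFER",
        f"Contact: <sip:{local_host}>",
        # The third party is asked to send an INVITE to this URI.
        f"Refer-To: <{conference_uri}>",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_refer("sip:alice@example.com", "sip:carol@example.com",
                      "sip:9sf88fk-99sd@conferences.com")
```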

1133	   The second approach for third party additions is for the participant
1134	   to ask the focus to add the third party to the conference. In this
1135	   case, however, a REFER cannot be used. REFER would have the effect of
1136	   telling the focus to send an INVITE to the new potential participant.
1137	   However, just sending this INVITE is not sufficient for adding the
1138	   new member. In more complex realizations, such as the distributed
1139	   mixing scenario of Section 6.4, a multiplicity of invitations will
1140	   need to be sent. This would require the focus to attach additional
1141	   meaning to REFER; it would have to be interpreted as a request to add
1142	   a participant to the conference. However, it is fundamental to the
1143	   concept of REFER that the recipient not attach specific application
1144	   semantics to it. Therefore, it cannot be used. Rather, the user would
1145	   use the conference policy control protocol to request that the focus
1146	   add the new participant. The conference policy control protocol can
1147	   also be used to add a multiplicity of new users. This is referred to
1148	   as mass invitation.

1150	   In many cases, a new participant will not wish to join the conference
1151	   unless they can join with a particular set of policies. As an
1152	   example, a participant may want to join anonymously, so that other
1153	   participants know that someone has joined, but not who. To accomplish
1154	   this, the conference policy control protocol is used to establish
1155	   these policies prior to the generation or acceptance of an invitation
1156	   to the conference. For example, if a user wishes to join a conference
1157	   with a known conference URI, the user would obtain the URI for the
1158	   conference policy, manipulate the policy to set themselves as an
1159	   anonymous participant, and then actually join the conference by
1160	   sending an INVITE request to the conference URI.

1162	        OPEN ISSUE: Will this always work? Are there cases where
1163	        the conference policy cannot be manipulated until the
1164	        INVITE has been sent? This would require a preconditions-
1165	        style solution.

1167	7.3 Removing Participants

1169	   As with additions, there are two modalities for departures - first
1170	   party (in which a user explicitly leaves), and third party (in which
1171	   the user is removed by a different user).

1173	   First party departures are trivially accomplished by terminating the
1174	   dialog that the participant is using to connect to the focus.

1176	   Third party departures can be done in one of two ways. First, a user
1177	   can make use of the REFER method to instruct the third party to send
1178	   a BYE to the conference server on the dialog that connects them to
1179	   the focus. This requires the user to have knowledge of the dialog
1180	   identifiers used by that participant. The second mechanism, which is
1181	   much cleaner, is to use the conference policy control protocol to
1182	   inform the focus that the participant is explicitly barred from the
1183	   conference. This will cause the focus to eject the user, sending them
1184	   a BYE in addition to whatever other signaling is needed to remove
1185	   them. The conference policy control protocol can also be used to
1186	   remove a large number of users. This is generally referred to as mass
1187	   ejection.

1189	7.4 Approving Policy Changes

1191	   A conference policy for a particular conference may designate one or
1192	   more users as moderators for some set of media policy or conference
1193	   policy change requests. This means that those moderators need to
1194	   approve the specific policy change. Typically, moderators are used to
1195	   approve member additions and removals. However, the framework allows
1196	   for moderators to be associated with any policy change that can be
1197	   made.

1199	   The general model to support moderator approval is through the
1200	   conference notification service. The moderator subscribes to the
1201	   notification service. They are authenticated by the focus, which
1202	   determines that they are a moderator for the conference. Whenever a
1203	   client makes a policy change request that requires moderator
1204	   approval, the policy change is not actually committed. Rather, it is
1205	   marked as pending by the conference policy server. Any moderators for
1206	   that specific policy request who are subscribed to the conference
1207	   notification service will receive a notification of the pending
1208	   change. The moderators, using the conference policy control protocol,
1209	   can approve the specific change. This commits the new policy. All
1210	   participants are then notified of the new policy through the
1211	   notification service.
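
   The pending-then-commit flow above can be sketched with a toy model.
   This is not a protocol definition: the class, method names, and
   notification tuples are hypothetical, and the sketch assumes that
   every change requires moderator approval and that all moderators are
   subscribed to the notification service.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyServer:
    """Toy conference policy server; every change needs approval here."""
    moderators: set
    policy: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)
    notifications: list = field(default_factory=list)
    next_id: int = 1

    def request_change(self, key, value):
        """Mark a change as pending and notify subscribed moderators."""
        change_id = self.next_id
        self.next_id += 1
        self.pending[change_id] = (key, value)
        for moderator in sorted(self.moderators):
            self.notifications.append((moderator, "pending", change_id))
        return change_id

    def approve(self, moderator, change_id):
        """Commit a pending change once a moderator approves it."""
        if moderator not in self.moderators:
            raise PermissionError(f"{moderator} is not a moderator")
        key, value = self.pending.pop(change_id)
        self.policy[key] = value
        # All participants learn of the newly committed policy.
        self.notifications.append(("all", "committed", change_id))


server = PolicyServer(moderators={"alice"})
change = server.request_change("allow:bob", True)  # pending, not committed
server.approve("alice", change)                    # now committed
```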

1213	7.5 Creating Sidebars

1215	   A sidebar is a "conference within a conference", allowing a subset of
1216	   the participants to converse amongst themselves. Frequently,
1217	   participants in a sidebar will still receive media from the main
1218	   conference, but "in the background". For audio, this may mean that
1219	   the volume of the media is reduced, for example.

1221	   There are two ways to represent a sidebar in this framework. The
1222	   first is to treat it as a specific kind of media policy. It is a
1223	   media policy which would request that sidebar participants be "in the
1224	   foreground", and others "in the background". There are no additional
1225	   dialogs or conferences established. The media policy control protocol
1226	   would allow a user to explicitly request sidebars. The server would
1227	   alert users (through the notification service) that they have been
1228	   invited to the sidebar. They would use the media policy control
1229	   protocol to approve their participation in it.

1231	   An alternative view is that a sidebar truly is a conference within a
1232	   conference, and would be implemented that way. There would be a new
1233	   conference URI associated with the sidebar. Standard techniques would
1234	   be used to add users to the sidebar, approve their membership, and so
1235	   on. The sidebar would itself be a participant in the main conference.
1236	   Users would continue to receive their media stream only through the
1237	   main conference. They would have a dialog with the sidebar focus, but
1238	   no media would be exchanged on this dialog.

1240	        OPEN ISSUE: It is still unclear as to which model is
1241	        preferable. We should pick one.

1243	8 Security Considerations

1245	   Conferences frequently require security features in order to properly
1246	   operate. The conference policy may dictate that only certain
1247	   participants can join, or that certain participants can create new
1248	   policies. Generally speaking, conference applications are very
1249	   concerned about authorization decisions. Mechanisms for establishing
1250	   and enforcing such authorization rules are a central concept
1251	   throughout this document.

1253	   Of course, authorization rules require authentication. Normal SIP
1254	   authentication mechanisms should suffice for the conference
1255	   authorization mechanisms described here.

1257	9 Contributors

1259	   This document is the result of discussions amongst the conferencing
1260	   design team. The members of this team include:

1262	   Brian Rosen
1263	   Rohan Mahy
1264	   Henning Schulzrinne
1265	   Orit Levin
1266	   Roni Even
1267	   Tom Taylor
1268	   Petri Koskelainen
1269	   Nermeen Ismail
1270	   Andy Zmolek
1271	   Joerg Ott
1272	   Dan Petrie

1274	10 Author's Address

1276	   Jonathan Rosenberg
1277	   dynamicsoft
1278	   72 Eagle Rock Avenue
1279	   First Floor
1280	   East Hanover, NJ 07936
1281	   email: jdrosen@dynamicsoft.com

1283	11 Normative References

1285	12 Informative References

1287	   [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J.
1289	   Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session
1290	   initiation protocol," RFC 3261, Internet Engineering Task Force, June
1291	   2002.

1293	   [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a
1294	   transport protocol for real-time applications," RFC 1889, Internet
1295	   Engineering Task Force, Jan. 1996.

1297	   [3] O. Levin et al., "Requirements for tightly coupled SIP
1298	   conferencing," Internet Draft, Internet Engineering Task Force, July
1299	   2002.  Work in progress.

1301	   [4] A. B. Roach, "Session initiation protocol (SIP)-specific event
1302	   notification," RFC 3265, Internet Engineering Task Force, June 2002.

1304	   [5] B. Campbell and J. Rosenberg, "Instant message sessions in
1305	   simple," Internet Draft, Internet Engineering Task Force, Oct. 2002.
1306	   Work in progress.

1308	   [6] J. Rosenberg and H. Schulzrinne, "A session initiation protocol
1309	   (SIP) event package for conference state," Internet Draft, Internet
1310	   Engineering Task Force, June 2002.  Work in progress.

1312	   [7] T. Berners-Lee, R. Fielding, and L. Masinter, "Uniform resource
1313	   identifiers (URI): generic syntax," RFC 2396, Internet Engineering
1314	   Task Force, Aug.  1998.

1316	   [8] H. Schulzrinne and J. Rosenberg, "Session initiation protocol
1317	   (SIP) caller preferences and callee capabilities," Internet Draft,
1318	   Internet Engineering Task Force, July 2002.  Work in progress.

1320	   [9] F. Cuervo, N. Greene, A. Rayhan, C. Huitema, B. Rosen, and J.
1321	   Segers, "Megaco protocol version 1.0," RFC 3015, Internet Engineering
1322	   Task Force, Nov. 2000.

1324	   [10] J. Rosenberg, P. Mataga, and H. Schulzrinne, "An application
1325	   server component architecture for SIP," Internet Draft, Internet
1326	   Engineering Task Force, Mar. 2001.  Work in progress.

1328	   [11] J. Rosenberg, J. Peterson, H. Schulzrinne, and G. Camarillo,
1329	   "Best current practices for third party call control in the session
1330	   initiation protocol," Internet Draft, Internet Engineering Task
1331	   Force, June 2002.  Work in progress.

1333	   [12] A. Johnston and O. Levin, "Session initiation call control -
1334	   conferencing for user agents," Internet Draft, Internet Engineering
1335	   Task Force, Oct. 2002.  Work in progress.

1337	   [13] R. Mahy and D. Petrie, "The session initiation protocol (SIP)
1338	   join header," Internet Draft, Internet Engineering Task Force, Oct.
1339	   2002.  Work in progress.

1341	   [14] J. Rosenberg and H. Schulzrinne, "A session initiation protocol
1342	   (SIP) event package for dialog state," Internet Draft, Internet
1343	   Engineering Task Force, June 2002.  Work in progress.

1345	   [15] R. Sparks, "The SIP refer method," Internet Draft, Internet
1346	   Engineering Task Force, July 2002.  Work in progress.

1348	   [16] B. Campbell and J. Rosenberg, "Session initiation protocol
1349	   extension for instant messaging," Internet Draft, Internet
1350	   Engineering Task Force, Sept.  2002.  Work in progress.

1352	   Full Copyright Statement

1354	   Copyright (c) The Internet Society (2002). All Rights Reserved.

1356	   This document and translations of it may be copied and furnished to
1357	   others, and derivative works that comment on or otherwise explain it
1358	   or assist in its implementation may be prepared, copied, published
1359	   and distributed, in whole or in part, without restriction of any
1360	   kind, provided that the above copyright notice and this paragraph are
1361	   included on all such copies and derivative works. However, this
1362	   document itself may not be modified in any way, such as by removing
1363	   the copyright notice or references to the Internet Society or other
1364	   Internet organizations, except as needed for the purpose of
1365	   developing Internet standards in which case the procedures for
1366	   copyrights defined in the Internet Standards process must be
1367	   followed, or as required to translate it into languages other than
1368	   English.

1370	   The limited permissions granted above are perpetual and will not be
1371	   revoked by the Internet Society or its successors or assigns.

1373	   This document and the information contained herein is provided on an
1374	   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
1375	   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
1376	   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
1377	   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
1378	   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.