

SIPPING WG                                                       R. Mahy
Internet-Draft                                               Plantronics
Intended status: Informational                               B. Campbell
Expires: May 31, 2008                                          R. Sparks
                                                        Estacado Systems
                                                            J. Rosenberg
                                                           Cisco Systems
                                                               D. Petrie
                                                                  SIP EZ
                                                        A. Johnston, Ed.
                                                                   Avaya
                                                       November 28, 2007

     A Call Control and Multi-party usage framework for the Session
                       Initiation Protocol (SIP)
                   draft-ietf-sipping-cc-framework-08

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 31, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document defines a framework and requirements for multi-party
   usage of SIP.  To enable discussion of multi-party features and
   applications, we define an abstract call model for describing the
   media relationships required by many of these features.  The model
   and actions described here are specifically chosen to be independent
   of the SIP signaling and/or mixing approach chosen to actually set up
   the media relationships.  In addition to its dialog manipulation
   aspect, this framework includes requirements for communicating
   related information and events such as conference and session state,
   and session history.  This framework also describes other goals that
   embody the spirit of SIP applications as used on the Internet.

Table of Contents

   1.  Motivation and Background
   2.  Key Concepts
     2.1.  "Conversation Space" Model
     2.2.  Relationship Between Conversation Space, SIP Dialogs,
           and SIP Sessions
     2.3.  Signaling Models
     2.4.  Mixing Models
       2.4.1.  Tightly Coupled
       2.4.2.  Loosely Coupled
     2.5.  Conveying Information and Events
     2.6.  Componentization and Decomposition
       2.6.1.  Media Intermediaries
       2.6.2.  Mixer
       2.6.3.  Transcoder
       2.6.4.  Media Relay
       2.6.5.  Queue Server
       2.6.6.  Parking Place
       2.6.7.  Announcements and Voice Dialogs
     2.7.  Use of URIs
       2.7.1.  Naming Users in SIP
       2.7.2.  Naming Services with SIP URIs
     2.8.  Invoker Independence
     2.9.  Billing issues
   3.  Catalog of call control actions and sample features
     3.1.  Remote Call Control Actions on Early Dialogs
       3.1.1.  Remote Answer
       3.1.2.  Remote Forward or Put
       3.1.3.  Remote Busy or Error Out
     3.2.  Remote Call Control Actions on Single Dialogs
       3.2.1.  Remote Dial
       3.2.2.  Remote On and Off Hold
       3.2.3.  Remote Hangup
     3.3.  Call Control Actions on Multiple Dialogs
       3.3.1.  Transfer
       3.3.2.  Take
       3.3.3.  Add
       3.3.4.  Local Join
       3.3.5.  Insert
       3.3.6.  Split
       3.3.7.  Near-fork
       3.3.8.  Far fork
   4.  Security Considerations
   5.  IANA Considerations
   6.  Appendix A: Example Features
     6.1.  Implementation of these features
       6.1.1.  Barge-in
       6.1.2.  Call Monitoring
       6.1.3.  Call Park
       6.1.4.  Call Pickup
       6.1.5.  Click-to-dial
       6.1.6.  Distinctive ring
       6.1.7.  Intercom
       6.1.8.  Music on Hold
       6.1.9.  Pre-paid calling
       6.1.10. Single Line Extension/Multiple Line Appearance
       6.1.11. Speakerphone paging
       6.1.12. Voice message screening
       6.1.13. Voice Portal
   7.  Acknowledgements
   8.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Motivation and Background

   The Session Initiation Protocol [1] (SIP) was defined for the
   initiation, maintenance, and termination of sessions or calls between
   one or more users.  However, despite its origins as a large-scale
   multiparty conferencing protocol, SIP is used today primarily for
   point-to-point calls.  This two-party configuration is the focus of
   the SIP specification and most of its extensions.

   This document defines a framework and requirements for multi-party
   usage of SIP.  Most multi-party operations manipulate SIP dialogs
   (also known as call legs) or SIP conference media policy to cause
   participants in a conversation to perceive specific media
   relationships.  In other protocols that deal with the concept of
   calls, this manipulation is known as call control.  In addition to
   its dialog or policy manipulation aspect, "call control" also
   includes communicating information and events related to manipulating
   calls, including information and events dealing with session state
   and history, conference state, user state, and even message state.

   Based on input from the SIP community, the authors compiled the
   following set of goals for SIP call control and multiparty
   applications:
   o  Define Primitives, Not Services.  Allow for a handful of robust
      yet simple mechanisms that can be combined to deliver features and
      services.  Throughout this document we refer to these simple
      mechanisms as "primitives".  Primitives should be sufficiently
      robust that, when combined with each other, they can be used to
      build many services.  However, the goal is not to define a
      provably complete set of primitives.  Note that while the IETF
      will NOT standardize behavior or services, it may define example
      services for informational purposes, as in service examples [6].
   o  Participant oriented.  The primitives should be designed to
      provide services that are oriented around the experience of the
      participants.  The authors observe that end users of features and
      services usually don't care how a media relationship is set up.
      Their ultimate experience is based only on the resulting media and
      other externally visible characteristics.
   o  Signaling Model independent.  Support both a central control and
      a peer-to-peer feature invocation model (and combinations of the
      two).  Baseline SIP already supports a centralized control model
      described in 3pcc [7], and the SIP community has expressed a great
      deal of interest in peer-to-peer or distributed call control using
      primitives such as those defined in REFER [8], Replaces [9], and
      Join [10].

   o  Mixing Model independent.  The bulk of interesting multiparty
      applications involve mixing or combining media from multiple
      participants.  This mixing can be performed by one or more of the
      participants, or by a centralized mixing resource.  The experience
      of the participants should not depend on the mixing model used.
      While most examples in this document refer to audio mixing, the
      framework applies to any media type.  In this context a "mixer"
      refers to combining media of the same type in an appropriate,
      media-specific way.  This is consistent with the model described
      in the SIP conferencing framework [15].
   o  Invoker oriented.  Only the user who invokes a feature or a
      service needs to know exactly which service is invoked or why.
      This is good because it allows new services to be created without
      requiring new primitives from all the participants, and it allows
      for much simpler feature authorization policies, for example, when
      participation spans organizational boundaries.  As discussed in
      Section 2.8, this also avoids exponential state explosion when
      combining features.  The invoker only has to manage a user
      interface or API to prevent local feature interactions.  All the
      other participants simply need to manage the feature interactions
      of a much smaller number of primitives.
   o  Primitives make full use of URIs.  URIs are a very powerful
      mechanism for describing users and services.  They represent a
      plentiful resource that can be extremely expressive and easily
      routed, translated, and manipulated, even across organizational
      boundaries.  URIs can contain special parameters and informational
      headers that need only be relevant to the owner of the namespace
      (domain) of the URI.  Just as a user who selects an http: URL need
      not understand the significance and organization of the web site
      it references, a user may encounter a SIP URI that translates into
      an email-style group alias, plays a pre-recorded message, or runs
      some complex call-handling logic.  Note that while this goal may
      seem to contradict the previous one, both goals can be satisfied
      by the same model.
   o  Make use of SIP headers and SIP event packages to provide SIP
      entities with information about their environment.  These should
      include information about the status / handling of dialogs on
      other user agents, information about the history of other contacts
      attempted prior to the current contact, the status of
      participants, the status of conferences, user presence
      information, and the status of messages.
   o  Encourage service decomposition, and design to make use of
      standard components using well-defined, simple interfaces.  Sample
      components include a SIP mixer, recording service, announcement
      server, and voice dialog server.  (This is not an exhaustive
      list).

   o  Include authentication, authorization, policy, logging, and
      accounting mechanisms to allow these primitives to be used safely
      among mutually untrusted participants.  Some of these mechanisms
      may be used to assist in billing, but no specific billing system
      will be endorsed.
   o  Permit graceful fallback to baseline SIP.  Definitions for new SIP
      call control extensions/primitives must describe a graceful way to
      fall back to baseline SIP behavior.  Support for one primitive must
      not imply support for another primitive.
   o  There is no desire or goal to reinvent traditional models, such as
      the model used by the [H.450] family of protocols, [JTAPI], or the
      [CSTA] call model, as these other models do not share the design
      goals presented in this document.
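
   The URI goal above notes that URIs can carry special parameters and
   informational headers that matter only to the owner of the URI's
   domain.  The following Python sketch splits a SIP URI into user, host,
   parameters, and headers so that a service could act on them.  It
   assumes a simplified form, sip:user@host;param=value?header=value,
   and ignores escaping in parameters, IPv6 references, user-part
   parameters, and other details of the full SIP URI grammar.  The URI
   and the "conf" parameter in the usage line are hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import unquote


@dataclass
class SipUri:
    scheme: str
    user: str
    host: str
    params: dict = field(default_factory=dict)
    headers: dict = field(default_factory=dict)


def parse_sip_uri(uri: str) -> SipUri:
    """Split a simplified SIP URI into its parts (no IPv6, no validation)."""
    scheme, _, rest = uri.partition(":")
    if scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    # Headers follow the first '?', as name=value pairs joined by '&'.
    rest, _, header_part = rest.partition("?")
    headers = {}
    for pair in filter(None, header_part.split("&")):
        name, _, value = pair.partition("=")
        headers[unquote(name)] = unquote(value)
    # URI parameters follow the host part, separated by ';'.
    addr, *param_parts = rest.split(";")
    params = {}
    for part in param_parts:
        name, _, value = part.partition("=")
        params[name.lower()] = value  # a bare parameter such as ';lr' maps to ''
    # The user part is optional; rpartition leaves it empty when '@' is absent.
    user, _, host = addr.rpartition("@")
    return SipUri(scheme, unquote(user), host, params, headers)
```

   For example, parsing
   "sip:standup@conf.example.com;conf=42?Subject=Weekly%20sync" yields
   the parameter conf=42 and the header Subject "Weekly sync", which only
   the conf.example.com domain needs to interpret.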

2.  Key Concepts

   This section introduces a number of key concepts which will be used
   to describe and explain various call control operations and services
   in the remainder of this document.  This includes the conversation
   space model, signaling and mixing models, common components, and the
   use of URIs.

2.1.  "Conversation Space" Model

   This document introduces the concept of an abstract "conversation
   space" as a set of participants who believe they are all
   communicating with one another.  Each conversation space contains
   one or more participants.

   Participants are SIP User Agents that send original media to, or
   terminate and receive media from, other members of the conversation
   space.  Logically, every participant in the conversation space has
   access to all the media generated in that space (strictly speaking,
   this is true only if all participants share a common media type).  A
   SIP User Agent that does not contribute or consume any media is NOT a
   participant; nor is a user agent that merely forwards, transcodes,
   mixes, or selects media originating elsewhere in the conversation
   space.  [Note that a conversation space consists of zero or more SIP
   calls or SIP conferences.  A conversation space is similar to the
   definition of a "call" in some other call models.]

   Participants may represent human users or non-human users (referred
   to as robots or automatons in this document).  Some participants may
   be hidden within a conversation space.  Some examples of hidden
   participants include: robots that generate tones, images, or
   announcements during a conference to announce users arriving and
   departing, a human call center supervisor monitoring a conversation
   between a trainee and a customer, and robots that record media for
   training or archival purposes.

   Participants may also be active or passive.  Active participants are
   expected to be intelligent enough to leave a conversation space when
   they no longer desire to participate.  (An attentive human
   participant is obviously active.)  Some robotic participants (such as
   a voice messaging system, an instant messaging agent, or a voice
   dialog system) may be active participants if they can leave the
   conversation space when there is no human interaction.  Other robots
   (for example, the tone-generating robot from the previous example)
   are passive participants.  A human participant "on hold" is passive.

   A conversation space can be drawn as a "bubble" or oval, or written
   as a "set" in curly or square brace notation.  Each set, oval, or
   "bubble" represents a conversation space.  Hidden participants are
   shown in lowercase letters.

   Note that while the term "conversation" usually applies to oral
   exchange of information, we apply the conversation space model to any
   media exchange between participants.

   { A , B }                   [ A , b, C, D ]

      .-.                 .---.
     /   \               /     \
    /  A  \             / A   b \
   (       )           (         )
    \  B  /             \ C   D /
     \   /               \     /
      '-'                 '---'
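
   The set notation above can be generated mechanically from a list of
   participants.  The Python sketch below records, for each participant,
   a single-letter label and whether the participant is hidden, and
   renders the conversation space with hidden participants in lowercase.
   The class and function names are illustrative assumptions, not part
   of any SIP specification.  The bracket style is a parameter because
   the figures use both curly and square braces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    name: str              # single-letter label, as used in the figures
    hidden: bool = False   # e.g. a recording robot or a silent supervisor


def render(space: list[Participant], brackets: str = "{}") -> str:
    """Render a conversation space in set notation; hidden members are lowercase."""
    labels = [p.name.lower() if p.hidden else p.name.upper() for p in space]
    return f"{brackets[0]} {' , '.join(labels)} {brackets[1]}"
```

   Rendering A and B produces "{ A , B }", and rendering A, a hidden b,
   C, and D with square brackets produces "[ A , b , C , D ]".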

2.2.  Relationship Between Conversation Space, SIP Dialogs, and SIP
      Sessions

   In SIP, a call is "an informal term that refers to some communication
   between peers, generally set up for the purposes of a multimedia
   conversation."  Obviously we cannot discuss normative behavior based
   on such an intentionally vague definition.  The concept of a
   conversation space is needed because the SIP definition of call is
   not sufficiently precise for the purpose of describing the user
   experience of multiparty features.

   Do any other definitions convey the correct meaning?  SIP and SDP
   [5] both define a conference as "a multimedia session identified by a
   common session description."  A session is defined as "a set of
   multimedia senders and receivers and the data streams flowing from
   senders to receivers."  Both of these definitions are heavily
   oriented toward multicast sessions with little differentiation among
   participants.  As such, neither is particularly useful for our
   purposes.  In fact, the definition of "call" in some call models is
   more similar to our definition of a conversation space.

   Some examples of the relationship between conversation spaces, SIP
   dialogs, and SIP sessions are listed below.  In each example, a human
   user will perceive that there is a single call.
   o  A simple two-party call is a single conversation space, a single
      session, and a single dialog.
   o  A locally mixed three-way call is two sessions and two dialogs.
      It is also a single conversation space.
   o  A simple dial-in audio conference is a single conversation space,
      but is represented by as many dialogs and sessions as there are
      human participants.
   o  A multicast conference is a single conversation space, a single
      session, and as many dialogs as participants.
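
   The four examples above can be tabulated for n human participants.
   The Python sketch below returns the number of conversation spaces,
   SIP sessions, and SIP dialogs for each case, taking the counts
   directly from the list.  The configuration names are labels chosen
   for this example only.

```python
def call_shape(kind: str, n: int) -> tuple[int, int, int]:
    """Return (conversation spaces, sessions, dialogs) for n human participants."""
    if kind == "two-party":
        if n != 2:
            raise ValueError("a simple two-party call has exactly 2 participants")
        return (1, 1, 1)
    if kind == "local-three-way":
        if n != 3:
            raise ValueError("a three-way call has exactly 3 participants")
        # The mixing participant holds one dialog (and session) per remote party.
        return (1, 2, 2)
    if kind == "dial-in":
        # One dialog and one session per participant, each with the conference server.
        return (1, n, n)
    if kind == "multicast":
        # One shared session description, but one dialog per participant.
        return (1, 1, n)
    raise ValueError(f"unknown configuration: {kind}")
```

   The first element is always 1, which is the point of the list: users
   perceive a single call no matter how many dialogs and sessions carry
   it.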

2.3.  Signaling Models

   Making changes to a conversation space requires SIP signaling that
   can cause those changes.  Specifically, there must be a way to
   manipulate SIP dialogs (call legs) to move participants into and out
   of conversation spaces.  Less obviously, there must also be a way to
   manipulate SIP dialogs to include non-participant user agents that
   are otherwise involved in a conversation space (for example, B2BUAs,
   3pcc controllers, mixers, transcoders, translators, or relays).

   Implementations may set up the media relationships described in the
   conversation space model using a centralized control model.  One
   common way to implement this using SIP is known as Third Party Call
   Control (3pcc) and is described in 3pcc [7].  The 3pcc approach
   relies on only the following three primitive operations:
   o  Create a new dialog (INVITE)
   o  Modify a dialog (reINVITE)
   o  Destroy a dialog (BYE)
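
   To show how a controller can build a call from these operations
   alone, the Python sketch below generates the message sequence for
   connecting users A and B and later hanging up.  It follows the
   simplest flow described in 3pcc [7].  In that flow the controller
   sends A an INVITE with no session description and relays A's offer to
   B.  3pcc [7] discusses this flow's limitations and alternative flows.
   The Msg tuple and the "offer-A"/"answer-B" labels are a simplified
   notation assumed for this example, not wire-format SIP.

```python
from typing import NamedTuple, Optional


class Msg(NamedTuple):
    sender: str
    receiver: str
    method: str          # SIP method or response status
    sdp: Optional[str]   # label of the session description carried, if any


def connect_3pcc(ctrl: str, a: str, b: str) -> list[Msg]:
    """Create two dialogs (ctrl-A and ctrl-B) and bridge their media."""
    return [
        Msg(ctrl, a, "INVITE", None),        # create dialog 1 without an offer
        Msg(a, ctrl, "200 OK", "offer-A"),   # A supplies an offer
        Msg(ctrl, b, "INVITE", "offer-A"),   # create dialog 2 carrying A's offer
        Msg(b, ctrl, "200 OK", "answer-B"),  # B answers
        Msg(ctrl, b, "ACK", None),
        Msg(ctrl, a, "ACK", "answer-B"),     # B's answer completes A's offer/answer
    ]


def hang_up_3pcc(ctrl: str, a: str, b: str) -> list[Msg]:
    """Destroy both dialogs with BYE (responses omitted for brevity)."""
    return [Msg(ctrl, a, "BYE", None), Msg(ctrl, b, "BYE", None)]
```

   The remaining primitive, reINVITE, would be sent on either dialog to
   change its media later.  Neither A nor B ever exchanges signaling
   with the other; each sees only its dialog with the controller.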

   The main advantage of the 3pcc approach is that it requires only very
   basic SIP support from end systems to support call control features.
   As such, third-party call control is a natural way to handle protocol
   conversion and mid-call features.  New features can, and must, be
   implemented in only one place (the controller), which is both an
   advantage and a disadvantage: the approach neither requires enhanced
   client functionality nor takes advantage of it.

   In addition, a peer-to-peer approach is discussed at length in this
   document.  The primary drawback of the peer-to-peer model is
   additional complexity in the end system and in the authentication
   and management models.  The benefits of the peer-to-peer model
   include:
   o  state remains at the edges
   o  call signaling need only go through participants involved (there
      are no additional points of failure)
   o  peers can take advantage of end-to-end message integrity or
      encryption
   o  setup time is shorter (fewer messages are required to be sent by
      the initiator of the action)

   The peer-to-peer approach relies on additional "primitive"
   operations, some of which are identified here:
   o  Replace an existing dialog
   o  Join a new dialog with an existing dialog
   o  Locally perform media forking (multi-unicast)
   o  Ask another UA to send a request on your behalf
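
   In SIP, the first two primitives are realized by the Replaces [9] and
   Join [10] header fields.  Each identifies an existing dialog by its
   Call-ID and tags.  The Python sketch below builds either header from
   the dialog as seen by the UA that will receive the INVITE.  That UA
   is expected to compare the to-tag parameter with its own local tag
   and the from-tag parameter with its remote tag.  The Dialog class is
   an assumption for illustration, and the identifiers in the test are
   sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialog:
    call_id: str
    local_tag: str    # tag this UA placed in From or To
    remote_tag: str   # tag the peer placed in From or To


def dialog_ref(recipient_view: Dialog, header: str = "Replaces") -> str:
    """Format a Replaces or Join header naming a dialog held by the recipient UA."""
    if header not in ("Replaces", "Join"):
        raise ValueError("header must be 'Replaces' or 'Join'")
    # The recipient matches to-tag against its local tag and from-tag against its remote tag.
    return (f"{header}: {recipient_view.call_id}"
            f";to-tag={recipient_view.local_tag}"
            f";from-tag={recipient_view.remote_tag}")
```

   The last primitive, asking another UA to send a request, is REFER
   [8].  In an attended transfer, for example, a Replaces value like
   this one can be carried (escaped) in the Refer-To URI.  The referred
   UA then sends an INVITE that replaces the named dialog.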

   The peer-to-peer approach also results in only a single SIP dialog,
   directly between the two UAs, whereas the 3pcc approach results in
   two SIP dialogs, one between each UA and the controller.  As a
   result, with 3pcc the SIP features and extensions that can be used
   during the call are limited to those understood by the controller.
   If both UAs support an advanced SIP feature but the controller does
   not, the feature cannot be used.
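
   The limitation described above can be stated as a set intersection.
   An extension is usable end to end only if every user agent that
   terminates a dialog on the signaling path supports it.  The Python
   sketch below compares the two approaches using SIP option tags.  The
   capability sets are hypothetical, and real negotiation uses the
   Supported and Require header fields rather than a single lookup.

```python
def usable_extensions(path: list[set[str]]) -> set[str]:
    """Option tags usable end to end: those supported by every UA on the path."""
    if not path:
        raise ValueError("path must contain at least one UA")
    return set.intersection(*path)


# Hypothetical capability sets (SIP option tags) for two phones and a controller.
phone_a = {"replaces", "join", "timer", "100rel"}
phone_b = {"replaces", "join", "timer"}
controller = {"timer"}

peer_to_peer = usable_extensions([phone_a, phone_b])             # one dialog: A-B
third_party = usable_extensions([phone_a, controller, phone_b])  # dialogs A-C and C-B
```

   Here the peer-to-peer call can use "replaces", "join", and "timer".
   The 3pcc call is limited to "timer", even though both phones
   understand the other two.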

   Many of the features, primitives, and actions described in this
   document also require some type of media mixing, combining, or
   selection as described in the next section.

2.4.  Mixing Models

   SIP permits a variety of mixing models, which are discussed here
   briefly.  This topic is discussed more thoroughly in the SIP
   conferencing framework [15] and cc-conferencing [19].  SIP supports
   both tightly-coupled and loosely-coupled conferencing, although more
   sophisticated behavior is available in tightly-coupled conferences.
   In a tightly-coupled conference, a single SIP user agent (called the
   focus) has a direct dialog relationship with each participant (and
   may control non-participant user agents as well).  In a loosely-
   coupled conference there are no coordinated signaling relationships
   among the participants.

   For brevity, only the two most popular conferencing models (local and
   centralized mixing) are discussed in detail in this document.
   Applications of the conversation space model to loosely-coupled
   multicast and distributed full unicast mesh conferences are left as
   an exercise for the reader.  Note that a distributed full mesh
   conference can be used for basic conferences, but does not easily
   allow for more complex conferencing actions like splitting, merging,
   and sidebars.

   Call control features should be designed to allow a mixer (local or
   centralized) to decide when to reduce a conference back to a 2-party
   call, or drop all the participants (for example if only two
   automatons are communicating).  The actual heuristics used to release
   calls are beyond the scope of this document, but may depend on
   properties in the conversation space, such as the number of active,
   passive, or hidden participants; and the send-only, receive-only, or
   send-and-receive orientation of various participants.
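
   Because the release heuristics are out of scope, the following is
   only one hypothetical policy, sketched in Python to show how
   conversation space properties might feed such a decision.  It
   considers only whether each participant is human and whether it is
   hidden.  It releases the call when no human remains, and it reduces
   the call to a two-party call when exactly two visible participants
   remain.  These rules are assumptions of this example, not a
   recommendation.  A fuller policy might also weigh active or passive
   state and media direction.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep the conference"
    REDUCE_TO_TWO_PARTY = "reduce to a two-party call"
    RELEASE_ALL = "drop all participants"


@dataclass(frozen=True)
class Member:
    human: bool            # False for robots/automatons
    hidden: bool = False   # e.g. a recorder or a silent supervisor


def release_decision(members: list[Member]) -> Action:
    """One hypothetical mixer policy; the framework leaves real heuristics open."""
    if not any(m.human for m in members):
        return Action.RELEASE_ALL           # only automatons are communicating
    if len(members) == 2 and not any(m.hidden for m in members):
        return Action.REDUCE_TO_TWO_PARTY   # the mixer is no longer needed
    return Action.KEEP
```

   A hidden recorder keeps the mixer in place under this policy,
   because reducing the call to a direct two-party call would cut the
   recorder out.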

2.4.1.  Tightly Coupled

   Tightly coupled conferences utilize a central point for signaling and
   authentication known as a focus [15].  The actual media can be
   centrally mixed or distributed.

2.4.1.1.  (Single) End System Mixing

   The first model we call "end system mixing".  In this model, user A
   calls user B, and they have a conversation.  At some point later, A
   decides to conference in user C.  To do this, A calls C, using a
   completely separate SIP call.  This call uses a different Call-ID,
   different tags, etc.  There is no call set up directly between B and
   C.  No SIP extension or external signaling is needed.  A merely
   decides to locally join two dialogs.

      B     C
       \   /
        \ /
         A

   A receives media streams from both B and C, and mixes them.  A sends
   a stream containing A's and C's streams to B, and a stream containing
   A's and B's streams to C.  Basically, user A handles both signaling
   and media mixing.
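
   The mixing A performs is often called "mix-minus" or N-1 mixing.
   Each recipient gets the sum of every input except its own.  The
   Python sketch below applies this to one frame of 16-bit linear PCM
   samples per participant and clips the sums to the 16-bit range.
   Codecs, jitter buffering, and timing are omitted, and the function
   name is an assumption of this example.  A centralized mixer could
   apply the same computation on behalf of every participant.

```python
def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """For each participant, mix all other participants' frames (16-bit PCM)."""
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    if any(len(f) != length for f in frames.values()):
        raise ValueError("all frames must have the same number of samples")
    # Sum every input once, sample by sample.
    total = [sum(samples) for samples in zip(*frames.values())]
    out = {}
    for name, own in frames.items():
        # Remove the recipient's own contribution, then clip to the int16 range.
        out[name] = [max(-32768, min(32767, t - s)) for t, s in zip(total, own)]
    return out
```

   With inputs from A, B, and C, the frame sent to B contains A and C,
   and the frame sent to C contains A and B, as described above.  A
   plays out its own mix (B and C) locally.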

2.4.1.2.  Centralized Mixing

   In a centralized mixing model, all participants have a pairwise SIP
   and media relationship with the mixer.  Common applications of
   centralized mixing include ad-hoc conferences and scheduled dial-in
   or dial-out conferences.  In the figure below, the mixer M receives
   media from and sends media to participants A, B, C, D, and E.

      B     C
       \   /
        \ /
         M --- A
        / \
       /   \
      D     E

2.4.1.3.  Centralized Signaling, Distributed Media

   In this conferencing model, there is a centralized controller, as in
   the dial-in and dial-out cases.  However, the centralized server
   handles signaling only.  The media is still sent directly between
   participants, using either multicast or multi-unicast.  Participants
   perform their own mixing.  In multi-unicast, a user sends a separate
   copy of each packet to each recipient, addressed to that recipient.
   This is referred to as a "Decentralized Multipoint Conference" in
   [H.323].  Full mesh media with centralized mixing is another
   approach.
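
   Multi-unicast can be sketched in a few lines: the sender transmits a
   separate copy of each media packet to each recipient's address.  The
   Python sketch below does this over UDP with the standard socket
   module.  RTP framing is omitted and the payload is treated as opaque
   bytes.  The destination list is assumed to come from each peer's
   session description.

```python
import socket


def multi_unicast(sock: socket.socket, payload: bytes,
                  peers: list[tuple[str, int]]) -> int:
    """Send one copy of payload to every peer; return the number of copies sent."""
    for addr in peers:
        sock.sendto(payload, addr)  # one datagram per recipient
    return len(peers)
```

   The sender's upstream bandwidth grows linearly with the number of
   recipients.  This is one reason multi-unicast suits small groups.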

2.4.2.  Loosely Coupled

   In these models, there is no point of central control of SIP
   signaling.  As in the "Centralized Signaling, Distributed Media" case
   above, all endpoints send media to all other endpoints.  Consequently,
   every endpoint mixes the media it receives from all the other
   sources, and sends its own media to every other participant.

2.4.2.1.  Large-Scale Multicast Conferences

   Large-scale multicast conferences were the original motivation for
   both the Session Description Protocol (SDP) [5] and SIP.  In a large-
   scale multicast conference, one or more multicast addresses are
   allocated to the conference.  Each participant joins those multicast
   groups and sends its media to those groups.  Signaling is not sent
   to the multicast groups.  The sole purpose of the signaling is to
   inform participants of which multicast groups to join.  Large-scale
   multicast conferences are usually pre-arranged, with specific start
   and stop times.  However, multicast conferences do not need to be
   pre-arranged, so long as a mechanism exists to dynamically obtain a
   multicast address.
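
   In this model the signaling only tells participants which groups to
   join.  A participant's main task is therefore to read the multicast
   address out of the session description.  The Python sketch below
   extracts the group address, TTL, and port from SDP "c=" and "m="
   lines, where an IPv4 multicast address appears in the form "c=IN IP4
   <group>/<ttl>" defined by SDP [5].  It then joins the group using the
   standard IP_ADD_MEMBERSHIP socket option.  It handles only IPv4 and a
   single media stream, and the sample description in the test is
   hypothetical.

```python
import ipaddress
import socket
import struct


def multicast_target(sdp: str) -> tuple[str, int, int]:
    """Return (group, ttl, port) from an IPv4 multicast SDP description."""
    group = ttl = port = None
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            # Connection address: <group>[/<ttl>[/<number of addresses>]]
            addr, _, rest = line[len("c=IN IP4 "):].partition("/")
            group = addr.strip()
            ttl = int(rest.split("/")[0]) if rest else None
        elif line.startswith("m=") and port is None:
            # m=<media> <port>[/<count>] <proto> <fmt> ...
            port = int(line.split()[1].split("/")[0])
    if group is None or port is None or not ipaddress.ip_address(group).is_multicast:
        raise ValueError("no IPv4 multicast connection address found")
    if ttl is None:
        raise ValueError("SDP requires a TTL for IPv4 multicast addresses")
    return group, ttl, port


def join_group(group: str, port: int) -> socket.socket:
    """Open a UDP socket bound to port and join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: group address, then local interface (0.0.0.0 = any).
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

   Sending uses the same group and port.  The TTL from the description
   would normally be applied with the IP_MULTICAST_TTL socket option.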

500	2.4.2.2.  Full Distributed Unicast Conferencing

502	   In this conferencing model, each participant has both a pairwise
503	   media relationship and a pairwise signaling relationship with every
504	   other participant (a full mesh).  This model requires a mechanism to
505	   maintain a consistent view of distributed state across the group.
506	   This is a classic hard problem in computer science.  Also, this model
507	   does not scale well for large numbers of participants, because for
508	   <n> participants the number of media and signaling relationships is
509	   approximately n-squared.  As a result, this model is not generally
510	   available in commercial implementations; instead, it is primarily
511	   the subject of research and experimental implementations.
512	   Note that this model assumes peer-to-peer signaling.
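   The quadratic growth is easy to quantify: <n> participants form
   n(n-1)/2 pairs.  The Python sketch below counts the relationships.
   It makes a simplifying assumption of exactly one dialog and one
   bidirectional media session per pair, with each participant sending
   its own stream separately to every peer.  The function name and
   dictionary keys are illustrative only.

```python
def mesh_relationships(n: int) -> dict:
    """Count the pairwise relationships in a full-mesh conference.

    Assumes one signaling dialog and one bidirectional media session per
    pair of participants, with every participant sending its own stream
    separately to each peer (multi-unicast).
    """
    if n < 0:
        raise ValueError("participant count must be non-negative")
    peers = max(n - 1, 0)
    return {
        "dialogs": n * peers // 2,        # one per unordered pair
        "streams_sent_per_participant": peers,
        "streams_on_network": n * peers,  # one per direction per pair
    }


for n in (3, 10, 100):
    print(n, mesh_relationships(n))
```

   For 10 participants this gives 45 dialogs.  For 100 participants it
   gives 4950 dialogs and 9900 unicast streams crossing the network.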

514	2.5.  Conveying Information and Events

516	   Participants should have access to information about the other
517	   participants in a conversation space, so that this information can be
518	   rendered to a human user or processed by an automaton.  Although some
519	   of this information may be available from the Request-URI or To,
520	   From, Contact, or other SIP headers, another mechanism for reporting
521	   this information is necessary.

523	   Many applications are driven by knowledge about the progress of calls
524	   and conferences.  In general these types of events allow for the
525	   construction of distributed applications, where the application
526	   requires information on dialog and conference state, but is not
527	   necessarily co-resident with an endpoint user agent or conference
528	   server.  For example, a focus involved in a conversation space may
529	   wish to provide URIs for conference status, and/or conference/floor
530	   control.

532	   The SIP Events [4] architecture defines general mechanisms for
533	   subscription to and notification of events within SIP networks.  It
534	   introduces the notion of a package that is a specific "instantiation"
535	   of the events mechanism for a well-defined set of events.

537	   Event packages are needed to provide the status of a user's dialogs,
538	   provide the status of conferences and its participants, provide user
539	   presence information, provide the status of registrations, and
540	   provide the status of user's messages.  While this is not an
541	   exhaustive list, these are sufficient to enable the sample features
542	   described in this document.
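   The following Python sketch shows how a subscriber selects one of
   these packages: it builds a minimal SUBSCRIBE request.  It uses the
   event package names and body types defined for the dialog package
   [11], the conference package [12], the registration package [13],
   user presence [14], and the message-summary package of RFC 3842.  The
   Call-ID, tags, and addresses are hypothetical.  Mandatory fields such
   as Via, Max-Forwards, and Contact are left out for brevity, so the
   output is not a complete request.

```python
# Event package name and the notification body type it delivers,
# keyed by the kind of status a subscriber wants.
EVENT_PACKAGES = {
    "dialogs": ("dialog", "application/dialog-info+xml"),
    "conference": ("conference", "application/conference-info+xml"),
    "presence": ("presence", "application/pidf+xml"),
    "registrations": ("reg", "application/reginfo+xml"),
    "messages": ("message-summary", "application/simple-message-summary"),
}


def build_subscribe(status: str, target: str, subscriber: str,
                    call_id: str, from_tag: str, expires: int = 3600) -> str:
    """Return a minimal SUBSCRIBE request for one kind of status.

    Via, Max-Forwards, and Contact are omitted for brevity; a real
    request needs them.
    """
    event, accept = EVENT_PACKAGES[status]
    lines = [
        f"SUBSCRIBE {target} SIP/2.0",
        f"From: <{subscriber}>;tag={from_tag}",
        f"To: <{target}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 SUBSCRIBE",
        f"Event: {event}",
        f"Accept: {accept}",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_subscribe("dialogs", "sip:bob@biloxi.example.com",
                      "sip:alice@atlanta.example.com",
                      "a84b4c76e66710", "1928301774"))
```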

544	   The conference event package [12] allows users to subscribe to
545	   information about an entire tightly-coupled SIP conference.
546	   Notifications convey information about the participants such as: the
547	   SIP URI identifying each user, their status in the space (active,
548	   declined, departed), URIs to invoke other features (such as sidebar
549	   conversations), links to other relevant information (such as floor
550	   control policies), and if floor control policies are in place, the
551	   user's floor control status.  For conversation spaces created from
552	   cascaded conferences, conversation state can be gathered from
553	   relevant foci and merged into a cohesive set of state.

555	   The dialog package [11] provides information about all the dialogs
556	   the target user is maintaining, what conversations the user is
557	   participating in, and how these are correlated.  Likewise the
558	   registration package [13] provides notifications when contacts have
559	   changed for a specific address-of-record.  The combination of these
560	   allows a user agent to learn about all conversations occurring for
561	   the entire registered contact set for an address-of-record.

563	   Note that user presence in SIP [14] has a close relationship with
564	   these latter two event packages.  It is fundamental to the presence
565	   model that the information used to obtain user presence is
566	   constructed from any number of different input sources.  Examples of
567	   other such sources include calendaring information and uploads of
568	   presence documents.  These two packages can be considered another
569	   mechanism that allows a presence agent to determine the presence
570	   state of the user.  Specifically, a user presence server can act as a
571	   subscriber for the dialog and registration packages to obtain
572	   additional information that can be used to construct a presence
573	   document.

575	   The multiparty architecture may also need to provide a mechanism to
576	   get information about the status and handling of a dialog (for
577	   example, information about the history of other contacts attempted
578	   prior to the current contact).  In addition, the architecture should
579	   provide ample opportunities to present informational URIs that
580	   relate to calls, conversations, or dialogs in some way.  For example,
581	   consider the SIP Call-Info header, or Contact headers returned in a
582	   300-class response.  Frequently, additional information about a call
583	   or dialog can be fetched via non-SIP URIs.  For example, consider a
584	   web page for package tracking when calling a delivery company, or a
585	   web page with related documentation when joining a dial-in
586	   conference.  The use of URIs in the multiparty framework is discussed
587	   in more detail in Section 2.7.

589	   Finally, the interaction of SIP with stimulus-signaling-based
590	   applications, which allow a user agent to interact with an
591	   application without knowing its semantics, is discussed in the SIP
592	   application interaction framework [16].  Stimulus signaling can be
593	   directed to a user interface running locally with the client, or to
594	   a remote user interface through media streams.
595	   Stimulus signaling encompasses a wide range of mechanisms, ranging
596	   from clicking on hyperlinks, to pressing buttons, to traditional Dual
597	   Tone Multi Frequency (DTMF) input.  In all cases, stimulus signaling
598	   is supported through the use of markup languages, which play a key
599	   role in that framework.

601	2.6.  Componentization and Decomposition

603	   This framework proposes a decomposed component architecture with a
604	   very loose coupling of services and components.  This means that a
605	   service (such as a conferencing server or an auto-attendant) need not
606	   be implemented as an actual server.  Rather, these services can be
607	   built by combining a few basic components in straightforward or
608	   arbitrarily complex ways.

610	   Since the components are easily deployed on separate boxes, by
611	   separate vendors, or even with separate providers, we achieve a
612	   separation of function that allows each piece to be developed in
613	   complete isolation.  We can also reuse existing components for new
614	   applications.  This allows rapid service creation, and the ability
615	   for services to be distributed across organizational domains anywhere
616	   in the Internet.

618	   For many of these components it is also desirable to discover their
619	   capabilities, for example querying the ability of a mixer to host a
620	   10-dialog conference, or to reserve resources for a specific time.
621	   These actions could be exposed in the form of URIs, provided there
622	   is an a priori means of understanding their semantics.  For example,
623	   if there is a published dictionary of operations and a way to query
624	   the service for the available operations and their associated URIs,
625	   then those URIs can serve as the interface for these service
626	   operations.  This concept is described in more detail in the context
627	   of dialog operations in Section 3.

629	2.6.1.  Media Intermediaries

631	   Media intermediaries are not participants in any conversation space,
632	   although an entity that acts as a media intermediary may also have a
633	   co-located participant component (for example, a mixer that also
634	   announces the arrival of a new participant; the announcement portion
635	   is a participant, but the mixer itself is not).  Media intermediaries
636	   should be as transparent as possible to end users, offering a useful,
637	   fundamental service without getting in the way of new features
638	   implemented by participants.  Some common media intermediaries are
639	   described below.

641	2.6.2.  Mixer

643	   A SIP mixer is a component that combines media from all dialogs in
644	   the same conversation in a media-specific way.  For example, the
645	   default combining for an audio conference might be an N-1
646	   configuration, while a text mixer might interleave text messages on a
647	   per-line basis.  More details about how to manipulate the media
648	   policy used by mixers are being discussed in the XCON Working Group.
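   In an N-1 audio mix, each participant hears the sum of every other
   participant's audio but not its own.  The Python sketch below shows
   one way to compute this.  It assumes that each input is a single frame
   of signed 16-bit PCM samples and that all frames have the same length.
   A real mixer would also handle jitter, differing codecs, and gain
   control, none of which is shown here.

```python
def clamp16(sample: int) -> int:
    """Saturate to the signed 16-bit range instead of wrapping around."""
    return max(-32768, min(32767, sample))


def mix_n_minus_1(streams: dict) -> dict:
    """Return, for each participant, the mix of everyone else's audio.

    streams maps a participant id to one frame of 16-bit PCM samples;
    all frames are assumed to have the same length.
    """
    if not streams:
        return {}
    frame_len = len(next(iter(streams.values())))
    # Sum every participant once, then subtract each listener's own
    # contribution: linear work per sample instead of quadratic.
    total = [sum(frame[i] for frame in streams.values())
             for i in range(frame_len)]
    return {
        who: [clamp16(total[i] - own[i]) for i in range(frame_len)]
        for who, own in streams.items()
    }


print(mix_n_minus_1({"A": [100, 200], "B": [10, 20], "C": [1, 2]}))
```

   Summing all inputs once and subtracting each listener's own
   contribution keeps the cost linear in the number of participants.
   Saturating instead of letting the integer wrap around avoids harsh
   distortion when several people talk at once.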

650	2.6.3.  Transcoder

652	   A transcoder translates media from one encoding or format to another
653	   (for example, GSM voice to G.711, MPEG2 to H.261, or text/html to
654	   text/plain), or from one media type to another (for example text to
655	   speech).  A more thorough discussion of transcoding is provided in
656	   SIP transcoding services invocation [17].
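   The Python sketch below is a small example of format transcoding.  It
   converts a text/html body to text/plain using only the standard
   library's html.parser module.  The approach is deliberately simple:
   tags are dropped, a few block-level tags become line breaks, and
   script and style content is discarded.  It is not meant to describe
   how any particular transcoding service works.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect character data, dropping tags and script/style content."""

    _SKIP = {"script", "style"}
    _BREAK = {"br", "p", "div", "li", "tr"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into text
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag in self._BREAK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_plain(body: str) -> str:
    """Transcode a text/html body to a text/plain body."""
    parser = _TextExtractor()
    parser.feed(body)
    parser.close()  # flush any buffered character data
    lines = "".join(parser.parts).splitlines()
    # Collapse whitespace runs within each line and drop empty lines.
    return "\n".join(" ".join(line.split()) for line in lines if line.strip())


print(html_to_plain("<p>Hello <b>Bob</b></p><p>Meeting &amp; lunch</p>"))
```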

658	2.6.4.  Media Relay

660	   A media relay terminates media and simply forwards it to a new
661	   destination without changing the content in any way.  Sometimes media
662	   relays are used to provide source IP address anonymity, to facilitate
663	   middlebox traversal, or to provide a trusted entity where media can
664	   be forcefully disconnected.

666	2.6.5.  Queue Server

668	   A queue server is a location where calls can be entered into one of
669	   several FIFO (first-in, first-out) queues.  A queue server would
670	   subscribe to the presence of groups or individuals who are interested
671	   in its queues.  When detecting that a user is available to service a
672	   queue, the server redirects or transfers the oldest call in the
673	   relevant queue to the available user.  On a queue-by-queue basis,
674	   authorized users could also subscribe to the call state (dialog
675	   information) of calls within a queue.  Authorized users could use
676	   this information to effectively pluck (take) a call out of the queue
677	   (for example by sending an INVITE with a Replaces header to one of
678	   the user agents in the queue).
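   The queue behavior described above can be modeled in a few lines.
   The Python sketch below is a toy model, and its method names are
   hypothetical:

   o  enqueue() stands for a call arriving.

   o  on_presence() stands for a presence notification about an agent.

   o  pluck() stands for an authorized user taking a call.

   Each returned (call, agent) pair stands for a redirect or transfer.
   As a simplifying assumption, an agent is treated as busy once it has
   been handed a call, and stays busy until a later notification marks
   it available again.

```python
from collections import deque


class QueueServer:
    """Toy model of a queue server: FIFO call queues plus agent presence."""

    def __init__(self) -> None:
        self.queues = {}     # queue name -> deque of waiting call ids
        self.available = {}  # queue name -> deque of available agents

    def enqueue(self, queue: str, call_id: str) -> list:
        """A new call enters the back of a queue."""
        self.queues.setdefault(queue, deque()).append(call_id)
        return self._dispatch(queue)

    def on_presence(self, queue: str, agent: str, open_: bool) -> list:
        """Handle a presence notification for an agent serving `queue`."""
        agents = self.available.setdefault(queue, deque())
        if open_ and agent not in agents:
            agents.append(agent)
        elif not open_ and agent in agents:
            agents.remove(agent)
        return self._dispatch(queue)

    def pluck(self, queue: str, call_id: str) -> bool:
        """An authorized user takes a specific call (e.g. via Replaces)."""
        calls = self.queues.get(queue, deque())
        if call_id in calls:
            calls.remove(call_id)
            return True
        return False

    def _dispatch(self, queue: str) -> list:
        """Pair the oldest waiting calls with the next available agents."""
        calls = self.queues.get(queue, deque())
        agents = self.available.get(queue, deque())
        transfers = []
        while calls and agents:
            transfers.append((calls.popleft(), agents.popleft()))
        return transfers
```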

680	2.6.6.  Parking Place

682	   A parking place is a location where calls can be terminated
683	   temporarily and then retrieved later.  While a call is "parked", it
684	   can receive media "on-hold" such as music, announcements, or
685	   advertisements.  Such a service could be further decomposed such that
686	   announcements or music are handled by a separate component.

688	2.6.7.  Announcements and Voice Dialogs

690	   An announcement server is a server that can play digitized media
691	   (frequently audio), such as music or recorded speech.  These servers
692	   are typically accessible via SIP, HTTP, or RTSP.  A convention for
693	   specifying announcements in SIP URIs is described in [24].  An
694	   analogous service is a recording service that stores digitized
695	   media; the same server that plays announcements could easily provide
696	   this recording service as well.
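   The convention of [24] names the announcement service with the user
   part "annc" and identifies the media to play with a "play" URI
   parameter.  The Python sketch below builds such a URI.  It also uses
   optional "repeat" and "delay" parameters; those names are our
   reading of [24], so check that document before relying on them.  The
   sketch percent-encodes any character that RFC 3261 does not allow
   unescaped in a URI parameter value.  The media server host and the
   audio URL in the usage line are hypothetical.

```python
from urllib.parse import quote

# Characters RFC 3261 allows unescaped in a URI parameter value, beyond
# alphanumerics: the "unreserved" marks plus "param-unreserved".
_PARAM_SAFE = "-_.!~*'()[]/:&+$"


def announcement_uri(media_server: str, play: str, repeat=None,
                     delay_ms=None) -> str:
    """Build an announcement URI in the style of [24] (RFC 4240)."""
    uri = f"sip:annc@{media_server};play={quote(play, safe=_PARAM_SAFE)}"
    if repeat is not None:
        uri += f";repeat={repeat}"    # assumed parameter name
    if delay_ms is not None:
        uri += f";delay={delay_ms}"   # assumed parameter name, in ms
    return uri


print(announcement_uri("ms.example.net",
                       "http://audio.example.net/busy.wav?lang=en", repeat=2))
```

   The "?" and "=" in the media URL are escaped as %3F and %3D.
   Otherwise, a SIP parser would read them as the start of URI headers
   or as a parameter separator.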

698	   A "voice dialog" is a model of spoken interactive behavior between a
699	   human and an automaton that can include synthesized speech, digitized
700	   audio, recognition of spoken and DTMF key input, recording of spoken
701	   input, and interaction with call control.  Voice dialogs frequently
702	   consist of forms or menus.  Forms present information and gather
703	   input; menus offer choices of what to do next.

705	   Spoken dialogs are a basic building block of applications that use
706	   voice.  Consider for example that a voice mail system, the
707	   conference-id and passcode collection system for a conferencing
708	   system, and complicated voice portal applications all require a voice
709	   dialog component.

711	2.6.7.1.  Text-to-Speech and Automatic Speech Recognition

713	   Text-to-Speech (TTS) is a service that converts text into digitized
714	   audio.  TTS is frequently integrated into other applications, but
715	   when separated as a component, it provides greater opportunity for
716	   broad reuse.  Automatic Speech Recognition (ASR) is a service that
717	   attempts to decipher digitized speech based on a proposed grammar.
718	   Like TTS, ASR services can be embedded, or exposed so that many
719	   applications can take advantage of such services.  A standardized
720	   (decomposed) interface to access standalone TTS and ASR services is
721	   currently being developed in the SPEECHSC Working Group.

723	2.6.7.2.  VoiceXML

725	   [VoiceXML] is a W3C recommendation that was designed to give authors
726	   control over the spoken dialog between users and applications.  The
727	   application and user take turns speaking: the application prompts the
728	   user, and the user in turn responds.  Its major goal is to bring the
729	   advantages of web-based development and content delivery to
730	   interactive voice response applications.  We believe that VoiceXML
731	   represents the ideal partner for SIP in the development of
732	   distributed IVR servers.  VoiceXML is an XML-based scripting
733	   language for describing IVR services at an abstract level.  VoiceXML
734	   supports DTMF recognition, speech recognition, text-to-speech, and
735	   playback of recorded media files.  The data collected from the user
736	   is passed to a controlling entity through an HTTP POST operation.
737	   The controller can then return another script, or terminate the
738	   interaction with the IVR server.
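   The Python sketch below illustrates the controller side of this
   exchange.  It handles one HTTP POST from a VoiceXML interpreter and
   returns the next VoiceXML 2.0 document.  The form field name "pin",
   the set of valid PINs, and the submit URL are hypothetical.  The
   sketch shows only the decision logic, not an HTTP server.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def vxml(body: str) -> str:
    """Wrap VoiceXML content in a VoiceXML 2.0 root element."""
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
            f"{body}\n</vxml>\n")


def handle_post(form_body: str, valid_pins: set, submit_url: str) -> str:
    """Controller logic for one HTTP POST from a VoiceXML interpreter.

    form_body is the application/x-www-form-urlencoded POST body.  If
    the collected "pin" is valid, return a script that ends the
    interaction; otherwise return another script that prompts again
    and POSTs the new digits back to submit_url.
    """
    pin = parse_qs(form_body).get("pin", [""])[0]
    if pin in valid_pins:
        return vxml("<form><block><prompt>Thank you.</prompt>"
                    "<exit/></block></form>")
    url = escape(submit_url, {'"': "&quot;"})  # safe inside an attribute
    return vxml('<form><field name="pin" type="digits">'
                "<prompt>Please enter your PIN.</prompt>"
                f'<filled><submit next="{url}" method="post"'
                ' namelist="pin"/></filled>'
                "</field></form>")


print(handle_post("pin=9999", {"1234"}, "http://ivr.example.com/pin"))
```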

740	   A VoiceXML server also need not be implemented as a monolithic
741	   server.  Below is a diagram of a VoiceXML browser that is split into
742	   media and non-media handling parts.  The VoiceXML interpreter handles
743	   SIP dialog state and state within a VoiceXML document, and sends
744	   requests to the media component over another protocol, here RTSP.

746	                       +-------------+
747	                       |             |
748	                       | VoiceXML    |
749	                       | Interpreter |
750	                       | (signaling) |
751	                       +-------------+
752	                         ^          ^
753	                         |          |
754	                     SIP |          | RTSP
755	                         |          |
756	                         |          |
757	                         v          v
758	            +-------------+        +-------------+
759	            |             |        |             |
760	            |  SIP UA     |   RTP  | RTSP Server |
761	            |             |<------>|   (media)   |
762	            |             |        |             |
763	            +-------------+        +-------------+

765	                   Figure : Decomposed VoiceXML Server

767	2.7.  Use of URIs

769	   All naming in SIP uses URIs.  URIs in SIP are used in a plethora of
770	   contexts: the Request-URI; Contact, To, From, and *-Info headers;
771	   application/uri bodies; and embedded in email, web pages, instant
772	   messages, and ENUM records.  The request-URI identifies the user or
773	   service that the call is destined for.

775	   SIP URIs embedded in informational SIP headers, SIP bodies, and non-
776	   SIP content can also specify methods, special parameters, headers,
777	   and even bodies.  For example:

779	   sip:bob@b.example.com;method=REFER?Refer-To=http://example.com/~alice
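   An entity that receives a URI like the one above has to separate its
   URI parameters (here, method=REFER) from its embedded headers (here,
   Refer-To).  The Python sketch below shows a simplified split.  It
   assumes there are no IPv6 references and no ";" or "?" inside the
   user part, so it is not a complete RFC 3261 parser.

```python
from urllib.parse import unquote


def parse_sip_uri(uri: str) -> dict:
    """Split a SIP URI into scheme, address, parameters, and headers.

    Simplified: ignores IPv6 references and user-part parameters.
    """
    scheme, _, rest = uri.partition(":")
    if scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri}")
    rest, _, header_part = rest.partition("?")   # headers follow "?"
    addr, *param_items = rest.split(";")         # parameters follow ";"
    params = {}
    for item in param_items:
        name, _, value = item.partition("=")
        params[name.lower()] = unquote(value) if value else None
    headers = {}
    for item in filter(None, header_part.split("&")):
        name, _, value = item.partition("=")
        headers[unquote(name)] = unquote(value)
    return {"scheme": scheme, "address": addr,
            "params": params, "headers": headers}


print(parse_sip_uri(
    "sip:bob@b.example.com;method=REFER?Refer-To=http://example.com/~alice"))
```

   A user agent acting on the example would send a REFER request to
   bob@b.example.com with a Refer-To header field of
   http://example.com/~alice.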

781	   Throughout this draft we discuss call control primitive operations.
782	   One of the biggest problems is defining how these operations may be
783	   invoked.  There are a number of ways to do this.  One way is to
784	   define the primitives in the protocol itself such that SIP methods
785	   (for example REFER) or SIP headers (for example Replaces) indicate a
786	   specific call control action.  Another way to invoke call control
787	   primitives is to define a specific Request-URI naming convention.
788	   Either these conventions must be shared between the client (the
789	   invoker) and the server, or published by or on behalf of the server.
790	   The former involves defining URI construction techniques (e.g.  URI
791	   parameters and/or token conventions) as proposed in [24].  The latter
792	   technique usually involves discovering the URI via a SIP event
793	   package, a web page, a business card, or an Instant Message.  Yet
794	   another means to acquire the URIs is to define a dictionary of
795	   primitives with well-defined semantics and provide a means to query
796	   the named primitives and corresponding URIs that may be invoked on
797	   the service or dialogs.

799	2.7.1.  Naming Users in SIP

801	   An address-of-record, or public SIP address, is a SIP (or SIPS) URI
802	   that points to a domain with a location server that can map the URI
803	   to a set of Contact URIs where the user might be available.  Typically
804	   the Contact URIs are populated via registration.

806	   Address of Record               Contacts

808	   sip:bob@biloxi.example.com -> sip:bob@babylon.biloxi.example.com:5060
809	                                 sip:bbrown@mailbox.provider.example.net
810	                                 sip:+1.408.555.6789@mobile.example.net

812	   Callee Capabilities [20] defines a set of additional parameters to
813	   the Contact header that define the characteristics of the user agent
814	   at the specified URI.  For example, there is a mobility parameter
815	   that indicates whether the UA is fixed or mobile.  When a user agent
816	   registers, it places these parameters in the Contact headers to
817	   characterize the URIs it is registering.  This allows a proxy for
818	   that domain to have information about the contact addresses for that
819	   user.

821	   When a caller sends a request, it can optionally request Caller
822	   Preferences [21], by including the Accept-Contact, Request-
823	   Disposition, and Reject-Contact headers that request certain handling
824	   by the proxy in the target domain.  These headers contain preferences
825	   that describe the set of desired URIs to which the caller would like
826	   their request routed.  The proxy in the target domain matches these
827	   preferences with the Contact characteristics originally registered by
828	   the target user.  The target user can also choose to run arbitrarily
829	   complex "Find-me" feature logic on a proxy in the target domain.
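   A proxy that applies caller preferences filters and orders the
   registered contacts.  The Python sketch below shows the idea in a
   greatly simplified form.  Features are compared by exact value only,
   and the feature-set matching and scoring rules defined in [21] are
   not reproduced.  The feature values attached to Bob's contacts in the
   usage example are hypothetical.

```python
def select_contacts(contacts, accept=None, reject=None, require=False):
    """Order registered contacts by simple caller-preference matching.

    contacts: list of (uri, features) pairs; features is a dict such as
              {"mobility": "fixed", "audio": True}.
    accept, reject: predicates from Accept-Contact and Reject-Contact,
              written as {feature: required value}.
    require:  True if the Accept-Contact value carried "require".
    """
    def matches(features, preds):
        return all(features.get(k) == v for k, v in preds.items())

    def score(features):
        if not accept:
            return 1.0
        hits = sum(features.get(k) == v for k, v in accept.items())
        return hits / len(accept)

    # Reject-Contact: drop contacts whose features match every predicate.
    kept = [c for c in contacts if not (reject and matches(c[1], reject))]
    # Accept-Contact with "require": drop contacts that do not match fully.
    if accept and require:
        kept = [c for c in kept if matches(c[1], accept)]
    # Rank the rest; sorted() is stable, so ties keep registration order.
    return [uri for uri, _ in sorted(kept, key=lambda c: score(c[1]),
                                     reverse=True)]


bob = [
    ("sip:bob@babylon.biloxi.example.com:5060",
     {"mobility": "fixed", "audio": True}),
    ("sip:bbrown@mailbox.provider.example.net",
     {"mobility": "fixed", "automata": True}),
    ("sip:+1.408.555.6789@mobile.example.net",
     {"mobility": "mobile", "audio": True}),
]
print(select_contacts(bob, reject={"mobility": "mobile"}))
```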

831	   There is a strong asymmetry in how preferences for callers and
832	   callees can be presented to the network.  While a caller takes an
833	   active role by initiating the request, the callee takes a passive
834	   role in waiting for requests.  This motivates the use of callee-
835	   supplied scripts and caller preferences included in the call request.
836	   This asymmetry is also reflected in the appropriate relationship
837	   between caller and callee preferences.  A server for a callee should
838	   respect the wishes of the caller to avoid certain locations, while
839	   the preferences among locations have to be the callee's choice, as it
840	   determines where, for example, the phone rings and whether the callee
841	   incurs mobile telephone charges for incoming calls.

843	   SIP User Agent implementations are encouraged to make intelligent
844	   decisions based on the type of participants (active/passive, hidden,
845	   human/robot) in a conversation space.  This information is conveyed
846	   via the dialog package or in a SIP header parameter communicated
847	   using an appropriate SIP header.  For example, a music on hold
848	   service may take the sensible approach that if there are two or more
849	   unhidden participants, it should not provide hold music; or that it
850	   will not send hold music to robots.
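   The music-on-hold policy in the example above can be written as a
   short function.  In the Python sketch below, the Participant fields
   are hypothetical stand-ins for whatever the service learns from the
   dialog package or from header parameters.  "listeners" is assumed to
   mean the participants that remain in the space while one party is on
   hold.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    uri: str
    hidden: bool = False   # e.g. a silently conferenced-in assistant
    robot: bool = False    # an automaton rather than a human user


def hold_music_recipients(listeners: list) -> list:
    """Return the URIs that should be sent hold music.

    Policy from the text: no music if two or more unhidden participants
    remain (they can talk to each other), and never send music to robots.
    """
    unhidden = sum(1 for p in listeners if not p.hidden)
    if unhidden >= 2:
        return []
    return [p.uri for p in listeners if not p.robot]


bob = Participant("sip:bob@biloxi.example.com")
albert = Participant("sip:albert@atlanta.example.com", hidden=True, robot=True)
print(hold_music_recipients([bob, albert]))
```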

852	   Multiple participants in the same conversation space may represent
853	   the same human user.  For example, the user may use one participant
854	   for video, chat, and whiteboard media on a PC and another for audio
855	   media on a SIP phone.  In this case, the address-of-record is the
856	   same for both user agents, but the Contacts are different.  In
857	   addition, human users may add robot participants that act on their
858	   behalf (for example a call recording service, or a calendar
859	   announcement reminder).  Call Control features in SIP should continue
860	   to function as expected in such an environment.

862	2.7.2.  Naming Services with SIP URIs

864	   A critical piece of defining a session level service that can be
865	   accessed by SIP is defining the naming of the resources within that
866	   service.  This point cannot be overstated.

868	   In the context of SIP control of application components, we take
869	   advantage of the fact that the left-hand-side of a standard SIP URI
870	   is a user part.  Most services may be thought of as user automatons
871	   that participate in SIP sessions.  It naturally follows that the user
872	   part should be utilized as a service indicator.

874	   For example, media servers commonly offer multiple services at a
875	   single host address.  Use of the user part as a service indicator
876	   enables service consumers to direct their requests without ambiguity.
877	   It has the added benefit of enabling media services to register their
878	   availability with SIP Registrars just as any "real" SIP user would.
879	   This maintains consistency and provides enhanced flexibility in the
880	   deployment of media services in the network.

882	   There has been much discussion about the potential for confusion if
883	   media service URIs are not readily distinguishable from other types
884	   of SIP UAs.  The use of a service namespace provides a mechanism to
885	   unambiguously identify standard interfaces while not constraining the
886	   development of private or experimental services.

888	   In SIP, the Request-URI identifies the user or service that the call
889	   is destined for.  The great advantage of using URIs (specifically,
890	   the SIP Request-URI) as a service identifier comes from the
891	   combination of two facts.  First, unlike in the PSTN, where the
892	   namespace (dialable telephone numbers) is limited, URIs come from an
893	   infinite space.  They are plentiful, and they are free.  Second, the
894	   primary function of SIP is call routing through manipulations of
895	   the Request-URI.  In the traditional SIP application, this URI
896	   represents a person.  However, the URI can also represent a service,
897	   as we propose here.  This means we can apply the routing services SIP
898	   provides to routing of calls to services.  As a result, the problem
899	   of service invocation and service location becomes a routing problem,
900	   for which SIP provides a scalable and flexible solution.  Since there
901	   is such a vast namespace of services, we can explicitly name each
902	   service in a finely granular way.  This allows the distribution of
903	   services across the network.  For further discussion about services
904	   and SIP URIs, see RFC 3087 [22].

906	   Consider a conferencing service in which the names of ad-hoc
907	   conferences are separated from those of scheduled conferences.  We
908	   can program proxies to route calls for ad-hoc conferences to one set
909	   of servers, and calls for scheduled ones to another, possibly even
910	   in a different provider.  In fact, since each conference itself is
911	   given a URI, we can distribute conferences across servers, and
912	   easily guarantee that calls for the same conference always get
913	   routed to the same server.  This is in stark contrast to conferences
914	   in the telephone network, where the equivalent of the URI (the phone
915	   number) is scarce.  An entire conferencing provider generally has one
916	   or two numbers.  Conference IDs must be obtained through IVR
917	   interactions with the caller, or through a human attendant.  This
918	   makes it difficult to distribute conferences across servers all
919	   over the network, since PSTN routing only knows the dialed number.
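   The routing described above can be sketched as follows.  The Python
   code assumes that the administrator has provisioned user parts that
   begin with "adhoc-" for ad-hoc conferences and "sched-" for scheduled
   ones.  The server pools are hypothetical.  Rendezvous
   (highest-random-weight) hashing of the conference user part is one
   possible way, not the only one, to keep every call for the same
   conference on the same server.

```python
import hashlib

# Hypothetical server pools, one per provisioned conference prefix.
POOLS = {
    "adhoc": ["sip:mix1.adhoc.example.com", "sip:mix2.adhoc.example.com"],
    "sched": ["sip:conf1.sched.example.net", "sip:conf2.sched.example.net",
              "sip:conf3.sched.example.net"],
}


def user_part(request_uri: str) -> str:
    """Return the user part of a sip: or sips: Request-URI."""
    return request_uri.split(":", 1)[1].split("@", 1)[0]


def route_conference(request_uri: str) -> str:
    """Choose the next hop for a conference Request-URI.

    The user-part prefix selects the pool; rendezvous hashing then picks
    one server, so every request for the same conference URI reaches the
    same server as long as the pool is unchanged.
    """
    user = user_part(request_uri)
    servers = POOLS.get(user.split("-", 1)[0])
    if servers is None:
        raise LookupError(f"no conference pool for {request_uri}")

    def weight(server: str) -> int:
        digest = hashlib.sha256(f"{server}|{user}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(servers, key=weight)


print(route_conference("sip:adhoc-4711@conf.example.com"))
```

   With rendezvous hashing, removing a server from a pool moves only the
   conferences that were assigned to that server.  All other conferences
   keep their assignment.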

921	   For more examples, consider the URI conventions of RFC 4240 [24] for
922	   media servers and RFC 4458 [25] for voicemail and IVR systems.

924	   In practical applications, it is important that an invoker not
925	   apply semantic rules to URIs that it did not create.
926	   Instead, it should allow any arbitrary string to be provisioned, and
927	   map the string to the desired behavior.  The administrator of a
928	   service may choose to provision specific conventions or mnemonic
929	   strings, but the application should not require it.  In any large
930	   installation, the system owner is likely to have pre-existing rules
931	   for mnemonic URIs, and any attempt by an application to define its
932	   own rules may create a conflict.  Implementations should allow an
933	   arbitrary mix of URIs from these schemes, or any other scheme that
934	   renders valid SIP URIs to be provisioned, rather than enforce only
935	   one particular scheme.

937	   As we have shown, SIP URIs represent an ideal, flexible mechanism for
938	   describing and naming service resources, regardless of whether the
939	   resources are queues, conferences, voice dialogs, announcements,
940	   voicemail treatments, or phone features.

942	2.8.  Invoker Independence

944	   With functional signaling, only the invoker of features in SIP needs
945	   to know exactly which feature they are invoking.  One of the primary
946	   benefits of this approach is that combinations of functional features
947	   work in SIP call control without requiring complex feature
948	   interaction matrices.  For example, let us examine the combination of
949	   a "transfer" of a call that is "conferenced".

951	   Alice calls Bob. Alice silently "conferences in" her robotic
952	   assistant Albert as a hidden party.  Bob transfers Alice to Carol.
953	   If Bob asks Alice to Replace her leg with a new one to Carol then
954	   both Alice and Albert should be communicating with Carol
955	   (transparently).

957	   Using the peer-to-peer model, this combination of features works fine
958	   if A is doing local mixing (Alice replaces Bob's dialog with
959	   Carol's), or if A is using a central mixer (the mixer replaces Bob's
960	   dialog with Carol's).  A clever implementation using the 3pcc model
961	   can generate similar results.

963	   New extensions to the SIP Call Control Framework should attempt to
964	   preserve this property.

966	2.9.  Billing issues

968	   Billing in the PSTN is typically based on who initiated a call.  At
969	   the moment billing in a SIP network is neither consistent with
970	   itself, nor with the PSTN.  (A billing model for SIP should allow for
971	   both PSTN-style billing, and non-PSTN billing.)  The example below
972	   demonstrates one such inconsistency.

974	   Alice places a call to Bob. Alice then blind transfers Bob to Carol
975	   through a PSTN gateway.  In current usage of REFER, Bob may be billed
976	   for a call he did not initiate (although his UA originated the
977	   outgoing dialog).  This is not necessarily a terrible thing, but it
978	   demonstrates a security concern (Bob must have appropriate local
979	   policy to prevent fraud).  Also, Alice may wish to pay for Bob's
980	   session with Carol.  There should be a way to signal this in SIP.

982	   Likewise a Replacement call may maintain the same billing
983	   relationship as a Replaced call, so if Alice first calls Carol, then
984	   asks Bob to Replace this call, Alice may continue to receive a bill.

986	   Further work in SIP billing should define a way to set or discover
987	   the direction of billing.

989	3.  Catalog of call control actions and sample features

991	   Call control actions can be categorized by the dialogs upon which
992	   they operate.  The actions may involve a single or multiple dialogs.
993	   These dialogs can be early or established.  Multiple dialogs may be
994	   related in a conversation space to form a conference or other
995	   interesting media topologies.

997	   It should be noted that it is desirable to provide a means by which a
998	   party can discover the actions that may be performed on a dialog.
999	   The interested party may be independent or related to the dialogs.
1000	   One means of accomplishing this is through the ability to define and
1001	   obtain URIs for these actions as described in section .

1003	   Below are listed several call control "actions" that establish or
1004	   modify dialogs and relate the participants in a conversation space.
1005	   The names of the actions listed are for descriptive purposes only
1006	   (they are not normative).  This list of actions is not meant to be
1007	   exhaustive.

1009	   In the examples, all actions are initiated by the user "Alice"
1010	   represented by UA "A".

1012	3.1.  Remote Call Control Actions on Early Dialogs

1014	   The following is a set of actions that may be performed on a single
1015	   early dialog.  These actions can be thought of as a set of remote
1016	   control operations.  For example, an automaton might perform the
1017	   operation on behalf of a user.  Alternatively, a user might use the
1018	   remote control in the form of an application to perform the action on
1019	   the early dialog of a UA that may be out of reach.  All of these
1020	   actions correspond to telling the UA how to respond to a request to
1021	   establish an early dialog.  These actions provide useful
1022	   functionality for PDA, PC, and server-based applications that desire
1023	   the ability to control a UA.  A proposed mechanism for this type of
1024	   functionality is described in Remote Call Control [23].

1026	3.1.1.  Remote Answer

1028	   A dialog is in some early dialog state such as 180 Ringing.  It may
1029	   be desirable to tell the UA to answer the dialog, that is, to tell it
1030	   to send a 200 OK response to establish the dialog.

1032	3.1.2.  Remote Forward or Put

1034	   It may be desirable to tell the UA to respond with a 3xx-class
1035	   response to forward an early dialog to another UA.

1037	3.1.3.  Remote Busy or Error Out

1039	   It may be desirable to instruct the UA to send an error response such
1040	   as 486 Busy Here.

1042	3.2.  Remote Call Control Actions on Single Dialogs

1044	   There is another useful set of actions that operate on a single
1045	   established dialog.  These operations are useful in building
1046	   productivity applications that help users control their phones.
1047	   For example, a Customer Relationship Management (CRM) application
1048	   can set up calls for a user, eliminating the need for the user to
1049	   actually enter an address.  These operations can also be thought of
1050	   as remote control actions.  A proposed mechanism for this type of
1051	   functionality is described in Remote Call Control [23].

1053	3.2.1.  Remote Dial

1055	   This action instructs the UA to initiate a dialog.  This action can
1056	   be performed using the REFER method.

1058	3.2.2.  Remote On and Off Hold

1060	   This action instructs the UA to put an established dialog on hold.
1061	   Though this operation can conceptually be performed with the REFER
1062	   method, there are no semantics defined as to what the referred party
1063	   should do with the SDP.  There is no way to distinguish between the
1064	   desire to go on or off hold on a per media stream basis.

1066	3.2.3.  Remote Hangup

1068	   This action instructs the UA to terminate an early or established
1069	   dialog.  A REFER request with the following Refer-To URI and Target-
1070	   Dialog header field [26] performs this action.  Note: this example
1071	   does not show the full set of header fields.

   REFER sip:carol@client.chicago.net SIP/2.0
   Refer-To: <sip:bob@babylon.biloxi.example.com;method=BYE>
   Target-Dialog: 13413098;local-tag=879738;remote-tag=023214
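   The REFER above can be assembled programmatically.  The following
   Python sketch is illustrative only: the function name
   build_remote_hangup_refer and its parameters are hypothetical, the
   call-id and tags are treated as opaque strings, and, as in the
   example, the mandatory header fields (Via, From, To, Call-ID, CSeq,
   and so on) are omitted.  It encloses the Refer-To URI in angle
   brackets because an unbracketed addr-spec followed by ";method=BYE"
   would be parsed as a Refer-To header field parameter rather than a
   URI parameter.

```python
def build_remote_hangup_refer(target_uri: str, peer_uri: str,
                              call_id: str, local_tag: str,
                              remote_tag: str) -> str:
    """Return the start line and call-control headers of a REFER asking
    target_uri to send BYE to peer_uri within the identified dialog.

    Hypothetical helper: mandatory SIP headers (Via, From, To, Call-ID,
    CSeq, Max-Forwards, Contact) are intentionally omitted.
    """
    # The method parameter belongs to the URI, so the URI is bracketed;
    # otherwise ";method=BYE" would be read as a header parameter.
    refer_to = f"<{peer_uri};method=BYE>"
    # Target-Dialog identifies the dialog by call-id plus both tags,
    # copied through unchanged from the caller of this function.
    target_dialog = f"{call_id};local-tag={local_tag};remote-tag={remote_tag}"
    lines = [
        f"REFER {target_uri} SIP/2.0",
        f"Refer-To: {refer_to}",
        f"Target-Dialog: {target_dialog}",
    ]
    # SIP header lines are separated by CRLF.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_remote_hangup_refer(
        "sip:carol@client.chicago.net",
        "sip:bob@babylon.biloxi.example.com",
        "13413098", "879738", "023214"))
```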

1077	3.3.  Call Control Actions on Multiple Dialogs

1079	   These actions apply to a set of related dialogs.

1081	3.3.1.  Transfer

1083	   This section describes how call transfer can be achieved using
1084	   centralized (3pcc) and peer-to-peer (REFER) approaches.

1086	   The conversation space changes as follows:

1088	    before            after
1089	   { A , B }  -->   { C , B }

1091	   A replaces itself with C.

1093	   To make this happen using the peer-to-peer approach, "A" would send
1094	   two SIP requests.  A shorthand for those requests is shown below:

1096	   REFER B  Refer-To:C
1097	   BYE B

1099	   To make this happen instead using the 3pcc approach, the controller
1100	   sends requests represented by the shorthand below:

1102	   INVITE C (w/SDP of B)
1103	   reINVITE B (w/SDP of C)
1104	   BYE A
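   To make the effect of the two request sequences concrete, the Python
   sketch below models a conversation space as a set of participants
   and reproduces the shorthand above as strings.  The function names
   are hypothetical and the strings are the document's shorthand, not
   real SIP messages.

```python
from typing import FrozenSet, List

Space = FrozenSet[str]


def transfer(space: Space, transferor: str, target: str) -> Space:
    """{A, B} -> {C, B}: the transferor is replaced by the target."""
    if transferor not in space:
        raise ValueError(f"{transferor} is not in the conversation space")
    return (space - {transferor}) | {target}


def peer_to_peer_requests(transferee: str, target: str) -> List[str]:
    # Both requests are sent by the transferor itself.
    return [f"REFER {transferee}  Refer-To:{target}", f"BYE {transferee}"]


def third_party_requests(transferor: str, transferee: str,
                         target: str) -> List[str]:
    # Sent by the controller, which swaps SDP between the legs it owns
    # and then releases the transferor's leg.
    return [
        f"INVITE {target} (w/SDP of {transferee})",
        f"reINVITE {transferee} (w/SDP of {target})",
        f"BYE {transferor}",
    ]


if __name__ == "__main__":
    print(sorted(transfer(frozenset({"A", "B"}), "A", "C")))
    print(third_party_requests("A", "B", "C"))
```

   Both approaches reach the same conversation space; they differ only
   in which entity sends the requests.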

1106	   Features enabled by this action:

1108	   - blind transfer
1109	   - transfer to a central mixer (some type of conference or forking)
1110	   - transfer to park server (park)
1111	   - transfer to music on hold or announcement server
1112	   - transfer to a "queue"
1113	   - transfer to a service (such as Voice Dialogs service)
1114	   - transition from local mixer to central mixer

1116	   This action is frequently referred to as "completing an attended
1117	   transfer".  It is described in more detail in cc-transfer [18].

   Note that if a transfer requires URI hiding or privacy, the 3pcc
   approach makes this easier to implement.  For example, if the URI of
   C needs to be hidden from B, then the use of 3pcc helps accomplish
   this.

1124	3.3.2.  Take

1126	   The conversation space changes as follows:

1128	   { B , C } --> { B , A }

1130	   A forcibly replaces C with itself.  In most uses of this primitive, A
1131	   is just "un-replacing" itself.

1133	   Using the peer-to-peer approach, "A" sends:

1135	    INVITE B  Replaces: <dialog between B and C>

   Using the 3pcc approach (all requests are sent from the controller):

1139	    INVITE A (w/SDP of B)
1140	    reINVITE B (w/SDP of A)
1141	    BYE C

1143	   Features enabled by this action:

1145	   - transferee completes an attended transfer
1146	   - retrieve from central mixer (not recommended)
1147	   - retrieve from music on hold or park
1148	   - retrieve from queue
1149	   - call center take
1150	   - voice portal resuming ownership of a call it originated
1151	   - answering-machine style screening (pickup)
1152	   - pickup of a ringing call (i.e. early dialog)

   Note that pickup of a ringing call has some additional requirements.
   First, it involves an early dialog as opposed to an established
   dialog.  Second, the party that is to pick up the call may wish to do
   so only while the dialog is still early.  That is, in the race
   condition where the ringing UA accepts the call just before it
   receives signaling from the party wishing to take it, the taking
   party wishes to yield or cancel the take.  The goal is to avoid
   yanking an answered call away from the called party.
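   For this race, the Replaces extension [9] defines an "early-only"
   flag: when it is present, the recipient refuses to replace a dialog
   that is no longer early.  The Python sketch below is a simplified,
   hypothetical model of the recipient's decision (the class and
   function names and the tag values are illustrative).  It follows the
   Replaces rule that to-tag is compared with the recipient's local tag
   and from-tag with its remote tag, but omits other checks such as
   authorization and restrictions on which early dialogs may be
   replaced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialog:
    call_id: str
    local_tag: str    # tag of the UA that holds this dialog state
    remote_tag: str
    early: bool       # True until the dialog is confirmed


def replaces_value(call_id: str, to_tag: str, from_tag: str,
                   early_only: bool = False) -> str:
    """Build a Replaces header field value."""
    value = f"{call_id};to-tag={to_tag};from-tag={from_tag}"
    return value + (";early-only" if early_only else "")


def handle_replaces(dialog: Dialog, header: str) -> int:
    """Simplified recipient logic: status code for an INVITE w/Replaces."""
    call_id, *params = header.split(";")
    fields = dict(p.split("=", 1) if "=" in p else (p, "") for p in params)
    if (call_id != dialog.call_id
            or fields.get("to-tag") != dialog.local_tag
            or fields.get("from-tag") != dialog.remote_tag):
        return 481  # Call/Transaction Does Not Exist
    if "early-only" in fields and not dialog.early:
        return 486  # already answered: the take yields
    return 200      # simplified: proceed with the replacement


if __name__ == "__main__":
    d = Dialog("c1@example.com", "a1", "b2", early=False)
    print(handle_replaces(d, replaces_value("c1@example.com", "a1", "b2",
                                            early_only=True)))
```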

1163	   This action is described in Replaces [9] and in cc-transfer [18].

1165	3.3.3.  Add

1167	   Note that the following 4 actions are described in cc-conferencing
1168	   [19].

1170	   This is merely adding a participant to a SIP conference.  The
1171	   conversation space changes as follows:

1173	   { A , B } --> { A , B , C }

1175	   A adds C to the conversation.

   Using the peer-to-peer approach, adding a party using local mixing
   requires no signaling.  To transition from a 2-party call or a
   locally mixed conference to central mixing, A could send the
   following requests:

1182	    REFER B  Refer-To: conference-URI
1183	    INVITE conference-URI
1184	    BYE B

1186	   To add a party to a conference:

1188	    REFER C  Refer-To: conference-URI
1189	                   or
1190	    REFER conference-URI  Refer-To: C

1192	   Using the 3pcc approach to transition to centrally mixed, the
1193	   controller would send:

1195	    INVITE mixer leg 1 (w/SDP of A)
1196	    INVITE mixer leg 2 (w/SDP of B)
1197	    INVITE C (late SDP)
1198	    reINVITE A (w/SDP of mixer leg 1)
1199	    reINVITE B (w/SDP of mixer leg 2)
    INVITE mixer leg 3 (w/SDP of C)
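   The 3pcc sequence above generalizes to any number of existing
   parties.  This Python sketch (the function name is hypothetical)
   emits the same shorthand in the same order: one mixer leg per
   existing party, an INVITE without an offer to the new party (so its
   offer arrives in the 2xx response), re-INVITEs pointing each existing
   party at its mixer leg, and finally a mixer leg for the new party.

```python
from typing import List


def centralize_with_new_party(existing: List[str], new_party: str) -> List[str]:
    """3pcc shorthand for moving `existing` parties to a central mixer and
    adding `new_party`, generalizing the sequence above to N parties."""
    requests = []
    # One mixer leg per existing party, offered with that party's SDP.
    for i, party in enumerate(existing, start=1):
        requests.append(f"INVITE mixer leg {i} (w/SDP of {party})")
    # Late SDP: no offer in the INVITE; the new party offers in its 2xx.
    requests.append(f"INVITE {new_party} (late SDP)")
    # Re-point each existing party at its own mixer leg.
    for i, party in enumerate(existing, start=1):
        requests.append(f"reINVITE {party} (w/SDP of mixer leg {i})")
    # Connect the new party to a fresh mixer leg.
    requests.append(
        f"INVITE mixer leg {len(existing) + 1} (w/SDP of {new_party})")
    return requests


if __name__ == "__main__":
    for line in centralize_with_new_party(["A", "B"], "C"):
        print(line)
```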

1202	   To add a party to a SIP conference:

1204	    INVITE C (late SDP)
1205	    INVITE conference-URI (w/SDP of C)

1207	   Features enabled:

1209	   - standard conference feature
1210	   - call recording
1211	   - answering-machine style screening (screening)

1213	3.3.4.  Local Join

1215	   The conversation space changes like this:

1217	   { A , B } , { A , C }  -->  { A , B , C }

1219	           or like this

1221	   { A , B } , { C , D }  -->  { A , B , C , D }

1223	   A takes two conversation spaces and joins them together into a single
1224	   space.

1226	   Using the peer-to-peer approach, A can mix locally, or REFER the
1227	   participants of both conversation spaces to the same central mixer
1228	   (as in 3.3.5).

1230	   For the 3pcc approach, the call flows for inserting participants, and
1231	   joining and splitting conversation spaces are tedious yet
1232	   straightforward, so these are left as an exercise for the reader.

1234	   Features enabled:

1236	   - standard conference feature
1237	   - leaving a sidebar to rejoin a larger conference

1239	3.3.5.  Insert

1241	   The conversation space changes like this:

1243	   { B , C } --> { A , B , C }

1245	   A inserts itself into a conversation space.

1247	   A proposed mechanism for signaling this using the peer-to-peer
1248	   approach is to send a new header in an INVITE with "joining" [10]
1249	   semantics.  For example:

1251	   INVITE B Join: <dialog id of B and C>

   If B accepted the INVITE, B would accept responsibility for setting
   up the necessary dialogs and mixing (for example, mixing locally or
   transferring the participants to a central mixer).
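   The Join request above can be sketched as follows.  The helper name
   and parameters are hypothetical, other mandatory header fields are
   omitted, and the tag mapping (to-tag carries the recipient's local
   tag, from-tag its remote tag) is our reading of Join [10] by analogy
   with Replaces [9]; check the specification before relying on it.

```python
def join_invite(target_uri: str, call_id: str,
                recipient_local_tag: str, recipient_remote_tag: str) -> str:
    """Start line and Join header of an INVITE asking target_uri (B) to add
    the sender to B's existing dialog.  Other mandatory headers omitted."""
    # The tags identify B's dialog with C as seen from B, the recipient.
    join = (f"{call_id};to-tag={recipient_local_tag};"
            f"from-tag={recipient_remote_tag}")
    return "\r\n".join([f"INVITE {target_uri} SIP/2.0", f"Join: {join}"])


if __name__ == "__main__":
    print(join_invite("sip:b@example.com", "c1@example.com", "bt", "ct"))
```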

1257	   Features enabled:

1259	   - barge-in
1260	   - call center monitoring
1261	   - call recording

1263	3.3.6.  Split

1265	   { A , B , C , D } --> { A , B } , { C , D }

   If using a central conference with the peer-to-peer approach:

1268	    REFER C  Refer-To: conference-URI (new URI)
1269	    REFER D  Refer-To: conference-URI (new URI)
1270	    BYE C
1271	    BYE D
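   The split shorthand follows a simple pattern: refer every party that
   moves to the new conference URI, then release its old leg.  A Python
   sketch with a hypothetical function name, producing the same
   shorthand strings:

```python
from typing import Iterable, List


def split_requests(movers: Iterable[str], new_conference_uri: str) -> List[str]:
    """Shorthand for moving `movers` out of the current central conference
    into a new one: REFER each to the new URI, then BYE each old leg."""
    movers = list(movers)
    refers = [f"REFER {p}  Refer-To: {new_conference_uri}" for p in movers]
    byes = [f"BYE {p}" for p in movers]
    return refers + byes


if __name__ == "__main__":
    print(split_requests(["C", "D"], "conf-2"))
```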

1273	   Features enabled:

1275	   - sidebar conversations during a larger conference

1277	3.3.7.  Near-fork

1279	   A participates in two conversation spaces simultaneously:

   { A , B } --> { B , A } & { A , C }

1283	   A is a participant in two conversation spaces such that A sends the
1284	   same media to both spaces, and renders media from both spaces,
1285	   presumably by mixing or rendering the media from both.  We can define
1286	   that A is the "anchor" point for both forks, each of which is a
1287	   separate conversation space.
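   Rendering media from both forks at the anchor typically means mixing
   them.  The following is a naive Python sketch, assuming (as the
   document does not specify) that both forks have already been decoded
   to 16-bit linear PCM frames of equal length at the same sample rate.
   The function name is hypothetical; real mixers usually add gain
   control rather than hard clipping.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(*frames: Sequence[int]) -> List[int]:
    """Sum equal-length 16-bit linear PCM frames sample by sample and clip
    the result to the int16 range (no gain control)."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(f) != length for f in frames):
        raise ValueError("frames must have equal length")
    return [max(INT16_MIN, min(INT16_MAX, sum(samples)))
            for samples in zip(*frames)]


if __name__ == "__main__":
    print(mix_frames([1000, 30000, -30000], [2000, 5000, -5000]))
```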

   This action is a purely local implementation matter (it requires no
   special signaling).  Local features such as switching calls between
   the background and foreground are possible using this media
   relationship.

3.3.8.  Far-fork

   The conversation space changes as follows:

   { A , B } --> { A , B } & { B , C }

1299	   A requests B to be the "anchor" of two conversation spaces.

   This is easily set up by creating a conference with two
   sub-conferences and setting the media policy appropriately such that
   B is a participant in both.  Media forking can also be set up using
   3pcc as described in Section 5.1 of RFC 3264 [3] (an offer/answer
   model for SDP).  The session descriptions for forking are quite
   complex.  Controllers should verify that endpoints can handle forked
   media, for example, using prior configuration.

1309	   Features enabled:

1311	   - barge-in
1312	   - voice portal services
1313	   - whisper
1314	   - hotword detection
1315	   - sending DTMF somewhere else

1317	4.  Security Considerations

1319	   Call Control primitives provide a powerful set of features that can
1320	   be dangerous in the hands of an attacker.  To complicate matters,
1321	   call control primitives are likely to be automatically authorized
1322	   without direct human oversight.

   The class of attacks possible using these tools includes the ability
   to eavesdrop on calls, disconnect calls, redirect calls, render
   irritating content (including ringing) at a user agent, cause an
   action that has billing consequences, subvert billing
   (theft-of-service), and obtain private information.  Call control
   extensions must take extra care to describe how these attacks will
   be prevented.

   We can also make some general observations about authorization and
   trust with respect to call control.  The security model depends
   heavily on the signaling model chosen (see Section 3.2).

1336	   Let us first examine the security model used in the 3pcc approach.
1337	   All signaling goes through the controller, which is a trusted entity.
1338	   Traditional SIP authentication and hop-by-hop encryption and message
1339	   integrity work fine in this environment, but end-to-end encryption
1340	   and message integrity may not be possible.

1342	   When using the peer-to-peer approach, call control actions and
1343	   primitives can be legitimately initiated by a) an existing
1344	   participant in the conversation space, b) a former participant in the
1345	   conversation space, or c) an entity trusted by one of the
1346	   participants.  For example, a participant always initiates a
1347	   transfer; a retrieve from Park (a take) is initiated on behalf of a
1348	   former participant; and a barge-in (insert or far-fork) is initiated
1349	   by a trusted entity (an operator for example).

1351	   Authenticating requests by an existing participant or a trusted
1352	   entity can be done with baseline SIP mechanisms.  In the case of
1353	   features initiated by a former participant, these should be protected
1354	   against replay attacks by using a unique name or identifier per
1355	   invocation.  The Replaces header exhibits this behavior as a by-
1356	   product of its operation (once a Replaces operation is successful,
1357	   the dialog being Replaced no longer exists).  For other requests, a
1358	   "one-time" Request-URI may be provided to the feature invoker.
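   A "one-time" Request-URI can be as simple as an unguessable token that
   is discarded on first use.  The Python sketch below is hypothetical
   (class and method names, and the use of the token as the SIP user
   part, are assumptions); it shows only the replay protection, not the
   authentication that should accompany it.

```python
import secrets
import threading
from typing import Dict, Optional


class OneTimeUriIssuer:
    """Hypothetical helper that mints single-use Request-URIs for features
    invoked by a former participant (e.g., retrieving a parked call)."""

    def __init__(self, host: str) -> None:
        self._host = host
        self._live: Dict[str, str] = {}   # token -> opaque feature context
        self._lock = threading.Lock()

    def issue(self, context: str) -> str:
        # token_urlsafe uses only characters valid in a SIP user part.
        token = secrets.token_urlsafe(16)
        with self._lock:
            self._live[token] = context
        return f"sip:{token}@{self._host}"

    def redeem(self, request_uri: str) -> Optional[str]:
        """Return the context the first time; replays return None."""
        user = request_uri.split(":", 1)[-1].split("@", 1)[0]
        with self._lock:
            return self._live.pop(user, None)


if __name__ == "__main__":
    issuer = OneTimeUriIssuer("park.example.com")
    uri = issuer.issue("orbit 12")
    print(issuer.redeem(uri), issuer.redeem(uri))
```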

1360	   To authorize call control primitives that trigger special behavior
1361	   (such as an INVITE with Replaces or Join semantics), the receiving
1362	   user agent may have trouble finding appropriate credentials with
1363	   which to challenge or authorize the request, as the sender may be
1364	   completely unknown to the receiver, except through the introduction
1365	   of a third party.  These credentials need to be passed transitively
1366	   in some way or fetched in an event body, for example.

1368	5.  IANA Considerations

   This document requires no action by IANA.

1372	6.  Appendix A: Example Features

1374	   Primitives are defined in terms of their ability to provide features.
1375	   These example features should require an amply robust set of services
1376	   to demonstrate a useful set of primitives.  They are described here
1377	   briefly.  Note that the descriptions of these features are non-
1378	   normative.  Some of these features are used as examples in section 6
1379	   to demonstrate how some features may require certain media
1380	   relationships.  Note also that this document describes a mixture of
1381	   both features originating in the world of telephones, and features
1382	   that are clearly Internet oriented.

1384	   Example Feature Definitions:

1386	   Attended Transfer - The transferring party establishes a session with
1387	   the transfer target before completing the transfer.

1389	   Auto Answer - Calls to a certain address or location answer
1390	   immediately via a speakerphone.

1392	   Automatic Callback: Alice calls Bob, but Bob is busy.  Alice would
1393	   like Bob to call her automatically when he is available.  When Bob
1394	   hangs up, Alice's phone rings.  When Alice answers, Bob's phone
1395	   rings.  Bob answers and they talk.

   Barge-in - Carol interrupts Alice, who has a call in progress with
   Bob.  In some variations, Alice forcibly joins a new conversation
   with Carol; in other variations, all three parties are placed in the
   same conversation (basically a 3-way conference).

1402	   Blind Transfer - Alice is in a conversation with Bob. Alice asks Bob
1403	   to contact Carol, but makes no attempt to contact Carol
1404	   independently.  In many implementations, Alice does not verify Bob's
1405	   success or failure in contacting Carol.

   Call Forwarding - Before a dialog is accepted, it is redirected to
   another location, for example, because the originally intended
   recipient is busy, does not answer, is disconnected from the network,
   or has configured all requests to go somewhere else.

1412	   Call Monitoring - A call center supervisor joins an in-progress call
1413	   for monitoring purposes.

1415	   Call Park - A call participant parks a call (essentially puts the
1416	   call on hold), and then retrieves it at a later time (typically from
1417	   another location).

1419	   Call Pickup - A party picks up a call that was ringing at another
1420	   location.  One variation allows the caller to choose which location,
1421	   another variation just picks up any call in that user's "pickup
1422	   group".

   Call Return - Alice calls Bob.  Bob misses the call or is disconnected
   before he is finished talking to Alice.  Bob invokes Call Return,
   which calls Alice, even if Alice did not provide her real identity or
   location to Bob.

1429	   Call Waiting - Alice is in a call, then receives another call.  Alice
1430	   can place the first call on hold, and talk with the other caller.
1431	   She can typically switch back and forth between the callers.

1433	   Click-to-dial - Alice looks in her company directory for Bob. When
1434	   she finds Bob, she clicks on a URI to call him.  Her phone rings (or
1435	   possibly answers automatically), and when she answers, Bob's phone
1436	   rings.

1438	   Conference Call - Three or more active, visible participants in the
1439	   same conversation space.

1441	   Consultative transfer - the transferring party establishes a session
1442	   with the target and mixes both sessions together so that all three
1443	   parties can participate, then disconnects leaving the transferee and
1444	   transfer target with an active session.

1446	   Distinctive ring - Incoming calls have different ring cadences or
1447	   sample sounds depending on the From party, the To party, or other
1448	   factors.

1450	   Do Not Disturb - Alice selects the Do Not Disturb option.  Calls to
1451	   her either ring briefly or not at all and are forwarded elsewhere.
1452	   Some variations allow specially authorized callers to override this
1453	   feature and ring Alice anyway.

   Find-Me - Alice sets up complicated rules for how she can be reached
   (possibly using CPL [27], presence [14], or other factors).  When Bob
   calls Alice, his call is eventually routed to a temporary Contact
   where Alice happens to be available.

1464	   Hotline - Alice picks up a phone and is immediately connected to the
1465	   technical support hotline, for example.

   IM Conference Alerts: A user receives a notification as an Instant
   Message whenever someone joins a conference they are also in.

1470	   Inbound Call Screening - Alice doesn't want to receive calls from
1471	   Matt.  Inbound Screening prevents Matt from disturbing Alice.  In
1472	   some variations this works even if Matt hides his identity.

   Intercom - Alice typically presses a button on a phone that
   immediately connects to another user or phone and causes that phone
   to play her voice over its speaker.  Some variations immediately set
   up two-way communications; other variations require another button
   to be pressed to enable a two-way conversation.

   Message Waiting - Bob calls Alice while she is away from her phone.
   When she returns, a visible or audible indicator conveys that someone
   has left her a voicemail message.  The message waiting indication may
   also convey how many messages are waiting, from whom, what time, and
   other useful pieces of information.

   Music on Hold - When Alice places a call with Bob on hold, the audio
   Bob receives is replaced with streaming content such as music,
   announcements, or advertisements.

1490	   Outbound Call Screening - Alice is paged and unknowingly calls a PSTN
1491	   pay-service telephone number in the Caribbean, but local policy
1492	   blocks her call, and possibly informs her why.

   Pre-paid calling - Alice pays for a certain currency or unit amount
   of calling value.  When she places a call, she provides her account
   number somehow.  If her account runs out of calling value during a
   call, her call is disconnected or redirected to a service where she
   can purchase more calling value.

1500	   Presence-Enabled Conferencing: Alice wants to set up a conference
1501	   call with Bob and Cathy when they all happen to be available (rather
1502	   than scheduling a predefined time).  The server providing the
1503	   application monitors their status, and calls all three when they are
1504	   all "online", not idle, and not in another call.

1506	   Single Line Extension/Multiple Line Appearance -- A group of phones
1507	   are all treated as "extensions" of a single line.  A call for one
1508	   rings them all.  As soon as one answers, the others stop ringing.  If
1509	   any extension is actively in a conversation, another extension can
1510	   "pick up" and immediately join the conversation.  This emulates the
1511	   behavior of a home telephone line with multiple phones.

1513	   Speakerphone paging - Alice calls the paging address and speaks.  Her
1514	   voice is played on the speaker of every idle phone in a preconfigured
1515	   group of phones.

1517	   Speed dial - Alice dials an abbreviated number, or enters an alias,
1518	   or presses a special speed dial button representing Bob. Her action
1519	   is interpreted as if she specified the full address of Bob.

1521	   Voice message screening - Bob calls Alice.  Alice is screening her
1522	   calls, so Bob hears Alice's voicemail greeting.  Alice can hear Bob
1523	   leave his message.  If she decides to talk to Bob, she can take the
1524	   call back from the voicemail system, otherwise she can let Bob leave
1525	   a message.  This emulates the behavior of a home telephone answering
   machine.

1528	   Voice Portal - A service that allows users to access a portal site
1529	   using spoken dialog interaction.  For example, Alice needs to
1530	   schedule a working dinner with her co-worker Carol.  Alice uses a
1531	   voice portal to check Carol's flight schedule, find a restaurant near
1532	   her hotel, make a reservation, get directions there, and page Carol
1533	   with this information.

1535	   Whispered call waiting - Alice is in a conversation with Bob. Carol
1536	   calls Alice.  Either Carol can "whisper" to Alice directly ("Can you
1537	   get lunch in 15 minutes?"), or an automaton whispers to Alice
1538	   informing her that Carol is trying to reach her.

1540	6.1.  Implementation of these features

1542	   Example Features:

1544	   Attended Transfer        [18]
1545	   Auto Answer              [28]
1546	   Automatic Callback       Two person presence-based conference
1547	   Barge-in                 Section 6.1.1
1548	   Blind Transfer           [18]
1549	   Call Forwarding          Proxy or Local implementation
1550	   Call Hold                [6]
1551	   Call Monitoring          Section 6.1.2
1552	   Call Park                Section 6.1.3, [6]
1553	   Call Pickup              Section 6.1.4, [6]
1554	   Call Return              Proxy feature
1555	   Call Waiting             Local Implementation
1556	   Click-to-dial            Section 6.1.5, [6]
1557	   Conference Call          [19]
1558	   Presence-based
1559	   Conferencing             [19], [14]
1560	   Consultative transfer    [18]
1561	   Distinctive ring         Section 6.1.6, Proxy or Local implementation
1562	   Do Not Disturb           [14]
1563	   Find-Me                  Proxy service based on presence
1564	   Hotline                  Local Implementation
1565	   IM Conference Alerts     Subscribe to conference status
1566	   Inbound Call Screening   Proxy or Local implementation
1567	   Intercom                 Section 6.1.7, [28]
1568	   Message Waiting          [29]
1569	   Multiple Appearances     Section 6.1.10
1570	   Music on Hold            Section 6.1.8, [6]
1571	   Outbound Call Screening  Proxy feature
1572	   Pre-Paid Calling         Section 6.1.9
1573	   Single Line Extension    Section 6.1.10
1574	   Speakerphone paging      Section 6.1.11, Speed dial + Auto Answer
1575	   Speed dial               Local Implementation
1576	   Voice Message Screening  Section 6.1.12
1577	   Voice Portal             Section 6.1.13
1578	   Whispered call waiting   Local implementation

1580	6.1.1.  Barge-in

   Barge-in works the same way as call monitoring, except that the UA
   barging in must indicate that its send media stream is to be mixed so
   that all of the other parties can hear it.

1586	6.1.2.  Call Monitoring

   Call monitoring is a Join operation.  The monitoring UA sends an
   INVITE with a Join header field identifying the dialog it wants to
   listen to.  It can discover the dialog via the dialog state of the
   monitored UA.  The monitoring UA sends SDP in the INVITE that
   indicates receive-only media.  Because the UA is only monitoring, it
   does not matter whether it indicates that its send stream should be
   mixed or point-to-point.
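   The receive-only SDP offer carried in the monitoring INVITE might look
   like the output of the Python sketch below.  Everything beyond
   "a=recvonly" is an assumption for illustration: the hypothetical
   function name, a single PCMU audio stream, and the IPv4
   documentation address used in the usage line.

```python
def monitoring_sdp(session_id: int, address: str, port: int) -> str:
    """Build a minimal receive-only audio offer for a monitoring UA."""
    lines = [
        "v=0",
        f"o=monitor {session_id} {session_id} IN IP4 {address}",
        "s=-",
        f"c=IN IP4 {address}",
        "t=0 0",
        f"m=audio {port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
        "a=recvonly",          # the monitor listens but never sends
    ]
    # SDP lines are CRLF-terminated, including the last one.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(monitoring_sdp(1, "192.0.2.10", 49170))
```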

1595	6.1.3.  Call Park

   Call park requires the ability to put a dialog some place, advertise
   it to users in a pickup group, and uniquely identify it in a way that
   can be communicated (including by human voice).  The dialog can be
   held locally on the UA parking the dialog or, alternatively,
   transferred to the park service for the pickup group.  The parked
   dialog then needs to be labeled (e.g., orbit 12) in a way that can be
   communicated to the party that is to pick up the call.  The UAs in
   the pickup group discover the parked dialog(s) via the dialog package
   from the park service.  If the dialog is parked locally, the park
   service merely aggregates the parked call states from the set of UAs
   in the pickup group.

1609	6.1.4.  Call Pickup

   There are two different features that are called call pickup.  The
   first is the pickup of a parked dialog.  The UA at which the dialog
   is to be picked up subscribes to the dialog state of the park service
   or of the UA that has locally parked the dialog.  Dialogs that are
   parked should be labeled with an identifier.  The labels are used by
   the UA to allow the user to indicate which dialog is to be picked up.
   The UA picking up the call invokes the URI in the call state that is
   labeled as replace-remote.

   The other call pickup feature involves picking up an early dialog
   (typically ringing).  This feature uses some of the same primitives
   as the pickup of a parked call.  The call state of the ringing UA is
   advertised using the dialog package.  The UA that is to pick up the
   early dialog subscribes either directly to the ringing UA or to a
   service aggregating the states for UAs in the pickup group.  The call
   state identifies early dialogs.  The UA uses the call state(s) to
   help the user choose which early dialog is to be picked up.  The UA
   then invokes the URI in the call state labeled as replace-remote.
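   One way to picture what the picking-up UA does with the dialog
   package is sketched below in Python.  It parses dialog-info XML (the
   format of the INVITE-initiated dialog event package [11]), keeps
   dialogs whose state is "early", and builds a Replaces value aimed at
   the remote party, the caller.  Because the notifier reports tags from
   its own perspective, its remote-tag becomes the to-tag (the caller's
   local tag) and vice versa; early-only is added so that the take
   yields if the call is answered first.  Function names and the XML
   sample are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"di": "urn:ietf:params:xml:ns:dialog-info"}


def early_dialogs(dialog_info_xml: str) -> List[Dict[str, str]]:
    """Return identifiers of the early dialogs in a dialog-info document."""
    root = ET.fromstring(dialog_info_xml)
    found = []
    for dlg in root.findall("di:dialog", NS):
        state = dlg.findtext("di:state", default="", namespaces=NS).strip()
        if state != "early":
            continue
        call_id = dlg.get("call-id")
        local_tag = dlg.get("local-tag")
        remote_tag = dlg.get("remote-tag")
        if not (call_id and local_tag and remote_tag):
            continue  # not enough information to build Replaces
        found.append({"id": dlg.get("id", ""), "call_id": call_id,
                      "local_tag": local_tag, "remote_tag": remote_tag})
    return found


def replaces_toward_remote(d: Dict[str, str]) -> str:
    # The notifier's remote-tag is the local tag of the party receiving
    # the INVITE with Replaces, so it is sent as to-tag.
    return (f"{d['call_id']};to-tag={d['remote_tag']};"
            f"from-tag={d['local_tag']};early-only")
```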

1631	6.1.5.  Click-to-dial

   The application or server that hosts the click-to-dial application
   captures the URI to be dialed and can set up the call using 3pcc or
   send a REFER request to the UA that is to dial the address.  As users
   sometimes change their minds or wish to stop listening to a ringing
   phone or one answered by voicemail, this application illustrates the
   need to also have the ability to remotely hang up a call.

1640	6.1.6.  Distinctive ring

1642	   The target UA either makes a local decision based on information in
1643	   an incoming INVITE (To, From, Contact, Request-URI) or trusts an
1644	   Alert-Info header provided by the caller or inserted by a trusted
1645	   proxy.  In the latter case, the UA fetches the content described in
1646	   the URI (typically via http) and renders it to the user.
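   The decision described above can be sketched in Python as follows.
   The function name, the pre-parsed header dictionary, the trust flag,
   and the From-URI-to-tone table are hypothetical; the sketch only
   extracts the first URI from an Alert-Info value (which carries URIs
   in angle brackets) and leaves fetching and rendering it to the UA.

```python
import re
from typing import Dict

_ANGLE_URI = re.compile(r"<([^>]+)>")


def _uri(value: str) -> str:
    """Extract a URI from a name-addr or addr-spec header value."""
    match = _ANGLE_URI.search(value)
    return match.group(1) if match else value.split(";", 1)[0].strip()


def choose_ring(headers: Dict[str, str], trusted: bool,
                local_rules: Dict[str, str],
                default: str = "default-ring") -> str:
    """Pick a ring for an incoming INVITE.

    trusted: whether Alert-Info came from the caller or a proxy we trust.
    local_rules: hypothetical map from From URI to a local tone name.
    """
    alert_info = headers.get("Alert-Info")
    if trusted and alert_info:
        match = _ANGLE_URI.search(alert_info)
        if match:
            return match.group(1)  # URI the UA would fetch (e.g., via HTTP)
    return local_rules.get(_uri(headers.get("From", "")), default)
```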

1648	6.1.7.  Intercom

1650	   The UA initiates a dialog using INVITE and the Answer-Mode: Auto
1651	   header field as described in [28].  The called UA accepts the INVITE
1652	   with a 200 OK and automatically enables the speakerphone.

1654	   Alternatively this can be a local decision for the UA to answer based
1655	   upon called party identification.
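   The called UA's choice can be summarized in a few lines.  In this
   hypothetical Python sketch, an Answer-Mode value of Auto is honored
   only from callers the UA has authorized (auto-answering arbitrary
   callers would let anyone open a microphone), and the alternative
   local decision is modeled as a set of called-party URIs configured as
   intercom addresses.  Header-name case-insensitivity is ignored for
   brevity.

```python
from typing import Dict, Set


def should_auto_answer(headers: Dict[str, str], request_uri: str,
                       caller_uri: str, authorized_callers: Set[str],
                       intercom_uris: Set[str]) -> bool:
    """Decide whether to answer immediately and enable the speakerphone."""
    # Strip any parameters such as ";require" and compare the mode.
    mode = headers.get("Answer-Mode", "").split(";", 1)[0].strip().lower()
    if mode == "auto" and caller_uri in authorized_callers:
        return True
    # Local decision based on called party identification.
    return request_uri in intercom_uris
```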

1657	6.1.8.  Music on Hold

1659	   Music on hold can be implemented a number of ways.  One way is to
1660	   transfer the held call to a holding service.  When the UA wishes to
1661	   take the call off hold it basically performs a take on the call from
1662	   the holding service.  This involves subscribing to call state on the
1663	   holding service and then invoking the URI in the call state labeled
1664	   as replace-remote.

   Alternatively, music on hold can be performed as a local mixing
   operation.  The UA holding the call can mix in the music from the
   music service via RTP (i.e., an additional dialog), RTSP, or another
   streaming media source.  This approach is simpler from a protocol
   perspective (the held dialog does not move, so there is less chance
   of losing it); however, it does use more LAN bandwidth and more
   resources on the UA.

1674	6.1.9.  Pre-paid calling

   For prepaid calling, the user's media always passes through a device
   that is trusted by the pre-paid provider.  This may be the other
   endpoint (for example, a PSTN gateway) or an intermediary such as a
   media relay.  In either case, an intermediary proxy or B2BUA can
   periodically verify the amount of time available on the pre-paid
   account, and use the session-timer extension to cause the trusted
   endpoint (gateway) or intermediary (media relay) to send a reINVITE
   before that time runs out.  During the reINVITE, the SIP intermediary
   can re-verify the account and insert another session-timer header.
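   Choosing the session interval is simple arithmetic.  The Python
   sketch below assumes that the refresher sends its reINVITE at half
   the session interval (the session-timer extension, RFC 4028,
   recommends refreshing around that point), that 90 seconds is the
   minimum interval, and that 1800 seconds is used when the balance is
   large.  The function name and the 10-second safety margin are
   hypothetical.  When the balance is too small for the minimum
   interval, the second return value is False and the intermediary
   needs another way to end or redirect the call on time.

```python
from typing import Optional, Tuple

MIN_SE = 90          # minimum session interval, in seconds
DEFAULT_SE = 1800    # interval used when the balance is large, in seconds


def session_expires_for_balance(remaining_seconds: int,
                                margin: int = 10) -> Tuple[Optional[int], bool]:
    """Return (session_expires, refresh_arrives_in_time) for a balance.

    Assumes the refresher re-INVITEs at session_expires / 2, and aims for
    that to happen at least `margin` seconds before the balance runs out.
    """
    if remaining_seconds <= 0:
        return None, False
    se = min(DEFAULT_SE, 2 * (remaining_seconds - margin))
    se = max(MIN_SE, se)
    return se, se / 2 <= remaining_seconds - margin
```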

1686	   Note that while most pre-paid systems on the PSTN use an IVR to
1687	   collect the account number and destination, this isn't strictly
1688	   necessary for a SIP-originated prepaid call.  SIP requests and SIP
1689	   URIs are sufficiently expressive to convey the final destination, the
1690	   provider of the prepaid service, the location from which the user is
1691	   calling, and the prepaid account they want to use.  If a pre-paid IVR
1692	   is used, the mechanism described below (Voice Portals) can be
1693	   combined as well.

1695	6.1.10.  Single Line Extension/Multiple Line Appearance

1697	   Incoming calls ring all the extensions through basic parallel
1698	   forking.  Each extension subscribes to dialog events from each other
1699	   extension.  While one user has an active call, any other UA extension
1700	   can insert itself into that conversation (it already knows the dialog
1701	   information) in the same way as barge-in.

1703	   Standardization work to allow line appearance numbers to be
1704	   coordinated across a group of UAs is currently underway.

1706	6.1.11.  Speakerphone paging

   Speakerphone paging can be implemented using either multicast or a
   simple multipoint mixer.  In the multicast solution, the paging UA
   sends a multicast INVITE with send-only media in the SDP (see also
   RFC 3264 [3]).  The automatic answer and enabling of the speakerphone
   is a locally configured decision on the paged UAs.  The paging UA
   sends RTP via the multicast address indicated in the SDP.
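   The send-only multicast session description might be produced as in
   the Python sketch below.  The function name, origin address, PCMU
   codec, and default TTL are assumptions; the two points taken from SDP
   itself are that an IPv4 multicast connection address carries a TTL
   ("address/ttl") and that "a=sendonly" marks the paging UA as the only
   sender.

```python
import ipaddress


def paging_sdp(group: str, port: int, ttl: int = 16,
               origin: str = "192.0.2.20") -> str:
    """Build a send-only multicast audio description for paging."""
    if not ipaddress.IPv4Address(group).is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    lines = [
        "v=0",
        f"o=pager 1 1 IN IP4 {origin}",
        "s=-",
        f"c=IN IP4 {group}/{ttl}",   # IPv4 multicast requires a TTL
        "t=0 0",
        f"m=audio {port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
        "a=sendonly",                # only the paging UA sends media
    ]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(paging_sdp("239.1.1.1", 5004))
```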

   The multipoint solution is accomplished by sending an INVITE to the
   multipoint mixer.  The mixer is configured to automatically answer
   the dialog.  The paging UA then sends REFER requests to each of the
   UAs that are to become paging speakers (the UA is likely to send a
   single REFER that is parallel forked by the proxy server).  The UAs
   acting as paging speakers are configured to automatically answer
   based upon caller identification (e.g., the To header field, the URI,
   or the Referred-By header field).

   Finally, as a third option, the user agent can send a mass-invitation
   request to a conference server, which would create a conference and
   send INVITEs containing the Answer-Mode: Auto header field to all
   user agents in the paging group.

1729	6.1.12.  Voice message screening

   At first, this is the same as call monitoring.  In this case, the
   voicemail service is one of the UAs.  The UA screening the message
   monitors the call on the voicemail service, and also subscribes to
   dialog information.  If the user screening their messages decides to
   answer, they perform a Take from the voicemail system (for example,
   by sending an INVITE with Replaces to the UA leaving the message).

1738	6.1.13.  Voice Portal

   A voice portal is essentially a complex collection of voice dialogs
   used to access interesting content.  One of the most desirable call
   control features of a Voice Portal is the ability to start a new
   outgoing call from within the context of the Portal (to make a
   restaurant reservation, or return a voicemail message, for example).
   Once the new call is over, the user should be able to return to the
   Portal by pressing a special key, using some DTMF sequence (e.g., a
   very long pound or hash tone), or by speaking a hotword (e.g., "Main
   Menu").

1750	   In order to accomplish this, the Voice Portal starts with the
1751	   following media relationship:

1753	   { User , Voice Portal }

1755	   The user then asks to make an outgoing call.  The Voice Portal asks
1756	   the User to perform a Far-Fork.  In other words the Voice Portal
1757	   wants the following media relationship:

1759	           { Target , User }  &  { User , Voice Portal }

1761	   The Voice Portal is now just listening for a hotword or the
1762	   appropriate DTMF.  As soon as the user indicates they are done, the
1763	   Voice Portal takes the call from the old Target, and we are back to
1764	   the original media relationship.
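   How the portal recognizes "the appropriate DTMF" depends on how DTMF
   is transported, which this document does not specify.  Assuming it
   arrives as RTP named telephone events (RFC 4733), a very long pound
   can be detected from the event code (11 is "#") and the duration
   field.  In this Python sketch the function names, the 8000 Hz clock
   rate, and the 2-second threshold are assumptions; events longer than
   the 16-bit duration field can express are split into segments, which
   the sketch ignores.

```python
import struct
from typing import Tuple

EVENT_POUND = 11      # RFC 4733 event code for '#'
CLOCK_RATE = 8000     # assumed telephone-event RTP clock rate (Hz)


def parse_telephone_event(payload: bytes) -> Tuple[int, bool, int, int]:
    """Decode one 4-byte named-event payload: (event, end, volume, duration)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    end = bool(flags & 0x80)     # E bit: final packet(s) of the event
    volume = flags & 0x3F        # 6-bit volume; the R bit (0x40) is skipped
    return event, end, volume, duration


def is_long_pound(payload: bytes, min_seconds: float = 2.0) -> bool:
    """True if the packet reports '#' held for at least min_seconds."""
    event, _end, _volume, duration = parse_telephone_event(payload)
    return event == EVENT_POUND and duration >= min_seconds * CLOCK_RATE
```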

1766	   A pre-paid calling service can use the same feature in its account
1767	   number and phone number collection menus.  A user can enter a DTMF
1768	   sequence that presents them with the appropriate menu again.

1770	7.  Acknowledgements

1772	   Thanks to AC Mahendran, John Elwell, and Xavier Marjou for their
1773	   detailed Working Group review of the document.

1775	8.  Informative References

1777	   [1]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
1778	         Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
1779	         Session Initiation Protocol", RFC 3261, June 2002.

1781	   [2]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
1782	         Levels", BCP 14, RFC 2119, March 1997.

1784	   [3]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
1785	         Session Description Protocol (SDP)", RFC 3264, June 2002.

1787	   [4]   Roach, A., "Session Initiation Protocol (SIP)-Specific Event
1788	         Notification", RFC 3265, June 2002.

1790	   [5]   Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1791	         Description Protocol", RFC 4566, July 2006.

1793	   [6]   Johnston, A., "Session Initiation Protocol Service Examples",
1794	         draft-ietf-sipping-service-examples-13 (work in progress),
1795	         July 2007.

1797	   [7]   Rosenberg, J., Peterson, J., Schulzrinne, H., and G. Camarillo,
1798	         "Best Current Practices for Third Party Call Control (3pcc) in
1799	         the Session Initiation Protocol (SIP)", BCP 85, RFC 3725,
1800	         April 2004.

1802	   [8]   Sparks, R., "The Session Initiation Protocol (SIP) Refer
1803	         Method", RFC 3515, April 2003.

1805	   [9]   Mahy, R., Biggs, B., and R. Dean, "The Session Initiation
1806	         Protocol (SIP) "Replaces" Header", RFC 3891, September 2004.

1808	   [10]  Mahy, R. and D. Petrie, "The Session Initiation Protocol (SIP)
1809	         "Join" Header", RFC 3911, October 2004.

1811	   [11]  Rosenberg, J., Schulzrinne, H., and R. Mahy, "An INVITE-
1812	         Initiated Dialog Event Package for the Session Initiation
1813	         Protocol (SIP)", RFC 4235, November 2005.

1815	   [12]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
1816	         Initiation Protocol (SIP) Event Package for Conference State",
1817	         RFC 4575, August 2006.

1819	   [13]  Rosenberg, J., "A Session Initiation Protocol (SIP) Event
1820	         Package for Registrations", RFC 3680, March 2004.

1822	   [14]  Rosenberg, J., "A Presence Event Package for the Session
1823	         Initiation Protocol (SIP)", RFC 3856, August 2004.

1825	   [15]  Rosenberg, J., "A Framework for Conferencing with the Session
1826	         Initiation Protocol (SIP)", RFC 4353, February 2006.

1828	   [16]  Rosenberg, J., "A Framework for Application Interaction in the
1829	         Session Initiation Protocol (SIP)",
1830	         draft-ietf-sipping-app-interaction-framework-05 (work in
1831	         progress), July 2005.

1833	   [17]  Camarillo, G., "Framework for Transcoding with the Session
1834	         Initiation Protocol (SIP)",
1835	         draft-ietf-sipping-transc-framework-05 (work in progress),
1836	         December 2006.

1838	   [18]  Sparks, R., "Session Initiation Protocol Call Control -
1839	         Transfer", draft-ietf-sipping-cc-transfer-08 (work in
1840	         progress), July 2007.

1842	   [19]  Johnston, A. and O. Levin, "Session Initiation Protocol (SIP)
1843	         Call Control - Conferencing for User Agents", BCP 119,
1844	         RFC 4579, August 2006.

1846	   [20]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Indicating
1847	         User Agent Capabilities in the Session Initiation Protocol
1848	         (SIP)", RFC 3840, August 2004.

1850	   [21]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
1851	         Preferences for the Session Initiation Protocol (SIP)",
1852	         RFC 3841, August 2004.

1854	   [22]  Campbell, B. and R. Sparks, "Control of Service Context using
1855	         SIP Request-URI", RFC 3087, April 2001.

1857	   [23]  Jennings, C. and R. Mahy, "Remote Call Control in the Session
1858	         Initiation Protocol (SIP) using the REFER method and the
1859	         session-oriented dialog package", draft-mahy-sip-remote-cc-05
1860	         (work in progress), March 2007.

1862	   [24]  Burger, E., Van Dyke, J., and A. Spitzer, "Basic Network Media
1863	         Services with SIP", RFC 4240, December 2005.

1865	   [25]  Jennings, C., Audet, F., and J. Elwell, "Session Initiation
1866	         Protocol (SIP) URIs for Applications such as Voicemail and
1867	         Interactive Voice Response (IVR)", RFC 4458, April 2006.

1869	   [26]  Rosenberg, J., "Request Authorization through Dialog
1870	         Identification in the Session Initiation Protocol (SIP)",
1871	         RFC 4538, June 2006.

1873	   [27]  Lennox, J., Wu, X., and H. Schulzrinne, "Call Processing
1874	         Language (CPL): A Language for User Control of Internet
1875	         Telephony Services", RFC 3880, October 2004.

1877	   [28]  Willis, D. and A. Allen, "Requesting Answering Modes for the
1878	         Session Initiation Protocol (SIP)",
1879	         draft-ietf-sip-answermode-06 (work in progress),
1880	         September 2007.

1882	   [29]  Mahy, R., "A Message Summary and Message Waiting Indication
1883	         Event Package for the Session Initiation Protocol (SIP)",
1884	         RFC 3842, August 2004.

1886	Authors' Addresses

1888	   Rohan Mahy
1889	   Plantronics
1890	   345 Encinal Street
1891	   Santa Cruz, CA
1892	   USA

1894	   Email: rohan@ekabal.com

1896	   Ben Campbell
1897	   Estacado Systems

1899	   Email: ben@nostrum.com

1901	   Robert Sparks
1902	   Estacado Systems

1904	   Email: rjsparks@nostrum.com

1906	   Jonathan Rosenberg
1907	   Cisco Systems

1909	   Email: jdrosen@cisco.com

1911	   Dan Petrie
1912	   SIP EZ

1914	   Email: dpetrie@sipez.com

1915	   Alan Johnston (editor)
1916	   Avaya

1918	   Email: alan@sipstation.com

1920	Full Copyright Statement

1922	   Copyright (C) The IETF Trust (2007).

1924	   This document is subject to the rights, licenses and restrictions
1925	   contained in BCP 78, and except as set forth therein, the authors
1926	   retain all their rights.

1928	   This document and the information contained herein are provided on an
1929	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1930	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
1931	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
1932	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
1933	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1934	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1936	Intellectual Property

1938	   The IETF takes no position regarding the validity or scope of any
1939	   Intellectual Property Rights or other rights that might be claimed to
1940	   pertain to the implementation or use of the technology described in
1941	   this document or the extent to which any license under such rights
1942	   might or might not be available; nor does it represent that it has
1943	   made any independent effort to identify any such rights.  Information
1944	   on the procedures with respect to rights in RFC documents can be
1945	   found in BCP 78 and BCP 79.

1947	   Copies of IPR disclosures made to the IETF Secretariat and any
1948	   assurances of licenses to be made available, or the result of an
1949	   attempt made to obtain a general license or permission for the use of
1950	   such proprietary rights by implementers or users of this
1951	   specification can be obtained from the IETF on-line IPR repository at
1952	   http://www.ietf.org/ipr.

1954	   The IETF invites any interested party to bring to its attention any
1955	   copyrights, patents or patent applications, or other proprietary
1956	   rights that may cover technology that may be required to implement
1957	   this standard.  Please address the information to the IETF at
1958	   ietf-ipr@ietf.org.

1960	Acknowledgment

1962	   Funding for the RFC Editor function is provided by the IETF
1963	   Administrative Support Activity (IASA).