idnits 2.17.1 

draft-ietf-sipping-cc-framework-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 24.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 1889.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1866.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1873.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1879.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 979 has weird spacing: '...with on    sip...'

  == Line 991 has weird spacing: '... prompt  sip:s...'

  == Line 1532 has weird spacing: '... Alerts    sub...'

  == Line 1819 has weird spacing: '...riented  dialo...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (Oct 2005) is 6761 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '3pcc' on line 174

  -- Looks like a reference, but probably isn't: 'JTAPI' on line 234

  -- Looks like a reference, but probably isn't: 'CSTA' on line 235

  -- Looks like a reference, but probably isn't: 'SDP' on line 460

  -- Looks like a reference, but probably isn't: 'VoiceXML' on line 697

  -- Looks like a reference, but probably isn't: 'CPL' on line 1451

  == Unused Reference: '22' is defined on line 1815, but no explicit
     reference was found in the text

  == Unused Reference: '24' is defined on line 1822, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 3265 (ref. '4') (Obsoleted by RFC 6665)

  ** Obsolete normative reference: RFC 2327 (ref. '5') (Obsoleted by RFC 4566)

  == Outdated reference: A later version (-15) exists of
     draft-ietf-sipping-service-examples-09

  ** Downref: Normative reference to an Informational draft:
     draft-ietf-sipping-conferencing-framework (ref. '15')

  == Outdated reference: A later version (-05) exists of
     draft-ietf-sipping-transc-framework-02

  ** Downref: Normative reference to an Informational draft:
     draft-ietf-sipping-transc-framework (ref. '17')

  == Outdated reference: A later version (-12) exists of
     draft-ietf-sipping-cc-transfer-05

  == Outdated reference: A later version (-05) exists of
     draft-mahy-sip-remote-cc-01


     Summary: 8 errors (**), 0 flaws (~~), 12 warnings (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	SIPPING WG                                                       R. Mahy
3	Internet-Draft                                              SIP Edge LLC
4	Expires: April 4, 2006                                       B. Campbell
5	                                                               R. Sparks
6	                                                        Estacado Systems
7	                                                            J. Rosenberg
8	                                                           Cisco Systems
9	                                                               D. Petrie
10	                                                                  SIP EZ
11	                                                             A. Johnston
12	                                                                     MCI
13	                                                                Oct 2005

15	     A Call Control and Multi-party usage framework for the Session
16	                       Initiation Protocol (SIP)
17	                 draft-ietf-sipping-cc-framework-05.txt

19	Status of this Memo

21	   By submitting this Internet-Draft, each author represents that any
22	   applicable patent or other IPR claims of which he or she is aware
23	   have been or will be disclosed, and any of which he or she becomes
24	   aware will be disclosed, in accordance with Section 6 of BCP 79.

26	   Internet-Drafts are working documents of the Internet Engineering
27	   Task Force (IETF), its areas, and its working groups.  Note that
28	   other groups may also distribute working documents as Internet-
29	   Drafts.

31	   Internet-Drafts are draft documents valid for a maximum of six months
32	   and may be updated, replaced, or obsoleted by other documents at any
33	   time.  It is inappropriate to use Internet-Drafts as reference
34	   material or to cite them other than as "work in progress."

36	   The list of current Internet-Drafts can be accessed at
37	   http://www.ietf.org/ietf/1id-abstracts.txt.

39	   The list of Internet-Draft Shadow Directories can be accessed at
40	   http://www.ietf.org/shadow.html.

42	   This Internet-Draft will expire on April 4, 2006.

44	Copyright Notice

46	   Copyright (C) The Internet Society (2005).

48	Abstract
49	   This document defines a framework and requirements for multi-party
50	   usage of SIP.  To enable discussion of multi-party features and
51	   applications we define an abstract call model for describing the
52	   media relationships required by many of these.  The model and actions
53	   described here are specifically chosen to be independent of the SIP
54	   signaling and/or mixing approach chosen to actually set up the media
55	   relationships.  In addition to its dialog manipulation aspect, this
56	   framework includes requirements for communicating related information
57	   and events such as conference and session state, and session history.
58	   This framework also describes other goals which embody the spirit of
59	   SIP applications as used on the Internet.

61	Table of Contents

63	   1.   Conventions
64	   2.   Motivation and Background
65	   3.   Key Concepts
66	     3.1  "Conversation Space" Model
67	     3.2  Comparison with Related Definitions
68	     3.3  Signaling Models
69	     3.4  Mixing Models
70	       3.4.1  Tightly Coupled
71	       3.4.2  Loosely Coupled
72	     3.5  Conveying Information and Events
73	     3.6  Componentization and Decomposition
74	       3.6.1  Media Intermediaries
75	       3.6.2  Mixer
76	       3.6.3  Transcoder
77	       3.6.4  Media Relay
78	       3.6.5  Queue Server
79	       3.6.6  Parking Place
80	       3.6.7  Announcements and Voice Dialogs
81	     3.7  Use of URIs
82	       3.7.1  Naming Users in SIP
83	       3.7.2  Naming Services with SIP URIs
84	     3.8  Invoker Independence
85	     3.9  Billing issues
86	   4.   Catalog of call control actions and sample features
87	     4.1  Early Dialog Actions
88	       4.1.1  Remote Answer
89	       4.1.2  Remote Forward or Put
90	       4.1.3  Remote Busy or Error Out
91	     4.2  Single Dialog Actions
92	       4.2.1  Remote Dial
93	       4.2.2  Remote On and Off Hold
94	       4.2.3  Remote Hangup
95	     4.3  Multi-dialog actions
96	       4.3.1  Transfer
97	       4.3.2  Take
98	       4.3.3  Add
99	       4.3.4  Local Join
100	       4.3.5  Insert
101	       4.3.6  Split
102	       4.3.7  Near-fork
103	       4.3.8  Far fork
104	   5.   Security Considerations
105	   6.   IANA Considerations
106	   7.   Appendix A: Example Features
107	     7.1  Implementation of these features
108	       7.1.1  Call Park
109	       7.1.2  Call Pickup
110	       7.1.3  Music on Hold
111	       7.1.4  Call Monitoring
112	       7.1.5  Barge-in
113	       7.1.6  Intercom
114	       7.1.7  Speakerphone paging
115	       7.1.8  Distinctive ring
116	       7.1.9  Voice message screening
117	       7.1.10   Single Line Extension
118	       7.1.11   Click-to-dial
119	       7.1.12   Pre-paid calling
120	       7.1.13   Voice Portal
121	   8.   References
122	     8.1  Normative References
123	     8.2  Informational References
124	        Authors' Addresses
125	        Intellectual Property and Copyright Statements

127	1.  Conventions

129	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
130	   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
131	   document are to be interpreted as described in RFC-2119 [2].

133	2.  Motivation and Background

135	   The Session Initiation Protocol [1] (SIP) was defined for the
136	   initiation, maintenance, and termination of sessions or calls between
137	   one or more users.  However, despite its origins as a large-scale
138	   multiparty conferencing protocol, SIP is used today primarily for
139	   point to point calls.  This two-party configuration is the focus of
140	   the SIP specification and most of its extensions.

142	   This document defines a framework and requirements for multi-party
143	   usage of SIP.  Most multi-party operations manipulate SIP session
144	   dialogs (also known as call legs) or SIP conference media policy to
145	   cause participants in a conversation to perceive specific media
146	   relationships.  In other protocols that deal with the concept of
147	   calls, this manipulation is known as call control.  In addition to
148	   its dialog or policy manipulation aspect, "call control" also
149	   includes communicating information and events related to manipulating
150	   calls, including information and events dealing with session state
151	   and history, conference state, user state, and even message state.

153	   Based on input from the SIP community, the authors compiled the
154	   following set of goals for SIP call control and multiparty
155	   applications:
156	   o  Define Primitives, Not Services.  Allow for a handful of robust
157	      yet simple mechanisms which can be combined to deliver features
158	      and services.  Throughout this document we refer to these simple
159	      mechanisms as "primitives".  Primitives should be sufficiently
160	      robust that when they are combined they can be used to build many
161	      services.  However, the goal is not to define a provably
162	      complete set of primitives.  Note that while the IETF will NOT
163	      standardize behavior or services, it may define example services
164	      for informational purposes, as in service examples [6].
165	   o  Participant oriented.  The primitives should be designed to
166	      provide services which are oriented around the experience of the
167	      participants.  The authors observe that end users of features and
168	      services usually don't care how a media relationship is set up.
169	      Their ultimate experience is based only on the resulting media and
170	      other externally visible characteristics.
171	   o  Signaling Model independent: Support both a central control and a
172	      peer-to-peer feature invocation model (and combinations of the
173	      two).  Baseline SIP already supports a centralized control model
174	      described in [3pcc], and the SIP community has expressed a great
175	      deal of interest in peer-to-peer or distributed call control using
176	      primitives such as those defined in REFER [8], Replaces [9], and
177	      Join [10].
178	   o  Mixing Model independent: The bulk of interesting multiparty
179	      applications involve mixing or combining media from multiple
180	      participants.  This mixing can be performed by one or more of the
181	      participants, or by a centralized mixing resource.  The experience
182	      of the participants should not depend on the mixing model used.
183	      While most examples in this document refer to audio mixing, the
184	      framework applies to any media type.  In this context a "mixer"
185	      refers to combining media in an appropriate, media-specific way.
186	      This is consistent with the model described in the SIP
187	      conferencing framework.
188	   o  Invoker oriented.  Only the user who invokes a feature or a
189	      service needs to know exactly which service is invoked or why.
190	      This is good because it allows new services to be created without
191	      requiring new primitives from all the participants; and it allows
192	      for much simpler feature authorization policies, for example, when
193	      participation spans organizational boundaries.  As discussed in
194	      section 3.8, this also avoids exponential state explosion when
195	      combining features.  The invoker only has to manage a user
196	      interface or API to prevent local feature interactions.  All the
197	      other participants simply need to manage the feature interactions
198	      of a much smaller number of primitives.
199	   o  Primitives make full use of URIs.  URIs are a very powerful
200	      mechanism for describing users and services.  They represent a
201	      plentiful resource which can be extremely expressive and easily
202	      routed, translated, and manipulated--even across organizational
203	      boundaries.  URIs can contain special parameters and informational
204	      headers which need only be relevant to the owner of the namespace
205	      (domain) of the URI.  Just as a user who selects an http: URL need
206	      not understand the significance and organization of the web site
207	      it references, a user may encounter a SIP URL which translates
208	      into an email-style group alias, plays a pre-recorded
209	      message, or runs some complex call-handling logic.  Note that
210	      while this may seem to contradict the previous goal, both goals
211	      can be satisfied by the same model.
212	   o  Make use of SIP headers and SIP event packages to provide SIP
213	      entities with information about their environment.  These should
214	      include information about the status / handling of dialogs on
215	      other user agents, information about the history of other contacts
216	      attempted prior to the current contact, the status of
217	      participants, the status of conferences, user presence
218	      information, and the status of messages.
219	   o  Encourage service decomposition, and design to make use of
220	      standard components using well-defined, simple interfaces.  Sample
221	      components include a SIP mixer, recording service, announcement
222	      server, and voice dialog server.  (This is not an exhaustive
223	      list).
224	   o  Include authentication, authorization, policy, logging, and
225	      accounting mechanisms to allow these primitives to be used safely
226	      among mutually untrusted participants.  Some of these mechanisms
227	      may be used to assist in billing, but no specific billing system
228	      will be endorsed.
229	   o  Permit graceful fallback to baseline SIP.  Definitions for new SIP
230	      call control extensions/primitives MUST describe a graceful way to
231	      fall back to baseline SIP behavior.  Support for one primitive
232	      MUST NOT imply support for another primitive.
233	   o  There is no desire or goal to reinvent traditional models, such as
234	      the model used by the [H.450] family of protocols, [JTAPI], or the
235	      [CSTA] call model, as these other models do not share the design
236	      goals presented in this document.

238	3.  Key Concepts

240	3.1  "Conversation Space" Model

242	   This document introduces the concept of an abstract "conversation
243	   space" (essentially a set of participants who believe they are all
244	   communicating among one another).  Each conversation space contains
245	   one or more participants.

247	   Participants are SIP User Agents which send original media to or
248	   terminate and receive media from other members of the conversation
249	   space.  Logically, every participant in the conversation space has
250	   access to all the media generated in that space (this is strictly
251	   true if all participants share a common media type).  A SIP User
252	   Agent which does not contribute or consume any media is NOT a
253	   participant; nor is a user agent which merely forwards, transcodes,
254	   mixes, or selects media originating elsewhere in the conversation
255	   space.  [Note that a conversation space consists of zero or more SIP
256	   calls or SIP conferences.  A conversation space is similar to the
257	   definition of a "call" in some other call models.]

259	   Participants may represent human users or non-human users (referred
260	   to as robots or automatons in this document).  Some participants may
261	   be hidden within a conversation space.  Some examples of hidden
262	   participants include: robots which generate tones, images, or
263	   announcements during a conference to announce users arriving and
264	   departing, a human call center supervisor monitoring a conversation
265	   between a trainee and a customer, and robots which record media for
266	   training or archival purposes.

268	   Participants may also be active or passive.  Active participants are
269	   expected to be intelligent enough to leave a conversation space when
270	   they no longer desire to participate.  (An attentive human
271	   participant is obviously active.)  Some robotic participants (such as
272	   a voice messaging system, an instant messaging agent, or a voice
273	   dialog system) may be active participants if they can leave the
274	   conversation space when there is no human interaction.  Other robots
275	   (for example our tone generating robot from the previous example) are
276	   passive participants.  A human participant "on-hold" is passive.

278	   A conversation space can be drawn as a "bubble" or oval, or written
279	   as a "set" in curly or square brace notation.  Each set, oval, or
280	   "bubble" represents a conversation space.  Hidden participants are
281	   shown in lowercase letters.

283	   { A , B }               [ A , B ]

285	      .-.                 .---.
286	     /   \               /     \
287	    /  A  \             / A   b \
288	   (       )           (         )
289	    \  B  /             \ C   D /
290	     \   /               \     /
291	      '-'                 '---'
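
   As an informal illustration (not part of the framework itself), the
   conversation space model can be sketched as a small data structure.
   The class and field names below (Participant, ConversationSpace,
   hidden, active, render) are assumptions made for this sketch.  The
   code only reproduces the set notation described above, with hidden
   participants shown in lowercase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    """A user agent that originates or terminates media in the space."""
    name: str
    hidden: bool = False   # e.g. a recording robot or a monitoring supervisor
    active: bool = True    # can leave on its own when no longer needed


@dataclass
class ConversationSpace:
    """Hypothetical container for the participants of one conversation space."""
    participants: list[Participant] = field(default_factory=list)

    def render(self) -> str:
        """Return curly-brace set notation; hidden participants in lowercase."""
        labels = [p.name.lower() if p.hidden else p.name.upper()
                  for p in self.participants]
        return "{ " + " , ".join(labels) + " }"


# The right-hand bubble in the figure: A, C, and D visible, b hidden.
space = ConversationSpace([
    Participant("A"),
    Participant("B", hidden=True),
    Participant("C"),
    Participant("D"),
])
print(space.render())
```

   User agents that only forward, transcode, mix, or select media would
   not appear in such a structure, since they are not participants.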

293	3.2  Comparison with Related Definitions

295	   In SIP, a call is "an informal term that refers to some communication
296	   between peers, generally set up for the purposes of a multimedia
297	   conversation."  Obviously we cannot discuss normative behavior based
298	   on such an intentionally vague definition.  The concept of a
299	   conversation space is needed because the SIP definition of call is
300	   not sufficiently precise for the purpose of describing the user
301	   experience of multiparty features.

303	   Do any other definitions convey the correct meaning?  SIP and SDP
304	   [5] both define a conference as "a multimedia session identified by a
305	   common session description."  A session is defined as "a set of
306	   multimedia senders and receivers and the data streams flowing from
307	   senders to receivers."  Both of these definitions are heavily
308	   oriented toward multicast sessions with little differentiation among
309	   participants.  As such, neither is particularly useful for our
310	   purposes.  In fact, the definition of "call" in some call models is
311	   more similar to our definition of a conversation space.

313	   Some examples of the relationship between conversation spaces, SIP
314	   call legs, and SIP sessions are listed below.  In each example, a
315	   human user will perceive that there is a single call.

317	   o  A simple two-party call is a single conversation space, a single
318	      session, and a single call-leg.
319	   o  A locally mixed three-way call is two sessions and two call-legs.
320	      It is also a single conversation space.
321	   o  A simple dial-in audio conference is a single conversation space,
322	      but is represented by as many call-legs and sessions as there are
323	      human participants.
324	   o  A multicast conference is a single conversation space, a single
325	      session, and as many call-legs as participants.
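
   The four examples above can be restated as a small lookup, shown
   here only as a sketch.  The names FOOTPRINT and describe are
   hypothetical.  The "locally mixed call" entry generalizes the
   three-way case to n participants, which is an assumption of this
   sketch rather than something stated in the text.

```python
# (sessions, call_legs) as a function of the number of participants n,
# transcribed from the four examples in the list above.
FOOTPRINT = {
    "two-party call": lambda n: (1, 1),
    # Assumption: the mixing party holds one leg per other participant.
    "locally mixed call": lambda n: (n - 1, n - 1),
    "dial-in audio conference": lambda n: (n, n),
    "multicast conference": lambda n: (1, n),
}


def describe(model: str, n: int) -> dict:
    """Return the SIP-level footprint of one perceived call."""
    sessions, call_legs = FOOTPRINT[model](n)
    # In every example the user perceives a single conversation space.
    return {"conversation_spaces": 1, "sessions": sessions, "call_legs": call_legs}


for model, n in [("two-party call", 2), ("locally mixed call", 3),
                 ("dial-in audio conference", 5), ("multicast conference", 5)]:
    print(model, describe(model, n))
```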

327	3.3  Signaling Models

329	   Obviously, to make changes to a conversation space, you must be able
330	   to use SIP signaling to cause these changes.  Specifically there must
331	   be a way to manipulate SIP dialogs (call legs) to move participants
332	   into and out of conversation spaces.  Although this is not as
333	   obvious, there also must be a way to manipulate SIP dialogs to
334	   include non-participant user agents which are otherwise involved in a
335	   conversation space (e.g., B2BUAs, 3pcc controllers, mixers,
336	   transcoders, translators, or relays).

338	   Implementations may set up the media relationships described in the
339	   conversation space model using the approach described in 3pcc [7].
340	   The 3pcc approach relies on only the following 3 primitive
341	   operations:
342	   o  Create a new call-leg  (INVITE)
343	   o  Modify a call-leg      (reINVITE)
344	   o  Destroy a call-leg     (BYE)
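
   A minimal sketch of a controller built on only these three
   primitives might look as follows.  It is an in-memory toy that
   records which SIP method each primitive would send; it does not use
   any real SIP stack.  The class name ThirdPartyController, its method
   names, and the example URIs are assumptions made for illustration.

```python
class ThirdPartyController:
    """Toy 3pcc controller: logs the SIP method behind each primitive."""

    def __init__(self):
        self.legs = {}      # leg id -> (target URI, media description)
        self.sent = []      # (method, leg id) pairs, in the order issued
        self._next_id = 1

    def create_leg(self, target, media):
        """Create a new call-leg (INVITE)."""
        leg_id = self._next_id
        self._next_id += 1
        self.legs[leg_id] = (target, media)
        self.sent.append(("INVITE", leg_id))
        return leg_id

    def modify_leg(self, leg_id, media):
        """Modify a call-leg (reINVITE, an INVITE within the existing dialog)."""
        target, _ = self.legs[leg_id]
        self.legs[leg_id] = (target, media)
        self.sent.append(("reINVITE", leg_id))

    def destroy_leg(self, leg_id):
        """Destroy a call-leg (BYE)."""
        del self.legs[leg_id]
        self.sent.append(("BYE", leg_id))


ctl = ThirdPartyController()
a = ctl.create_leg("sip:a@example.com", "held")    # opaque media labels
b = ctl.create_leg("sip:b@example.com", "audio")
ctl.modify_leg(a, "audio")                          # connect A once B answers
ctl.destroy_leg(b)
print(ctl.sent)
```

   The end systems in this sketch need nothing beyond baseline SIP; all
   feature logic lives in the controller.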

346	   The main advantage of the 3pcc approach is that it only requires very
347	   basic SIP support from end systems to support call control features.
348	   As such, third-party call control is a natural way to handle protocol
349	   conversion and mid-call features.  It also has the advantage and
350	   disadvantage that new features can/must be implemented in one place
351	   only (the controller), and neither requires enhanced client
352	   functionality, nor takes advantage of it.

354	   In addition, a peer-to-peer approach is discussed at length in this
355	   draft.  The primary drawback of the peer-to-peer model is additional
356	   end system complexity.  The benefits of the peer-to-peer model
357	   include:
358	   o  state remains at the edges
359	   o  call signaling need only go through participants involved (there
360	      are no additional points of failure)
361	   o  peers can take advantage of end-to-end message integrity or
362	      encryption
364	   o  setup time is shorter (fewer messages and round trips are
365	      required)

367	   The peer-to-peer approach relies on additional "primitive"
368	   operations, some of which are identified here.
369	   o  Replace an existing dialog
370	   o  Join a new dialog with an existing dialog
371	   o  Support SIP conference policy control
372	   o  Locally perform media forking (multi-unicast)
373	   o  Ask another UA to send a request on your behalf

375	   Many of the features, primitives, and actions described in this
376	   document also require some type of media mixing, combining, or
377	   selection as described in the next section.

379	3.4  Mixing Models

381	   SIP permits a variety of mixing models, which are discussed here
382	   briefly.  This topic is discussed more thoroughly in the SIP
383	   conferencing framework [15] and cc-conferencing [19].  SIP supports
384	   both tightly-coupled and loosely-coupled conferencing, although more
385	   sophisticated behavior is available in tightly-coupled conferences.
386	   In a tightly-coupled conference, a single SIP user agent (called the
387	   focus) has a direct dialog relationship with each participant (and
388	   may control non-participant user agents as well).  In a loosely-
389	   coupled conference there are no coordinated signaling relationships
390	   among the participants.

392	   For brevity, only the two most popular conferencing models (local and
393	   centralized mixing) are discussed in detail in this document.
394	   Applications of the conversation spaces model to loosely-coupled
395	   multicast and distributed full unicast mesh conferences are
396	   left as an exercise for the reader.  Note that a distributed full
397	   mesh conference can be used for basic conferences, but does not
398	   easily allow for more complex conferencing actions like splitting,
399	   merging, and sidebars.

401	   Call control features should be designed to allow a mixer (local or
402	   centralized) to decide when to reduce a conference back to a 2-party
403	   call, or drop all the participants (for example if only two
404	   automatons are communicating).  The actual heuristics used to release
405	   calls are beyond the scope of this document, but may depend on
406	   properties in the conversation space, such as the number of active,
407	   passive, or hidden participants; and the send-only, receive-only, or
408	   send-and-receive orientation of various participants.

410	3.4.1  Tightly Coupled

412	3.4.1.1  (Single) End System Mixing

414	   The first model we call "end system mixing".  In this model, user A
415	   calls user B, and they have a conversation.  At some point later, A
416	   decides to conference in user C. To do this, A calls C, using a
417	   completely separate SIP call.  This call uses a different Call-ID,
418	   different tags, etc.  There is no call set up directly between B and
419	   C. No SIP extension or external signaling is needed.  A merely
420	   decides to locally join two call-legs.

422	      B     C
423	       \   /
424	        \ /
425	         A

427	   A receives media streams from both B and C, and mixes them.  A sends
428	   a stream containing A's and C's streams to B, and a stream containing
429	   A's and B's streams to C. Basically, user A handles both signaling
430	   and media mixing.
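
   The mixing step that A performs can be sketched as follows.  This is
   a simplified illustration: frames are short lists of integer
   samples, and mixing is plain addition.  A real mixer would also
   decode, scale, and guard against clipping.  The function name
   mix_for and the sample values are assumptions of this sketch.

```python
def mix_for(recipient, frames):
    """Sum one frame of samples from every source except the recipient."""
    sources = [samples for name, samples in frames.items() if name != recipient]
    # zip(*sources) walks the frames sample by sample.
    return [sum(column) for column in zip(*sources)]


# One frame of (made-up) audio samples from each party, as seen at A.
frames = {"A": [1, 2, 3], "B": [10, 20, 30], "C": [100, 200, 300]}

to_b = mix_for("B", frames)    # A's and C's streams, sent to B
to_c = mix_for("C", frames)    # A's and B's streams, sent to C
local = mix_for("A", frames)   # what A itself hears: B plus C
print(to_b, to_c, local)
```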

432	3.4.1.2  Centralized Mixing

434	   In a centralized mixing model, all participants have a pairwise SIP
435	   and media relationship with the mixer.  Common applications of
436	   centralized mixing include ad-hoc conferences and scheduled dial-in
437	   or dial-out conferences. [need diagram]

439	3.4.1.3  Centralized Signaling, Distributed Media

441	   In this conferencing model, there is a centralized controller, as in
442	   the dial-in and dial-out cases.  However, the centralized server
443	   handles signaling only.  The media is still sent directly between
444	   participants, using either multicast or multi-unicast.  Multi-unicast
445	   is when a user sends multiple packets (one for each recipient,
446	   addressed to that recipient).  This is referred to as a
447	   "Decentralized Multipoint Conference" in [H.323].

449	3.4.2  Loosely Coupled

451	   In these models, there is no point of central control of SIP
452	   signaling.  As in the "Centralized Signaling, Distributed Media" case
453	   above, all endpoints send media to all other endpoints.  Consequently
454	   every endpoint mixes the media it receives from all the other sources
455	   and sends its own media to every other participant. [add diagrams]

457	3.4.2.1  Large-Scale Multicast Conferences

459	   Large-scale multicast conferences were the original motivation for
460	   both the Session Description Protocol [SDP] and SIP.  In a large-
461	   scale multicast conference, one or more multicast addresses are
462	   allocated to the conference.  Each participant joins those multicast
463	   groups, and sends their media to those groups.  Signaling is not sent
464	   to the multicast groups.  The sole purpose of the signaling is to
465	   inform participants of which multicast groups to join.  Large-scale
466	   multicast conferences are usually pre-arranged, with specific start
467	   and stop times.  However, multicast conferences do not need to be
468	   pre-arranged, so long as a mechanism exists to dynamically obtain a
469	   multicast address.

471	3.4.2.2  Full Distributed Unicast Conferencing

473	   In this conferencing model, each participant has both a pairwise
474	   media relationship and a pairwise SIP relationship with every other
475	   participant (a full mesh).  This model requires a mechanism to
476	   maintain a consistent view of distributed state across the group.
477	   This is a classic hard problem in computer science.  Also, this model
478	   does not scale well for large numbers of participants, because for
479	   <n> participants the number of media and SIP relationships is
480	   approximately n-squared.  As a result, this model is not generally
481	   available in commercial implementations; rather, it is
482	   primarily the topic of research or experimental implementations.
483	   Note that this model assumes peer-to-peer signaling.
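
   The scaling argument above can be made concrete with a small sketch.
   It assumes each pair of participants shares one relationship, counted
   once per pair, which gives n(n-1)/2 and therefore grows on the order
   of n-squared.  The function name is illustrative only.

```python
def full_mesh_relationships(n: int) -> int:
    """Number of distinct participant pairs in a full mesh of n participants."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Each of the n participants pairs with the other n - 1; every pair is
    # counted once, hence the division by two.
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (3, 10, 100):
        print(n, full_mesh_relationships(n))
```

   Each participant also sends its media n-1 times under multi-unicast,
   so per-endpoint bandwidth grows linearly while the total number of
   relationships grows quadratically.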

485	3.5  Conveying Information and Events

487	   Participants should have access to information about the other
488	   participants in a conversation space, so that this information can be
489	   rendered to a human user or processed by an automaton.  Although some
490	   of this information may be available from the Request-URI or To,
491	   From, Contact, or other SIP headers, another mechanism of reporting
492	   this information is necessary.

494	   Many applications are driven by knowledge about the progress of calls
495	   and conferences.  In general these types of events allow for the
496	   construction of distributed applications, where the application
497	   requires information on session dialog and conference state, but is
498	   not necessarily co-resident with an endpoint user agent or conference
499	   server.  For example, a focus involved in a conversation space may
500	   wish to provide URLs for conference status, and/or conference/floor
501	   control.

503	   The SIP Events [4] architecture defines general mechanisms for
504	   subscription to and notification of events within SIP networks.  It
505	   introduces the notion of a package which is a specific
506	   "instantiation" of the events mechanism for a well-defined set of
507	   events.

509	   Event packages are needed to provide the status of a user's session
510	   dialogs, provide the status of conferences and their participants,
511	   provide user presence information, provide the status of
512	   registrations, and provide the status of a user's messages.  While this
513	   is not an exhaustive list, these are sufficient to enable the sample
514	   features described in this document.

516	   The conference event package [12] allows users to subscribe to
517	   information about an entire tightly-coupled SIP conference.
518	   Notifications convey information about the participants such as: the
519	   SIP URL identifying each user, their status in the space (active,
520	   declined, departed), URLs to invoke other features (such as sidebar
521	   conversations), links to other relevant information (such as floor
522	   control policies), and if floor control policies are in place, the
523	   user's floor control status.  For conversation spaces created from
524	   cascaded conferences, conversation state can be gathered from
525	   relevant foci and merged into a cohesive set of state.

527	   The session dialog package [11] provides information about all the
528	   dialogs the target user is maintaining, what conversations the user
529	   is participating in, and how these are correlated.  Likewise the
530	   registration package [13] provides notifications when contacts have
531	   changed for a specific address-of-record.  The combination of these
532	   allows a user agent to learn about all conversations occurring for
533	   the entire registered contact set for an address-of-record.

535	   Note that user presence in SIP [14] has a close relationship with
536	   these latter two event packages.  It is fundamental to the presence
537	   model that the information used to obtain user presence is
538	   constructed from any number of different input sources.  Examples of
539	   other such sources include calendaring information and uploads of
540	   presence documents.  These two packages can be considered another
541	   mechanism that allows a presence agent to determine the presence
542	   state of the user.  Specifically, a user presence server can act as a
543	   subscriber for the session dialog and registration packages to obtain
544	   additional information that can be used to construct a presence
545	   document.

547	   The multi-party architecture may also need to provide a mechanism to
548	   get information about the status/handling of a dialog (for example,
549	   information about the history of other contacts attempted prior to
550	   the current contact).  Finally, the architecture should provide ample
551	   opportunities to present informational URIs which relate to calls,
552	   conversations, or dialogs in some way.  For example, consider the SIP
553	   Call-Info header, or Contact headers returned in a 300-class
554	   response.  Frequently additional information about a call or dialog
555	   can be fetched via non-SIP URIs.  For example, consider a web page
556	   for package tracking when calling a delivery company, or a web page
557	   with related documentation when joining a dial-in conference.  The
558	   use of URIs in the multiparty framework is discussed in more detail
559	   in Section 3.7.

561	   Finally the interaction of SIP with stimulus-signaling-based
562	   applications, which allow a user agent to interact with an
563	   application without knowledge of the semantics of that application,
564	   is discussed in the SIP application interaction framework [16].
565	   Stimulus signaling can occur to a user interface running locally with
566	   the client, or to a remote user interface, through media streams.
567	   Stimulus signaling encompasses a wide range of mechanisms, ranging
568	   from clicking on hyperlinks, to pressing buttons, to traditional Dual
569	   Tone Multi Frequency (DTMF) input.  In all cases, stimulus signaling
570	   is supported through the use of markup languages, which play a key
571	   role in that framework.

573	3.6  Componentization and Decomposition

575	   This framework proposes a decomposed component architecture with a
576	   very loose coupling of services and components.  This means that a
577	   service (such as a conferencing server or an auto-attendant) need not
578	   be implemented as an actual server.  Rather, these services can be
579	   built by combining a few basic components in straightforward or
580	   arbitrarily complex ways.

582	   Since the components are easily deployed on separate boxes, by
583	   separate vendors, or even with separate providers, we achieve a
584	   separation of function that allows each piece to be developed in
585	   complete isolation.  We can also reuse existing components for new
586	   applications.  This allows rapid service creation, and the ability
587	   for services to be distributed across organizational domains anywhere
588	   in the Internet.

590	   For many of these components it is also desirable to discover their
591	   capabilities, for example querying the ability of a mixer to host a
592	   10 dialog conference, or to reserve resources for a specific time.
593	   These actions could be provided in the form of URLs, provided there
594	   is an a priori means of understanding their semantics.  For example
595	   if there is a published dictionary of operations, a way to query the
596	   service for the available operations and the associated URLs, the URL
597	   can be the interface for providing these service operations.  This
598	   concept is described in more detail in the context of dialog
599	   operations in section

601	3.6.1  Media Intermediaries

603	   Media Intermediaries are not participants in any conversation space,
604	   although an entity which is also a media translator may also have a
605	   colocated participant component (for example a mixer which also
606	   announces the arrival of a new participant; the announcement portion
607	   is a participant, but the mixer itself is not).  Media intermediaries
608	   should be as transparent as possible to the end users--offering a
609	   useful, fundamental service; without getting in the way of new
610	   features implemented by participants.  Some common media
611	   intermediaries are described below.

613	3.6.2  Mixer

615	   A SIP mixer is a component that combines media from all dialogs in
616	   the same conversation in a media specific way.  For example, the
617	   default combining for an audio conference might be an N-1
618	   configuration, while a text mixer might interleave text messages on a
619	   per-line basis.  Details about how to manipulate the media
620	   policy used by mixers are being discussed in the XCON Working Group.

622	3.6.3  Transcoder

624	   A transcoder translates media from one encoding or format to another
625	   (for example, GSM voice to G.711, MPEG2 to H.261, or text/html to
626	   text/plain), or from one media type to another (for example text to
627	   speech).  A more thorough discussion of transcoding is provided in
628	   SIP transcoding services invocation [17].

630	3.6.4  Media Relay

632	   A media relay terminates media and simply forwards it to a new
633	   destination without changing the content in any way.  Sometimes media
634	   relays are used to provide source IP address anonymity, to facilitate
635	   middlebox traversal, or to provide a trusted entity where media can
636	   be forcefully disconnected.

638	3.6.5  Queue Server

640	   A queue server is a location where calls can be entered into one of
641	   several FIFO (first-in, first-out) queues.  A queue server would
642	   subscribe to the presence of groups or individuals who are interested
643	   in its queues.  When detecting that a user is available to service a
644	   queue, the server redirects or transfers the first call in the
645	   relevant queue to the available user.  On a queue-by-queue basis,
646	   authorized users could also subscribe to the call state (dialog
647	   information) of calls within a queue.  Authorized users could use
648	   this information to effectively pluck (take) a call out of the queue
649	   (for example by sending an INVITE with a Replaces header to one of
650	   the user agents in the queue).
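
   The queue behavior described above can be sketched as follows.  This
   is a minimal illustration, not a SIP implementation: the class and
   method names are hypothetical, calls are represented by opaque
   identifiers, and the presence subscription and the actual redirect,
   transfer, or INVITE with Replaces are not modeled.

```python
from collections import deque


class QueueServer:
    """Toy model of a queue server holding several named FIFO queues."""

    def __init__(self):
        self.queues = {}  # queue name -> deque of call identifiers

    def enqueue(self, queue_name, call_id):
        """Place an incoming call at the tail of the named queue."""
        self.queues.setdefault(queue_name, deque()).append(call_id)

    def agent_available(self, queue_name):
        """Return the longest-waiting call for an available user, or None."""
        q = self.queues.get(queue_name)
        return q.popleft() if q else None

    def pluck(self, queue_name, call_id):
        """Remove a specific call out of order (an authorized user takes it)."""
        q = self.queues.get(queue_name)
        if q and call_id in q:
            q.remove(call_id)
            return True
        return False


if __name__ == "__main__":
    qs = QueueServer()
    for call in ("call-1", "call-2", "call-3"):
        qs.enqueue("support", call)
    qs.pluck("support", "call-2")        # taken directly by a supervisor
    print(qs.agent_available("support"))
```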

652	3.6.6  Parking Place

654	   A parking place is a location where calls can be terminated
655	   temporarily and then retrieved later.  While a call is "parked", it
656	   can receive media "on-hold" such as music, announcements, or
657	   advertisements.  Such a service could be further decomposed such that
658	   announcements or music are handled by a separate component.

660	3.6.7  Announcements and Voice Dialogs

662	   An announcement server is a server which can play digitized media
663	   (frequently audio), such as music or recorded speech.  These servers
664	   are typically accessible via SIP, HTTP, or RTSP.  An analogous
665	   service is a recording service which stores digitized media.  A
666	   convention for specifying announcements in SIP URIs is described in
667	   [netann].  Likewise the same server could easily provide a service
668	   which records digitized media.

670	   A "voice dialog" is a model of spoken interactive behavior between a
671	   human and an automaton which can include synthesized speech,
672	   digitized audio, recognition of spoken and DTMF key input, recording
673	   of spoken input, and interaction with call control.  Voice dialogs
674	   frequently consist of forms or menus.  Forms present information and
675	   gather input; menus offer choices of what to do next.

677	   Spoken dialogs are a basic building block of applications which use
678	   voice.  Consider for example that a voice mail system, the
679	   conference-id and passcode collection system for a conferencing
680	   system, and complicated voice portal applications all require a voice
681	   dialog component.

683	3.6.7.1  Text-to-Speech and Automatic Speech Recognition

685	   Text-to-Speech (TTS) is a service which converts text into digitized
686	   audio.  TTS is frequently integrated into other applications, but
687	   when separated as a component, it provides greater opportunity for
688	   broad reuse.  Automatic Speech Recognition (ASR) is a service which
689	   attempts to decipher digitized speech based on a proposed grammar.
690	   Like TTS, ASR services can be embedded, or exposed so that many
691	   applications can take advantage of such services.  A standardized
692	   (decomposed) interface to access standalone TTS and ASR services is
693	   currently being developed in the SPEECHSC Working Group.

695	3.6.7.2  VoiceXML

697	   [VoiceXML] is a W3C recommendation that was designed to give authors
698	   control over the spoken dialog between users and applications.  The
699	   application and user take turns speaking: the application prompts the
700	   user, and the user in turn responds.  Its major goal is to bring the
701	   advantages of web-based development and content delivery to
702	   interactive voice response applications.  We believe that VoiceXML
703	   represents the ideal partner for SIP in the development of
704	   distributed IVR servers.  VoiceXML is an XML-based scripting language
705	   for describing IVR services at an abstract level.  VoiceXML supports
706	   DTMF recognition, speech recognition, text-to-speech, and playing out
707	   of recorded media files.  The results of the data collected from the
708	   user are passed to a controlling entity through an HTTP POST
709	   operation.  The controller can then return another script, or
710	   terminate the interaction with the IVR server.

712	   A VoiceXML server also need not be implemented as a monolithic
713	   server.  Below is a diagram of a VoiceXML browser which is split into
714	   media and non-media handling parts.  The VoiceXML interpreter handles
715	   SIP dialog state and state within a VoiceXML document, and sends
716	   requests to the media component over another protocol.

718	                       +-------------+
719	                       |             |
720	                       | VoiceXML    |
721	                       | Interpreter |
722	                       | (signaling) |
723	                       +-------------+
724	                         ^          ^
725	                         |          |
726	                     SIP |          | RTSP
727	                         |          |
728	                         |          |
729	                         v          v
730	            +-------------+        +-------------+
731	            |             |        |             |
732	            |  SIP UA     |   RTP  | RTSP Server |
733	            |             |<------>|   (media)   |
734	            |             |        |             |
735	            +-------------+        +-------------+

737	                   Figure : Decomposed VoiceXML Server

739	3.7  Use of URIs

741	   All naming in SIP uses URIs.  URIs in SIP are used in a plethora of
742	   contexts: the Request-URI; Contact, To, From, and *-Info headers;
743	   application/uri bodies; and embedded in email, web pages, instant
744	   messages, and ENUM records.  The request-URI identifies the user or
745	   service that the call is destined for.

747	   SIP URIs embedded in informational SIP headers, SIP bodies, and non-
748	   SIP content can also specify methods, special parameters, headers,
749	   and even bodies.  For example:

751	   sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098
752	     &To=<sip:bob@biloxi.com>;tag=879738
753	     &From=<sip:alice@atlanta.com>;tag=023214

755	   sip:bob@babylon.biloxi.com;method=REFER?
756	     Refer-To=<http://www.atlanta.com/~alice>
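
   To illustrate how such a URI decomposes, the sketch below splits a
   SIP URI into its user, host, URI parameters, and embedded headers.
   It is a deliberately simplified parser, not the full SIP URI grammar:
   the function name is hypothetical, it assumes the first "@" ends the
   user part, and it accepts the unescaped header values shown in the
   examples above.

```python
from urllib.parse import unquote


def parse_sip_uri(uri):
    """Split a SIP URI into scheme, user, host, parameters, and headers."""
    scheme, _, rest = uri.partition(":")
    user, at, hostpart = rest.partition("@")
    if not at:  # no "@" anywhere, so there is no user part
        user, hostpart = "", rest
    # Everything after the first "?" in the host part is the header list.
    hostparams, _, headers = hostpart.partition("?")
    host, *params = hostparams.split(";")
    return {
        "scheme": scheme,
        "user": unquote(user),
        "host": host,
        # "name=value" pairs; a bare parameter maps to an empty string
        "params": dict(p.partition("=")[::2] for p in params),
        "headers": {
            name: unquote(value)
            for name, _, value in (h.partition("=") for h in headers.split("&") if h)
        },
    }


if __name__ == "__main__":
    example = ("sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098"
               "&To=<sip:bob@biloxi.com>;tag=879738"
               "&From=<sip:alice@atlanta.com>;tag=023214")
    print(parse_sip_uri(example))
```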

758	   Throughout this draft we discuss call control primitive operations.
759	   One of the biggest problems is defining how these operations may be
760	   invoked.  There are a number of ways to do this.  One way is to
761	   define the primitives in the protocol itself such that SIP methods
762	   (for example REFER) or SIP headers (for example Replaces) indicate a
763	   specific call control action.  Another way to invoke call control
764	   primitives is to define a specific Request-URI naming convention.
765	   Either these conventions must be shared between the client (the
766	   invoker) and the server, or published by or on behalf of the server.
767	   The former involves defining URL construction techniques (e.g.  URL
768	   parameters and/or token conventions) as proposed in [netann].  The
769	   latter technique usually involves discovering the URI via a SIP event
770	   package, a web page, a business card, or an Instant Message.  Yet
771	   another means to acquire the URLs is to define a dictionary of
772	   primitives with well-defined semantics and provide a means to query
773	   the named primitives and corresponding URLs that may be invoked on
774	   the service or dialogs.

776	3.7.1  Naming Users in SIP

778	   An address-of-record, or public SIP address, is a SIP (or SIPS) URI
779	   that points to a domain with a location server that can map the URI
780	   to a set of Contact URIs where the user might be available.  Typically
781	   the Contact URIs are populated via registration.

783	        Address of Record        Contacts

785	        sip:bob@biloxi.com   ->  sip:bob@babylon.biloxi.com:5060
786	                                 sip:bbrown@mailbox.provider.net
787	                                 sip:+1.408.555.6789@mobile.net
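
   The mapping above can be pictured as a simple table keyed by
   address-of-record.  The sketch below is illustrative only: the class
   and method names are hypothetical, and registration expiry,
   authentication, and Contact parameters are left out.

```python
from collections import defaultdict


class LocationService:
    """Toy location service: address-of-record -> list of Contact URIs."""

    def __init__(self):
        self._bindings = defaultdict(list)

    def register(self, aor, contact):
        """Add a Contact binding for an address-of-record (no duplicates)."""
        if contact not in self._bindings[aor]:
            self._bindings[aor].append(contact)

    def lookup(self, aor):
        """Return the Contacts currently bound to the address-of-record."""
        return list(self._bindings.get(aor, []))


if __name__ == "__main__":
    loc = LocationService()
    loc.register("sip:bob@biloxi.com", "sip:bob@babylon.biloxi.com:5060")
    loc.register("sip:bob@biloxi.com", "sip:bbrown@mailbox.provider.net")
    loc.register("sip:bob@biloxi.com", "sip:+1.408.555.6789@mobile.net")
    print(loc.lookup("sip:bob@biloxi.com"))
```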

789	   Callee Capabilities [20] defines a set of additional parameters to
790	   the Contact header that define the characteristics of the user agent
791	   at the specified URI.  For example, there is a mobility parameter
792	   which indicates whether the UA is fixed or mobile.  When a user agent
793	   registers, it places these parameters in the Contact headers to
794	   characterize the URIs it is registering.  This allows a proxy for
795	   that domain to have information about the contact addresses for that
796	   user.

798	   When a caller sends a request, it can optionally request Caller
799	   Preferences [21], by including the Accept-Contact and Reject-Contact
800	   headers which request certain handling by the proxy in the target
801	   domain.  These headers contain preferences that describe the set of
802	   desired URIs to which the caller would like their request routed.
803	   The proxy in the target domain matches these preferences with the
804	   Contact characteristics originally registered by the target user.
805	   The target user can also choose to run arbitrarily complex "Find-me"
806	   feature logic on a proxy in the target domain.

808	   There is a strong asymmetry in how preferences for callers and
809	   callees can be presented to the network.  While a caller takes an
810	   active role by initiating the request, the callee takes a passive
811	   role in waiting for requests.  This motivates the use of callee-
812	   supplied scripts and caller preferences included in the call
813	   request.  This asymmetry is also reflected in the appropriate
814	   relationship between caller and callee preferences.  A server for a
815	   callee should respect the wishes of the caller to avoid certain
816	   locations, while the preference among locations has to be the
817	   callee's choice, as it determines where, for example, the phone rings
818	   and whether the callee incurs mobile telephone charges for incoming
819	   calls.

821	   SIP User Agent implementations are encouraged to make intelligent
822	   decisions based on the type of participants (active/passive, hidden,
823	   human/robot) in a conversation space.  This information is conveyed
824	   via the session dialog package or in a SIP header parameter
825	   communicated using an appropriate SIP header.  For example, a music
826	   on hold service may take the sensible approach that if there are two
827	   or more unhidden participants, it should not provide hold music; or
828	   that it will not send hold music to robots.

830	   Multiple participants in the same conversation space may represent
831	   the same human user.  For example, the user may use one participant
832	   for video, chat, and whiteboard media on a PC and another for audio
833	   media on a SIP phone.  In this case, the address-of-record is the
834	   same for both user agents, but the Contacts are different.  In
835	   addition, human users may add robot participants which act on their
836	   behalf (for example a call recording service, or a calendar
837	   reminder).  Call Control features in SIP should continue to function
838	   as expected in such an environment.

840	3.7.2  Naming Services with SIP URIs

842	   [Editor's Note: this section needs to be pared down considerably, and
843	   the examples replaced with example.{com|org|net} domain names.]  A
844	   critical piece of defining a session level service that can be
845	   accessed by SIP is defining the naming of the resources within that
846	   service.  This point cannot be overstated.

848	   In the context of SIP control of application components, we take
849	   advantage of the fact that the standard SIP URI has a user part.
850	   Most services may be thought of as user automatons that participate
851	   in SIP sessions.  It naturally follows that the user address, or the
852	   left-hand-side of the URI, should be utilized as a service indicator.

854	   For example, media servers commonly offer multiple services at a
855	   single host address.  Use of the user part as a service indicator
856	   enables service consumers to direct their requests without ambiguity.
857	   It has the added benefit of enabling media services to register their
858	   availability with SIP Registrars just as any "real" SIP user would.
859	   This maintains consistency and provides enhanced flexibility in the
860	   deployment of media services in the network.

862	   There has been much discussion about the potential for confusion if
863	   media services URIs are not readily distinguishable from other types
864	   of SIP UAs.  The use of a service namespace provides a mechanism to
865	   unambiguously identify standard interfaces while not constraining
866	   the development of private or experimental services.

868	   In SIP, the request-URI identifies the user or service that the call
869	   is destined for.  The great advantage of using URIs (specifically,
870	   the SIP request URI) as a service identifier comes because of the
871	   combination of two facts.  First, unlike in the PSTN, where the
872	   namespace (dialable telephone numbers) is limited, URIs come from an
873	   infinite space.  They are plentiful, and they are free.  Secondly,
874	   the primary function of SIP is call routing through manipulations of
875	   the request URI.  In the traditional SIP application, this URI
876	   represents people.  However, the URI can also represent services, as
877	   we propose here.  This means we can apply the routing services SIP
878	   provides to routing of calls to services.  The result - the problem
879	   of service invocation and service location becomes a routing problem,
880	   for which SIP provides a scalable and flexible solution.  Since there
881	   is such a vast namespace of services, we can explicitly name each
882	   service in a finely granular way.  This allows the distribution of
883	   services across the network.

885	   Consider a conferencing service.  If we have separated the names of
886	   ad-hoc conferences from scheduled conferences, we can program proxies
887	   to route calls for ad-hoc conferences to one set of servers, and
888	   calls for scheduled ones to another, possibly even in a different
889	   provider.  In fact, since each conference itself is given a URI, we
890	   can distribute conferences across servers, and easily guarantee that
891	   calls for the same conference always get routed to the same server.
892	   This is in stark contrast to conferences in the telephone network,
893	   where the equivalent of the URI - the phone number - is scarce.  An
894	   entire conferencing provider generally has one or two numbers.
895	   Conference IDs must be obtained through IVR interactions with the
896	   caller, or through a human attendant.  This makes it difficult to
897	   distribute conferences across servers all over the network, since the
898	   PSTN routing only knows about the dialed number.

900	   In the case of a dialog server, the voice dialog itself is the target
901	   for the call.  As such, the request URI should contain the identifier
902	   for this spoken dialog.  This is consistent with the Request-URI
903	   service invocation model of RFC 3087.  This URL can be in one of two
904	   formats.  In the first, the VoiceXML script is identified directly by
905	   an HTTP URL.  In the second, the script is not specified.  Rather,
906	   the dialog server uses its configuration to map the incoming request
907	   to a specific script.

909	   Since the request URI could indicate a request for a variety of
910	   different services, of which a dialog server is only one type, this
911	   example request URI first begins with a service identifier, that
912	   indicates the basic service required.  For VoiceXML scripts, this
913	   identification information is a URL-encoded version of the URL which
914	   references the script to execute, or if not present, the dialog
915	   server uses server-specific configuration to determine which script
916	   to execute.

918	   Examples of URLs that invoke VoiceXML dialogs are: (line folding for
919	   clarity only)

921	      sip:dialog.vxml.http%3a//dialogs.server.com/script32.vxml
922	       @vxmlservers.com

924	      sip:dialog.vxml@vxmlservers.com

926	   The first of these indicates that the dialog server (located at
927	   vxmlservers.com) should invoke a VoiceXML script fetched from
928	   http://dialogs.server.com/script32.vxml.  Since the user part of the
929	   SIP URL cannot contain the : character, this must be escaped to %3a.
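
   The escaping step can be sketched as below.  The helper names and the
   "dialog.vxml" prefix constant are taken from, or invented for, the
   examples in this section and are not a defined interface.  Note that
   Python's quote() emits uppercase hex digits (%3A); percent-encoding
   is case-insensitive, so this is equivalent to the %3a shown above.

```python
from urllib.parse import quote, unquote

SERVICE_PREFIX = "dialog.vxml"  # service indicator used in the examples


def vxml_dialog_uri(server, script_url=None):
    """Build a dialog-server URI, optionally naming the script to run."""
    user = SERVICE_PREFIX
    if script_url is not None:
        # "/" is left readable; ":" (and "@", "?", ...) are percent-encoded.
        user += "." + quote(script_url, safe="/")
    return f"sip:{user}@{server}"


def script_url_from_user(user):
    """Recover the script URL from a user part, or None if none is given."""
    prefix = SERVICE_PREFIX + "."
    return unquote(user[len(prefix):]) if user.startswith(prefix) else None


if __name__ == "__main__":
    print(vxml_dialog_uri("vxmlservers.com",
                          "http://dialogs.server.com/script32.vxml"))
    print(vxml_dialog_uri("vxmlservers.com"))
```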

931	   These types of conventions are not limited to application component
932	   servers.  An ordinary SIP User Agent can have special URIs as well,
933	   for example, one which is automatically answered by a speakerphone.
934	   Since URIs are so plentiful, using a separate URI for this service
935	   does not exhaust a valuable resource.  The requested service is clear
936	   to the user agent receiving the request.  This URI can also be
937	   included as part of another feature (for example, the Intercom
938	   feature described in Section 6.1.6).  This feature can be specified
939	   with a SIP user parameter, since these are part of the userpart of a SIP
940	   URI.

942	   Likewise a Request URI can fully describe an announcement service
943	   through the use of the user part of the address and additional URI
944	   parameters.  In our example, the user portion of the address, "annc",
945	   specifies the announcement service on the media server.  The two URI
946	   parameters "play=" and "early=" specify the audio resource to play
947	   and whether early media is desired.

949	       sip:annc@ms2.carrier.net;
950	        play=http://audio.carrier.net/allcircuitsbusy.au;early=yes

952	       sip:annc@ms2.carrier.net;
953	        play=file://fileserver.carrier.net/geminii/yourHoroscope.wav

955	   In practical applications, it is important that an invoker does not
956	   necessarily apply semantic rules to various URIs it did not create.
957	   Instead, it should allow any arbitrary string to be provisioned, and
958	   map the string to the desired behavior.  The administrator of a
959	   service may choose to provision specific conventions or mnemonic
960	   strings, but the application should not require it.  In any large
961	   installation, the system owner is likely to have pre-existing rules
962	   for mnemonic URIs, and any attempt by an application to define its
963	   own rules may create a conflict.  Implementations should allow an
964	   arbitrary mix of URLs from these schemes, or any other scheme that
965	   renders valid SIP URIs to be provisioned, rather than enforce only
966	   one particular scheme.

968	   For example, a voicemail application can be built using very
969	   different sets of URI conventions, as illustrated below:

971	        URI Identity       Example Scheme 1
972	                                Example Scheme 2
973	                                     Example Scheme 3

975	        Deposit with       sip:sub-rjs-deposit@vm.wcom.com
976	        standard greeting       sip:677283@vm.wcom.com
977	                                     sip:rjs@vm.wcom.com;mode=deposit

979	        Deposit with on    sip:sub-rjs-deposit-busy@vm.wcom.com
980	        phone greeting          sip:677372@vm.wcom.com
981	                                     sip:rjs@vm.wcom.com;mode=3991243

983	        Deposit with       sip:sub-rjs-deposit-sg@vm.wcom.com
984	        special greeting        sip:677384@vm.wcom.com
985	                                     sip:rjs@vm.wcom.com;mode=sg

987	        Retrieve - SIP     sip:sub-rjs-retrieve@vm.wcom.com
988	        authentication          sip:677405@vm.wcom.com
989	                                     sip:rjs@vm.wcom.com;mode=retrieve

991	        Retrieve - prompt  sip:sub-rjs-retrieve-inpin@vm.wcom.com
992	        for PIN in-band         sip:677415@vm.wcom.com
993	                                     sip:rjs@vm.wcom.com;mode=inpin
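
   One way to honor the provisioning advice above is to treat each URI
   as an opaque key in a table rather than parsing meaning out of it.
   The sketch below is an assumption-laden illustration: the table
   contents reuse a few URIs from the schemes above, the behavior labels
   are invented, and the lookup uses exact string matching rather than
   the SIP URI comparison rules.

```python
# Provisioned by the administrator: URI string -> (subscriber, behavior).
# URIs from all three example schemes can coexist in one table.
PROVISIONED = {
    "sip:sub-rjs-deposit@vm.wcom.com": ("rjs", "deposit-standard-greeting"),
    "sip:677283@vm.wcom.com": ("rjs", "deposit-standard-greeting"),
    "sip:rjs@vm.wcom.com;mode=deposit": ("rjs", "deposit-standard-greeting"),
    "sip:677384@vm.wcom.com": ("rjs", "deposit-special-greeting"),
    "sip:rjs@vm.wcom.com;mode=retrieve": ("rjs", "retrieve-sip-auth"),
}


def lookup_behavior(request_uri):
    """Return (subscriber, behavior) for a provisioned URI, else None."""
    # The application applies no naming rules of its own; it only looks up
    # whatever string the administrator chose to provision.
    return PROVISIONED.get(request_uri)


if __name__ == "__main__":
    print(lookup_behavior("sip:677283@vm.wcom.com"))
    print(lookup_behavior("sip:unknown@vm.wcom.com"))
```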

995	   As we have shown, SIP URIs represent an ideal, flexible mechanism for
996	   describing and naming service resources, be they queues, conferences,
997	   voice dialogs, announcements, voicemail treatments, or phone
998	   features.

1000	3.8  Invoker Independence

1002	   With functional signaling, only the invoker of features in SIP needs
1003	   to know exactly which feature they are invoking.  One of the primary
1004	   benefits of this approach is that combinations of functional features
1005	   work in SIP call control without requiring complex feature
1006	   interaction matrices.  For example, let us examine the combination of
1007	   a "transfer" of a call which is "conferenced".

1009	   Alice calls Bob. Alice silently "conferences in" her robotic
1010	   assistant Albert as a hidden party.  Bob transfers Alice to Carol.
1011	   If Bob asks Alice to Replace her leg with a new one to Carol then
1012	   both Alice and Albert should be communicating with Carol
1013	   (transparently).

1015	   Using the peer-to-peer model, this combination of features works fine
1016	   if A is doing local mixing (Alice replaces Bob's call-leg with
1017	   Carol's), or if A is using a central mixer (the mixer replaces Bob's
1018	   call leg with Carol's).  A clever implementation using the 3pcc model
1019	   can generate similar results.

1021	   New extensions to the SIP Call Control Framework should attempt to
1022	   preserve this property.

1024	3.9  Billing issues

1026	   Billing in the PSTN is typically based on who initiated a call.  At
1027	   the moment billing in a SIP network is neither consistent with
1028	   itself, nor with the PSTN.  (A billing model for SIP should allow for
1029	   both PSTN-style billing, and non-PSTN billing.)  The example below
1030	   demonstrates one such inconsistency.

1032	   Alice places a call to Bob. Alice then blind transfers Bob to Carol
1033	   through a PSTN gateway.  In current usage of REFER, Bob may be billed
1034	   for a call he did not initiate (his UA originated the outgoing call
1035	   leg however).  This is not necessarily a terrible thing, but it
1036	   demonstrates a security concern (Bob must have appropriate local
1037	   policy to prevent fraud).  Also, Alice may wish to pay for Bob's
1038	   session with Carol.  There should be a way to signal this in SIP.

1040	   Likewise a Replacement call may maintain the same billing
1041	   relationship as a Replaced call, so if Alice first calls Carol, then
1042	   asks Bob to Replace this call, Alice may continue to receive a bill.

1044	   Further work in SIP billing should define a way to set or discover
1045	   the direction of billing.

1047	4.  Catalog of call control actions and sample features

1049	   Call control actions can be categorized by the dialogs upon which
1050	   they operate.  The actions may involve a single or multiple dialogs.
1051	   These dialogs can be early or established.  Multiple dialogs may be
1052	   related in a conversation space to form a conference or other
1053	   interesting media topologies.

1055	   It should be noted that it is desirable to provide a means by which a
1056	   party can discover the actions which may be performed on a dialog.
1057	   The interested party may be independent or related to the dialogs.
1058	   One means of accomplishing this is through the ability to define and
1059	   obtain URLs for these actions as described in section .

1061	   Below are listed several call control "actions" which establish or
1062	   modify dialogs and relate the participants in a conversation space.
1063	   The names of the actions listed are for descriptive purposes only
1064	   (they are not normative).  This list of actions is not meant to be
1065	   exhaustive.

1067	   In the examples, all actions are initiated by the user "Alice"
1068	   represented by UA "A".

1070	4.1  Early Dialog Actions

1072	   The following are a set of actions that may be performed on a single
1073	   early dialog.  These actions can be thought of as a set of remote
1074	   control operations.  For example, an automaton might perform the
1075	   operation on behalf of a user.  Alternatively, a user might use the
1076	   remote control in the form of an application to perform the action on
1077	   the early dialog of a UA which may be out of reach.  All of these
1078	   actions correspond to telling the UA how to respond to a request to
1079	   establish an early dialog.  These actions provide useful
1080	   functionality for PDA, PC and server based applications which desire
1081	   the ability to control a UA.  A proposed mechanism for this type of
1082	   functionality is described in Remote Call Control [23].

1084	4.1.1  Remote Answer

1086	   A dialog is in some early dialog state such as 180 Ringing.  It may
1087	   be desirable to tell the UA to answer the dialog.  That is, tell it
1088	   to send a 200 OK response to establish the dialog.

1090	4.1.2  Remote Forward or Put

1092	   It may be desirable to tell the UA to respond with a 3xx class
1093	   response to forward an early dialog to another UA.

1095	4.1.3  Remote Busy or Error Out

1097	   It may be desirable to instruct the UA to send an error response such
1098	   as 486 Busy Here.

1100	4.2  Single Dialog Actions

1102	   There is another useful set of actions which operate on a single
1103	   established dialog.  These operations are useful in building
1104	   productivity applications for aiding users to control their phone.
1105	   For example, a CRM application can set up calls for a user,
1106	   eliminating the need for the user to actually enter an address.
1107	   These operations can also be thought of as remote control actions.  A
1108	   proposed mechanism for this type of functionality is described in
1109	   Remote Call Control [23].

1111	4.2.1  Remote Dial

1113	   This action instructs the UA to initiate a dialog.  This action can
1114	   be performed using the REFER method.

1116	4.2.2  Remote On and Off Hold

1118	   This action instructs the UA to put an established dialog on hold.
1119	   Though this operation can conceptually be performed with the REFER
1120	   method, there are no semantics defined as to what the referred party
1121	   should do with the SDP.  There is no way to distinguish between the
1122	   desire to go on or off hold.

1124	4.2.3  Remote Hangup

1126	   This action instructs the UA to terminate an early or established
1127	   dialog.  A REFER request with the following Refer-To URI performs
1128	   this action.  Note: this URL is not properly escaped.

1130	   sip:bob@babylon.biloxi.example.com;method=BYE?Call-ID=13413098
1131	     &To=<sip:bob@biloxi.com>;tag=879738
1132	     &From=<sip:alice@atlanta.example.com>;tag=023214
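
As noted above, the Refer-To URI in this example is not properly
escaped.  The following is a minimal sketch of how the embedded header
values could be percent-encoded, assuming Python's standard
urllib.parse.quote.  The helper name build_bye_refer_to is hypothetical
and is not part of any SIP library.  The sketch escapes more characters
than strictly necessary (for instance ":"), which should be harmless
because the recipient unescapes the values when it builds the request.

```python
from urllib.parse import quote, unquote


def build_bye_refer_to(target: str, call_id: str, to: str, from_: str) -> str:
    """Return a Refer-To URI asking the referee to send a BYE (hypothetical helper)."""
    headers = [("Call-ID", call_id), ("To", to), ("From", from_)]
    # safe="" percent-encodes every character except letters, digits and "_.-~".
    query = "&".join(f"{name}={quote(value, safe='')}" for name, value in headers)
    return f"{target};method=BYE?{query}"


refer_to = build_bye_refer_to(
    "sip:bob@babylon.biloxi.example.com",
    "13413098",
    "<sip:bob@biloxi.com>;tag=879738",
    "<sip:alice@atlanta.example.com>;tag=023214",
)
print(refer_to)
```

Decoding any of the escaped values with unquote() yields the original
header value from the example above.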

1134	4.3  Multi-dialog actions

1136	   These actions apply to a set of related dialogs.

1138	4.3.1  Transfer

1140	   The conversation space changes as follows:

1142	     before           after
1143	   { A , B }  -->   { C , B }

1145	   A replaces itself with C.

1147	   To make this happen using the peer-to-peer approach, "A" would send
1148	   two SIP requests.  A shorthand for those requests is shown below:

1150	   REFER B  Refer-To:C
1151	   BYE B

1153	   To make this happen instead using the 3pcc approach, the controller
1154	   sends requests represented by the shorthand below:

1156	   INVITE C (w/SDP of B)
1157	   reINVITE B (w/SDP of C)
1158	   BYE A

1160	   Features enabled by this action: blind transfer; transfer to a
1161	   central mixer (some type of conference or forking); transfer to a
1162	   park server (park); transfer to music on hold or an announcement
1163	   server; transfer to a "queue"; transfer to a service (such as Voice
1164	   Dialogs service); transition from local mixer to central mixer.

1166	   This action is frequently referred to as "completing an attended
1167	   transfer".  It is described in more detail in cc-transfer [18].

1169	4.3.2  Take

1171	   The conversation space changes as follows: { B , C } --> { B , A }.
1172	   A forcibly replaces C with itself.  In most uses of this primitive, A
1173	   is just "un-replacing" itself.  Using the peer-to-peer approach, "A"
1174	   sends: INVITE B  Replaces: <call leg between B and C>.

1176	   Using the 3pcc approach, the controller sends all of the requests:
1177	   INVITE A (w/SDP of B); reINVITE B (w/SDP of A); BYE C.

1179	   Features enabled by this action: transferee completes an attended
1180	   transfer; retrieve from central mixer (not recommended); retrieve
1181	   from music on hold or park; retrieve from queue; call center take;
1182	   voice portal resuming ownership of a call it originated; answering-
1183	   machine style screening (pickup); pickup of a ringing call (i.e.,
1184	   an early dialog).

1186	   Note that pickup of a ringing call has some interesting additional
1187	   requirements.  First of all, it involves an early dialog as
1188	   opposed to an established dialog.  Secondly, the party which is to
1189	   pick up the call may wish to do so only while it is an early
1190	   dialog.  That is, in the race condition where the ringing UA accepts
1191	   just before it receives signaling from the party wishing to take the
1192	   call, the taking party wishes to yield or cancel the take.  The goal
1193	   is to avoid yanking an answered call from the called party.
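
The race described in the note above can be expressed as a simple
guard: the taking party proceeds only while the target dialog is still
early.  The sketch below assumes the taking UA learns the dialog state
from dialog-package notifications and that states are named in the
style of that package ("early", "confirmed").  The function name and
its return values are hypothetical.

```python
def pickup_decision(dialog_state: str) -> str:
    """Decide whether a pickup (a Take of a ringing call) should proceed."""
    if dialog_state == "early":
        return "take"      # still ringing: send the INVITE with Replaces
    if dialog_state == "confirmed":
        return "yield"     # already answered: do not yank the call
    return "abandon"       # terminated or unknown state: nothing to pick up


print(pickup_decision("early"))
print(pickup_decision("confirmed"))
```

A local check like this cannot close the race on its own, because the
state may change while the request is in flight.  The UA receiving the
request also has to refuse the replacement once the dialog is
confirmed; the "early-only" parameter of the Replaces header is aimed
at that case.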

1195	   This action is described in Replaces [9] and in cc-transfer [18].

1197	4.3.3  Add

1199	   Note that the following 4 actions are described in cc-conferencing
1200	   [19].

1202	   This is merely adding a participant to a SIP conference.  The
1203	   conversation space changes as follows: { A , B } --> { A, B, C }.  A
1204	   adds C to the conversation.  Using the peer-to-peer approach, adding
1205	   a party using local mixing requires no signaling.  To transition from
1206	   a 2-party call or a locally mixed conference to central mixing, A
1207	   could send: REFER B  Refer-To: conference-URI; INVITE conference-URI;
1208	   BYE B.  To add a party to a conference: REFER C  Refer-To:
1209	   conference-URI, or REFER conference-URI  Refer-To: C.  Using the 3pcc
1210	   approach to transition to central mixing, the controller would send:
1211	   INVITE mixer leg 1 (w/SDP of A); INVITE mixer leg 2 (w/SDP of B);
1212	   INVITE C (late SDP); reINVITE A (w/SDP of mixer leg 1); reINVITE B
1213	   (w/SDP of mixer leg 2); INVITE mixer leg 3 (w/SDP of C).  To add a
1214	   party to a SIP conference: INVITE C (late SDP); INVITE conference-URI
1215	   (w/SDP of C).  Features enabled: standard conference feature; call
1216	   recording; answering-machine style screening (screening).

1218	4.3.4  Local Join

1220	   The conversation space changes like this: { A, B } , { A, C } -->
1221	   { A, B, C } or like this: { A, B } , { C, D } --> { A, B, C, D }.  A
1222	   takes two conversation spaces and joins them together into a single
1223	   space.  Using the peer-to-peer approach, A can mix locally, or REFER
1224	   the participants of both conversation spaces to the same central
1225	   mixer (as in 5.3).  For the 3pcc approach, the call flows for
1226	   inserting participants, and joining and splitting conversation spaces
1227	   are tedious yet straightforward, so these are left as an exercise for
1228	   the reader.  Features enabled: standard conference feature; leaving a
1229	   sidebar to rejoin a larger conference.

1231	4.3.5  Insert

1233	   The conversation space changes like this: { B , C } --> { A, B, C }.
1234	   A inserts itself into a conversation space.  A proposed mechanism for
1235	   signaling this using the peer-to-peer approach is to send a new
1236	   header in an INVITE with "joining" semantics.  For example: INVITE B
1237	   Join: <call id of B and C>.  If B accepted the INVITE, B would accept
1238	   responsibility to set up the call legs and mixing necessary (for
1239	   example: to mix locally or to transfer the participants to a central
1240	   mixer).  Features enabled: barge-in; call center monitoring; call
1241	   recording.

1243	4.3.6  Split

1245	   { A, B, C, D } --> { A, B } , { C, D }.  If using a central
1246	   conference with peer-to-peer: REFER C  Refer-To: conference-URI (new
1247	   URI); REFER D  Refer-To: conference-URI (new URI); BYE C; BYE D.
1248	   Features enabled: sidebar conversations during a larger conference.
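
The conversation-space changes listed for Transfer, Take, Add, Local
Join, Insert, and Split can be modeled, very roughly, as operations on
sets of participants.  The sketch below is purely illustrative: the
function names are hypothetical, and it models only membership in a
conversation space, not any SIP signaling or media mixing.

```python
from typing import FrozenSet, Tuple

Space = FrozenSet[str]


def transfer(space: Space, a: str, c: str) -> Space:
    """{A, B} --> {C, B}: A replaces itself with C."""
    return (space - {a}) | {c}


def take(space: Space, a: str, c: str) -> Space:
    """{B, C} --> {B, A}: A forcibly replaces C with itself."""
    return (space - {c}) | {a}


def add(space: Space, c: str) -> Space:
    """{A, B} --> {A, B, C}: C is added to the conversation."""
    return space | {c}


def local_join(first: Space, second: Space) -> Space:
    """{A, B}, {A, C} --> {A, B, C}: two spaces become one."""
    return first | second


def insert(space: Space, a: str) -> Space:
    """{B, C} --> {A, B, C}: A inserts itself into the space."""
    return space | {a}


def split(space: Space, leaving: Space) -> Tuple[Space, Space]:
    """{A, B, C, D} --> {A, B}, {C, D}: some participants move to a new space."""
    return space - leaving, space & leaving


stay, sidebar = split(frozenset("ABCD"), frozenset("CD"))
print(sorted(stay), sorted(sidebar))
```

Seen this way, Add and Insert produce the same resulting space; they
differ only in which party initiates the change.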

1250	4.3.7  Near-fork

1252	   A participates in two conversation spaces simultaneously: { A, B }
1253	   --> { B , A } & { A , C }.  A is a participant in two conversation
1254	   spaces such that A sends the same media to both spaces, and renders
1255	   media from both spaces, presumably by mixing or rendering the media
1256	   from both.  We can define that A is the "anchor" point for both
1257	   forks, each of which is a separate conversation space.  This action
1258	   is a purely local implementation (it requires no special signaling).

1260	   Local features such as switching calls between the background and
1261	   foreground are possible using this media relationship.

1263	4.3.8  Far fork

1265	   The conversation space changes like this: { A, B } --> { A, B } &
1266	   { B, C }.  A requests B to be the "anchor" of two conversation
1267	   spaces.  This is easily set up by creating a conference with two
1268	   subconferences and setting the media policy appropriately such that B
1269	   is a participant in both.  Media forking can also be set up using
1270	   3pcc as described in Section 5.1 of RFC3264 [3] (an offer/answer
1271	   model for SDP).  The session descriptions for forking are quite
1272	   complex.  Controllers should verify that endpoints can handle
1273	   forked media, for example using prior configuration.

1275	   Features enabled:
1276	   o  barge-in
1277	   o  voice portal services
1278	   o  whisper
1279	   o  hotword detection
1280	   o  sending DTMF somewhere else

1282	5.  Security Considerations

1284	   Call Control primitives provide a powerful set of features that can
1285	   be dangerous in the hands of an attacker.  To complicate matters,
1286	   call control primitives are likely to be automatically authorized
1287	   without direct human oversight.

1289	   The class of attacks which are possible using these tools include the
1290	   ability to eavesdrop on calls, disconnect calls, redirect calls,
1291	   render irritating content (including ringing) at a user agent, cause
1292	   an action that has billing consequences, subvert billing (theft-of-
1293	   service), and obtain private information.  Call control extensions
1294	   must take extra care to describe how these attacks will be prevented.

1296	   We can also make some general observations about authorization and
1297	   trust with respect to call control.  The security model is
1298	   dramatically dependent on the signaling model chosen (see section
1299	   3.2).

1301	   Let us first examine the security model used in the 3pcc approach.
1302	   All signaling goes through the controller, which is a trusted entity.
1303	   Traditional SIP authentication and hop-by-hop encryption and message
1304	   integrity work fine in this environment, but end-to-end encryption
1305	   and message integrity may not be possible.

1307	   When using the peer-to-peer approach, call control actions and
1308	   primitives can be legitimately initiated by a) an existing
1309	   participant in the conversation space, b) a former participant in the
1310	   conversation space, or c) an entity trusted by one of the
1311	   participants.  For example, a participant always initiates a
1312	   transfer; a retrieve from Park (a take) is initiated on behalf of a
1313	   former participant; and a barge-in (insert or far-fork) is initiated
1314	   by a trusted entity (an operator for example).

1316	   Authenticating requests by an existing participant or a trusted
1317	   entity can be done with baseline SIP mechanisms.  In the case of
1318	   features initiated by a former participant, these should be protected
1319	   against replay attacks by using a unique name or identifier per
1320	   invocation.  The Replaces header exhibits this behavior as a by-
1321	   product of its operation (once a Replaces operation is successful,
1322	   the call-leg being Replaced no longer exists).  For other requests, a
1323	   "one-time" Request-URI may be provided to the feature invoker.
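
One way to picture the "one-time" Request-URI mentioned above is a
token that is issued to the feature invoker and accepted exactly once.
The sketch below is an assumption about how such a scheme might look,
not a mechanism defined by this document.  The class name, the URI
layout, and the host feature.example.com are all hypothetical.

```python
import secrets


class OneTimeUriIssuer:
    """Issue Request-URIs that can each be redeemed a single time."""

    def __init__(self, host: str) -> None:
        self._host = host
        self._pending = set()

    def issue(self) -> str:
        # An unpredictable token makes the URI hard to guess.
        token = secrets.token_urlsafe(16)
        self._pending.add(token)
        return f"sip:{token}@{self._host}"

    def redeem(self, uri: str) -> bool:
        """Accept the URI once; a replayed or unknown URI is rejected."""
        token = uri[len("sip:"):].split("@", 1)[0]
        if token in self._pending:
            self._pending.remove(token)
            return True
        return False


issuer = OneTimeUriIssuer("feature.example.com")
uri = issuer.issue()
first_use = issuer.redeem(uri)
replay = issuer.redeem(uri)
print(first_use, replay)
```

The second redeem() call fails, which is the replay protection the
paragraph above asks for.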

1325	   To authorize call control primitives that trigger special behavior
1326	   (such as an INVITE with Replaces or Join semantics), the receiving
1327	   user agent may have trouble finding appropriate credentials with
1328	   which to challenge or authorize the request, as the sender may be
1329	   completely unknown to the receiver, except through the introduction
1330	   of a third party.  These credentials need to be passed transitively
1331	   in some way or fetched in an event body, for example.

1333	6.  IANA Considerations

1335	   This document requires no action by IANA.

1337	7.  Appendix A: Example Features

1339	   Primitives are defined in terms of their ability to provide features.
1340	   These example features should require an amply robust set of services
1341	   to demonstrate a useful set of primitives.  They are described here
1342	   briefly.  Note that the descriptions of these features are non-
1343	   normative.  Some of these features are used as examples in section 6
1344	   to demonstrate how some features may require certain media
1345	   relationships.  Note also that this document describes a mixture of
1346	   both features originating in the world of telephones, and features
1347	   which are clearly Internet oriented.

1349	   Example Feature Definitions:

1351	   Call Waiting - Alice is in a call, then receives another call.  Alice
1352	   can place the first call on hold, and talk with the other caller.
1353	   She can typically switch back and forth between the callers.

1355	   Blind Transfer - Alice is in a conversation with Bob. Alice asks Bob
1356	   to contact Carol, but makes no attempt to contact Carol
1357	   independently.  In many implementations, Alice does not verify Bob's
1358	   success or failure in contacting Carol.

1360	   Attended Transfer - The transferring party establishes a session with
1361	   the transfer target before completing the transfer.

1363	   Consultative transfer - The transferring party establishes a session
1364	   with the target and mixes both sessions together so that all three
1365	   parties can participate, then disconnects leaving the transferee and
1366	   transfer target with an active session.

1368	   Conference Call - Three or more active, visible participants in the
1369	   same conversation space.

1371	   Call Park - A call participant parks a call (essentially puts the
1372	   call on hold), and then retrieves it at a later time (typically from
1373	   another location).

1375	   Call Pickup - A party picks up a call that was ringing at another
1376	   location.  One variation allows the caller to choose which location;
1377	   another variation just picks up any call in that user's "pickup
1378	   group".

1380	   Music on Hold - When Alice places a call with Bob on hold, her UA
1381	   replaces its audio with streaming content such as music,
1382	   announcements, or advertisements.

1384	   Call Monitoring - A call center supervisor joins an in-progress call
1385	   for monitoring purposes.

1387	   Barge-in - Carol interrupts Alice, who has an in-progress call with
1388	   Bob.  In some variations, Alice forcibly joins a new conversation
1389	   with Carol; in other variations, all three parties are placed in the
1390	   same conversation (basically a 3-way conference).

1392	   Hotline - Alice picks up a phone and is immediately connected to the
1393	   technical support hotline, for example.

1395	   Autoanswer - Calls to a certain address or location answer
1396	   immediately via a speakerphone.

1398	   Intercom - Alice typically presses a button on a phone which
1399	   immediately connects to another user or phone and causes that phone
1400	   to play her voice over its speaker.  Some variations immediately
1401	   set up two-way communications; other variations require another
1402	   button to be pressed to enable a two-way conversation.

1404	   Speakerphone paging - Alice calls the paging address and speaks.  Her
1405	   voice is played on the speaker of every idle phone in a preconfigured
1406	   group of phones.

1408	   Speed dial - Alice dials an abbreviated number, or enters an alias,
1409	   or presses a special speed dial button representing Bob. Her action
1410	   is interpreted as if she specified the full address of Bob.

1412	   Call Return - Alice calls Bob. Bob misses the call or is disconnected
1413	   before he is finished talking to Alice.  Bob invokes Call return
1414	   which calls Alice, even if Alice did not provide her real identity or
1415	   location to Bob.

1417	   Inbound Call Screening - Alice doesn't want to receive calls from
1418	   Matt.  Inbound Screening prevents Matt from disturbing Alice.  In
1419	   some variations this works even if Matt hides his identity.

1421	   Outbound Call Screening - Alice is paged and unknowingly calls a PSTN
1422	   pay-service telephone number in the Caribbean, but local policy
1423	   blocks her call, and possibly informs her why.

1425	   Call Forwarding - Before a call-leg is accepted it is redirected to
1426	   another location, for example, because the originally intended
1427	   recipient is busy, does not answer, is disconnected from the network,
1428	   or has configured all requests to go somewhere else.

1430	   Message Waiting - Bob calls Alice when she steps away from her phone.
1431	   When she returns, a visible or audible indicator conveys that someone
1432	   has left her a voicemail message.  The message waiting indication may
1433	   also convey how many messages are waiting, from whom, what time, and
1434	   other useful pieces of information.

1436	   Do Not Disturb - Alice selects the Do Not Disturb option.  Calls to
1437	   her either ring briefly or not at all and are forwarded elsewhere.
1438	   Some variations allow specially authorized callers to override this
1439	   feature and ring Alice anyway.

1441	   Distinctive ring - Incoming calls have different ring cadences or
1442	   sample sounds depending on the From party, the To party, or other
1443	   factors.

1445	   Automatic Callback - Alice calls Bob, but Bob is busy.  Alice would
1446	   like Bob to call her automatically when he is available.  When Bob
1447	   hangs up, Alice's phone rings.  When Alice answers, Bob's phone
1448	   rings.  Bob answers and they talk.

1450	   Find-Me - Alice sets up complicated rules for how she can be reached
1451	   (possibly using [CPL], [presence] or other factors).  When Bob calls
1452	   Alice, his call is eventually routed to a temporary Contact where
1453	   Alice happens to be available.

1455	   Whispered call waiting - Alice is in a conversation with Bob. Carol
1456	   calls Alice.  Either Carol can "whisper" to Alice directly ("Can you
1457	   get lunch in 15 minutes?"), or an automaton whispers to Alice
1458	   informing her that Carol is trying to reach her.

1460	   Voice message screening - Bob calls Alice.  Alice is screening her
1461	   calls, so Bob hears Alice's voicemail greeting.  Alice can hear Bob
1462	   leave his message.  If she decides to talk to Bob, she can take the
1463	   call back from the voicemail system, otherwise she can let Bob leave
1464	   a message.  This emulates the behavior of a home telephone answering
1465	   machine.

1467	   Presence-Enabled Conferencing - Alice wants to set up a conference
1468	   call with Bob and Cathy when they all happen to be available (rather
1469	   than scheduling a predefined time).  The server providing the
1470	   application monitors their status, and calls all three when they are
1471	   all "online", not idle, and not in another call.

1473	   IM Conference Alerts - A user receives a notification as an Instant
1474	   Message whenever someone joins a conference they are also in.

1476	   Single Line Extension - A group of phones are all treated as
1477	   "extensions" of a single line.  A call for one rings them all.  As
1478	   soon as one answers, the others stop ringing.  If any extension is
1479	   actively in a conversation, another extension can "pick up" and
1480	   immediately join the conversation.  This emulates the behavior of a
1481	   home telephone line with multiple phones.

1483	   Click-to-dial - Alice looks in her company directory for Bob. When
1484	   she finds Bob, she clicks on a URL to call him.  Her phone rings (or
1485	   possibly answers automatically), and when she answers, Bob's phone
1486	   rings.

1488	   Pre-paid calling - Alice pays for a certain currency or unit amount
1489	   of calling value.  When she places a call, she provides her account
1490	   number somehow.  If her account runs out of calling value during a
1491	   call her call is disconnected or redirected to a service where she
1492	   can purchase more calling value.

1494	   Voice Portal - A service that allows users to access a portal site
1495	   using spoken dialog interaction.  For example, Alice needs to
1496	   schedule a working dinner with her co-worker Carol.  Alice uses a
1497	   voice portal to check Carol's flight schedule, find a restaurant
1498	   near her hotel, make a reservation, get directions there, and page
1499	   Carol with this information.

1501	7.1  Implementation of these features

1503	   Example Features:
1504	   Call Hold                       [Offer/Answer] for SIP
1505	   Call Waiting                    Local Implementation
1506	   Blind Transfer                  [cc-transfer]
1507	   Attended Transfer               [cc-transfer]
1508	   Consultative transfer           [cc-transfer]
1509	   Conference Call                 [conf-models]
1510	   Call Park                       *[examples]
1511	   Call Pickup                     *[examples]
1512	   Music on Hold                   *[examples]
1513	   Call Monitoring                 *Insert
1514	   Barge-in                        *Insert or Far-Fork
1515	   Hotline                         Local Implementation
1516	   Autoanswer                      Local URI convention
1517	   Speed dial                      Local Implementation
1518	   Intercom                        *Speed dial + autoanswer
1519	   Speakerphone paging             *Speed dial + autoanswer
1520	   Call Return                     Proxy feature
1521	   Inbound Call Screening          Proxy or Local implementation
1522	   Outbound Call Screening         Proxy feature
1523	   Call Forwarding                 Proxy or Local implementation
1524	   Message Waiting                 [msg-waiting]
1525	   Do Not Disturb                  [presence]
1526	   Distinctive ring                *Proxy or Local implementation
1527	   Automatic Callback              2 person presence-based conference
1528	   Find-Me                         Proxy service based on presence
1529	   Whispered call waiting          Local implementation
1530	   Voice message screening         *
1531	   Presence-based Conferencing     *call when presence = available
1532	   IM Conference Alerts            subscribe to conference status
1533	   Single Line Extension           *
1534	   Click-to-dial                   *
1535	   Pre-paid calling                *
1536	   Voice Portal                    *

1538	7.1.1  Call Park

1540	   Call park requires the ability to put a dialog some place, advertise
1541	   it to users in a pickup group, and uniquely identify it by a means
1542	   that can be communicated (including human voice).  The dialog can be
1543	   held locally on the UA parking the dialog or alternatively
1544	   transferred to the park service for the pickup group.  The parked
1545	   dialog then needs to be labeled (e.g. orbit 12) in a way that can be
1546	   communicated to the party that is to pick up the call.  The UAs in
1547	   the pickup group discover the parked dialog(s) via the dialog
1548	   package from the park service.  If the dialog is parked locally, the
1549	   park service merely aggregates the parked call states from the set of
1550	   UAs in the pickup group.
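
The labeling and discovery steps described above can be sketched as a
small registry that assigns an "orbit" label to each parked dialog and
lets a UA in the pickup group retrieve the dialog by that label.  This
is only an illustration: the class and method names are hypothetical,
dialogs are represented by opaque identifier strings, and no
dialog-package signaling is modeled.

```python
from typing import Dict, Optional


class ParkService:
    """Track parked dialogs for one pickup group, keyed by orbit number."""

    def __init__(self) -> None:
        self._orbits: Dict[int, str] = {}

    def park(self, dialog_id: str) -> int:
        """Store a dialog and return the lowest free orbit number."""
        orbit = 1
        while orbit in self._orbits:
            orbit += 1
        self._orbits[orbit] = dialog_id
        return orbit

    def parked(self) -> Dict[int, str]:
        """Return a snapshot of what the pickup group would be shown."""
        return dict(self._orbits)

    def pick_up(self, orbit: int) -> Optional[str]:
        """Remove and return the dialog in this orbit, or None if empty."""
        return self._orbits.pop(orbit, None)


service = ParkService()
first = service.park("dialog-1")
second = service.park("dialog-2")
print(first, second, service.parked())
```

A short numeric label is used here because it is easy to communicate
by voice, which is one of the requirements stated above.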

1552	7.1.2  Call Pickup

1554	   There are two different features which are called call pickup.  The
1555	   first is the pickup of a parked dialog.  The UA from which the dialog
1556	   is to be picked up subscribes to the session dialog state of the park
1557	   service or the UA which has locally parked the dialog.  Dialogs which
1558	   are parked should be labeled with an identifier.  The labels are used
1559	   by the UA to allow the user to indicate which dialog is to be picked
1560	   up.  The UA picking up the call invokes the URL in the call state
1561	   which is labeled as replace-remote.

1563	   The other call pickup feature involves picking up an early dialog
1564	   (typically ringing).  This feature uses some of the same primitives
1565	   as the pickup of a parked call.  The call state of the ringing UA
1566	   is advertised using the dialog package.  The UA which is to
1567	   pick up the early dialog subscribes either directly to the ringing UA
1568	   or to a service aggregating the states for UAs in the pickup group.
1569	   The call state identifies early dialogs.  The UA uses the call
1570	   state(s) to help the user choose which early dialog is to be
1571	   picked up.  The UA then invokes the URL in the call state labeled as
1572	   replace-remote.

1574	7.1.3  Music on Hold

1576	   Music on hold can be implemented in a number of ways.  One way is to
1577	   transfer the held call to a holding service.  When the UA wishes to
1578	   take the call off hold it basically performs a take on the call from
1579	   the holding service.  This involves subscribing to call state on the
1580	   holding service and then invoking the URL in the call state labeled
1581	   as replace-remote.

1583	   Alternatively music on hold can be performed as a local mixing
1584	   operation.  The UA holding the call can mix in the music from the
1585	   music service via RTP (i.e. an additional dialog) or RTSP or other
1586	   streaming media source.  This approach is simpler (i.e. the held
1587	   dialog does not move so there is less chance of losing it) from a
1588	   protocol perspective; however, it does use more LAN bandwidth and
1589	   resources on the UA.

1591	7.1.4  Call Monitoring

1593	   Call monitoring is a Join operation.  The monitoring UA sends a Join
1594	   to the dialog it wants to listen to.  It is able to discover the
1595	   dialog via the dialog state on the monitored UA.  The monitoring UA
1596	   sends SDP in the INVITE which indicates receive-only media.  As the
1597	   UA is monitoring only, it does not matter whether the UA indicates it
1598	   wishes the send stream to be mixed or point-to-point.

1600	7.1.5  Barge-in

1602	   Barge-in works the same as call monitoring except that the UA must
1603	   indicate that the send media stream is to be mixed so that all of the
1604	   other parties can hear the stream from the UA barging in.

1606	7.1.6  Intercom

1608	   The UA initiates a dialog using INVITE in the ordinary way.  The
1609	   calling UA then signals the paged UA to answer the call.  The calling
1610	   UA may discover the URL to answer the call via the session dialog
1611	   package of the called UA.  The called UA accepts the INVITE with a
1612	   200 OK and automatically enables the speakerphone.

1614	   Alternatively this can be a local decision for the UA to answer based
1615	   upon called party identification.

1617	7.1.7  Speakerphone paging

1619	   Speakerphone paging can be implemented using either multicast or
1620	   through a simple multipoint mixer.  In the multicast solution the
1621	   paging UA sends a multicast INVITE with send-only media in the SDP
1622	   (see also RFC3264).  The automatic answer and enabling of the
1623	   speakerphone is a locally configured decision on the paged UAs.  The
1624	   paging UA sends RTP via the multicast address indicated in the SDP.

1626	   The multipoint solution is accomplished by sending an INVITE to the
1627	   multipoint mixer.  The mixer is configured to automatically answer
1628	   the dialog.  The paging UA then sends REFER requests for each of the
1629	   UAs that are to become paging speakers (The UA is likely to send out
1630	   a single REFER which is parallel forked by the proxy server).  The
1631	   UAs performing as paging speakers are configured to automatically
1632	   answer based upon caller identification (e.g.  To field, URI or
1633	   Referred-To headers).

1635	   Finally, as a third option, the user agent can send a mass-invitation
1636	   request to a conference server, which would create a conference and
1637	   send invitations to the conference to all user agents in the paging
1638	   group.

1640	7.1.8  Distinctive ring

1642	   The target UA either makes a local decision based on information in
1643	   an incoming INVITE (To, From, Contact, Request-URI) or trusts an
1644	   Alert-Info header provided by the caller or inserted by a trusted
1645	   proxy.  In the latter case, the UA fetches the content referenced by
1646	   the URI (typically via HTTP) and renders it to the user.

1648	7.1.9  Voice message screening

1650	   At first, this is the same as call monitoring.  In this case the
1651	   voicemail service is one of the UAs.  The UA screening the message
1652	   monitors the call on the voicemail service, and also subscribes to
1653	   call-leg information.  If the user screening their messages decides
1654	   to answer, they perform a Take from the voicemail system (for
1655	   example, send an INVITE with Replaces to the UA leaving the message).

1657	7.1.10  Single Line Extension

1659	   Incoming calls ring all the extensions through basic parallel forking
1660	   [bis].  Each extension subscribes to call-leg events from each other
1661	   extension.  While one user has an active call, any other UA extension
1662	   can insert itself into that conversation (it already knows the call-
1663	   leg information) in the same way as barge-in.

1665	7.1.11  Click-to-dial

1667	   The application or server which hosts the click-to-dial application
1668	   captures the URL to be dialed and can set up the call using 3pcc or
1669	   can send a REFER request to the UA which is to dial the address.  As
1670	   users sometimes change their mind or wish to give up listening to a
1671	   ringing or voicemail-answered phone, this application illustrates the
1672	   need to also have the ability to remotely hang up a call.
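
   For the REFER option, the essential pieces are the request line
   addressed to the UA that is to dial and a Refer-To header naming the
   captured URL.  The helper below is a hypothetical sketch that builds
   only that fragment; a real request also needs the usual mandatory
   SIP headers (Via, From, To, Call-ID, CSeq, Max-Forwards, Contact),
   which are omitted here.

```python
from typing import List


def refer_fragment(dialing_ua: str, target: str) -> List[str]:
    """Return the request line and Refer-To header of a REFER (fragment only)."""
    return [
        f"REFER {dialing_ua} SIP/2.0",
        f"Refer-To: <{target}>",
    ]
```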

1674	7.1.12  Pre-paid calling

1676	   For prepaid calling, the user's media always passes through a device
1677	   which is trusted by the pre-paid provider.  This may be the other
1678	   endpoint (for example, a PSTN gateway) or a media relay.  In either
1679	   case, an intermediary proxy or B2BUA can periodically verify the
1680	   amount of time available on the pre-paid account, and use the
1681	   session-timer extension to cause the trusted endpoint (gateway) or
1682	   intermediary (media relay) to send a re-INVITE before that time runs
1683	   out.  During the re-INVITE, the SIP intermediary can reverify the
1684	   account and insert another session-timer header.
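
   One way an intermediary might pick the session interval is sketched
   below.  The function name and the 1800-second cap are illustrative
   assumptions.  The 90-second floor reflects the smallest interval the
   session-timer extension (RFC 4028) permits, so an account with less
   credit than that cannot be policed by a refresh and is reported as
   None for the caller of the function to handle.

```python
from typing import Optional

MIN_SE_SECONDS = 90  # smallest session interval allowed by RFC 4028


def session_expires_for(remaining_credit_s: int,
                        max_interval_s: int = 1800) -> Optional[int]:
    """Choose a Session-Expires value (seconds) from the remaining credit.

    Returns None when the credit is below the minimum interval, meaning a
    session refresh cannot be scheduled before the credit runs out.
    """
    if remaining_credit_s < MIN_SE_SECONDS:
        return None
    # Never ask for a longer interval than the credit still covers.
    return min(remaining_credit_s, max_interval_s)
```

   Because the refresh is normally sent well before the interval
   expires, capping the interval at the remaining credit gives the
   intermediary a chance to reverify the account in time.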

1686	   Note that while most pre-paid systems on the PSTN use an IVR to
1687	   collect the account number and destination, this isn't strictly
1688	   necessary for a SIP-originated prepaid call.  SIP requests and SIP
1689	   URIs are sufficiently expressive to convey the final destination, the
1690	   provider of the prepaid service, the location from which the user is
1691	   calling, and the prepaid account they want to use.  If a pre-paid IVR
1692	   is used, the mechanism described below (Voice Portals) can be
1693	   combined as well.

1695	7.1.13  Voice Portal

1697	   A voice portal is essentially a complex collection of voice dialogs
1698	   used to access interesting content.  One of the most desirable call
1699	   control features of a Voice Portal is the ability to start a new
1700	   outgoing call from within the context of the Portal (to make a
1701	   restaurant reservation, or return a voicemail message, for example).
1702	   Once the new call is over, the user should be able to return to the
1703	   Portal by pressing a special key, using some DTMF sequence (e.g., a
1704	   very long pound or hash tone), or by speaking a hotword (e.g., "Main
1705	   Menu").

1707	   In order to accomplish this, the Voice Portal starts with the
1708	   following media relationship:

1710	   { User , Voice Portal }

1712	   The user then asks to make an outgoing call.  The Voice Portal asks
1713	   the User to perform a Far-Fork.  In other words the Voice Portal
1714	   wants the following media relationship:

1716	           { Target , User }  &  { User , Voice Portal }

1718	   The Voice Portal is now just listening for a hotword or the
1719	   appropriate DTMF.  As soon as the user indicates they are done, the
1720	   Voice Portal Takes the call from the old Target, and we are back to
1721	   the original media relationship.
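
   The sequence of media relationships above can be modeled as simple
   set operations.  This is only a toy model for following the state
   changes: the function names far_fork and take mirror the primitives
   named in the text, but their signatures are assumptions made for the
   example and no SIP signaling is shown.

```python
from typing import FrozenSet

Relationship = FrozenSet[str]
Relationships = FrozenSet[Relationship]


def far_fork(rels: Relationships, party: str, target: str) -> Relationships:
    """Add a media relationship { target , party } alongside the existing ones."""
    return rels | {frozenset({target, party})}


def take(rels: Relationships, taker: str, party: str,
         old_peer: str) -> Relationships:
    """Replace { old_peer , party } with { party , taker }."""
    return (rels - {frozenset({old_peer, party})}) | {frozenset({party, taker})}


# { User , Voice Portal }
start: Relationships = frozenset({frozenset({"User", "Voice Portal"})})
# { Target , User }  &  { User , Voice Portal }
forked = far_fork(start, "User", "Target")
# The Voice Portal Takes the call from the old Target.
restored = take(forked, "Voice Portal", "User", "Target")
```

   After the Take, restored equals start, matching the return to the
   original media relationship.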

1723	   This feature can also be used by the account number and phone number
1724	   collection menu in a pre-paid calling service.  A user can press a
1725	   DTMF sequence which presents them with the appropriate menu again.

1727	8.  References

1729	8.1  Normative References

1731	   [1]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
1732	         Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
1733	         Session Initiation Protocol", RFC 3261, June 2002.

1735	   [2]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
1736	         Levels", BCP 14, RFC 2119, March 1997.

1738	   [3]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
1739	         Session Description Protocol (SDP)", RFC 3264, June 2002.

1741	   [4]   Roach, A., "Session Initiation Protocol (SIP)-Specific Event
1742	         Notification", RFC 3265, June 2002.

1744	   [5]   Handley, M. and V. Jacobson, "SDP: Session Description
1745	         Protocol", RFC 2327, April 1998.

1747	   [6]   Johnston, A., "Session Initiation Protocol Service Examples",
1748	         draft-ietf-sipping-service-examples-09 (work in progress),
1749	         July 2005.

1751	   [7]   Rosenberg, J., Peterson, J., Schulzrinne, H., and G. Camarillo,
1752	         "Best Current Practices for Third Party Call Control (3pcc) in
1753	         the Session Initiation Protocol (SIP)", BCP 85, RFC 3725,
1754	         April 2004.

1756	   [8]   Sparks, R., "The Session Initiation Protocol (SIP) Refer
1757	         Method", RFC 3515, April 2003.

1759	   [9]   Mahy, R., Biggs, B., and R. Dean, "The Session Initiation
1760	         Protocol (SIP) "Replaces" Header", RFC 3891, September 2004.

1762	   [10]  Mahy, R. and D. Petrie, "The Session Initiation Protocol (SIP)
1763	         "Join" Header", RFC 3911, October 2004.

1765	   [11]  Rosenberg, J., "An INVITE Initiated Dialog Event Package for
1766	         the Session Initiation Protocol (SIP)",
1767	         draft-ietf-sipping-dialog-package-06 (work in progress),
1768	         April 2005.

1770	   [12]  Rosenberg, J., "A Session Initiation Protocol (SIP) Event
1771	         Package for Conference State",
1772	         draft-ietf-sipping-conference-package-12 (work in progress),
1773	         July 2005.

1775	   [13]  Rosenberg, J., "A Session Initiation Protocol (SIP) Event
1776	         Package for Registrations", RFC 3680, March 2004.

1778	   [14]  Rosenberg, J., "A Presence Event Package for the Session
1779	         Initiation Protocol (SIP)", RFC 3856, August 2004.

1781	   [15]  Rosenberg, J., "A Framework for Conferencing with the Session
1782	         Initiation Protocol",
1783	         draft-ietf-sipping-conferencing-framework-05 (work in
1784	         progress), May 2005.

1786	   [16]  Rosenberg, J., "A Framework for Application Interaction in the
1787	         Session Initiation Protocol (SIP)",
1788	         draft-ietf-sipping-app-interaction-framework-05 (work in
1789	         progress), July 2005.

1791	   [17]  Camarillo, G., "Framework for Transcoding with the Session
1792	         Initiation Protocol (SIP)",
1793	         draft-ietf-sipping-transc-framework-02 (work in progress),
1794	         June 2005.

1796	   [18]  Sparks, R., "Session Initiation Protocol Call Control -
1797	         Transfer", draft-ietf-sipping-cc-transfer-05 (work in
1798	         progress), July 2005.

1800	   [19]  Johnston, A. and O. Levin, "Session Initiation Protocol Call
1801	         Control - Conferencing for User Agents",
1802	         draft-ietf-sipping-cc-conferencing-07 (work in progress),
1803	         June 2005.

1805	   [20]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Indicating
1806	         User Agent Capabilities in the Session Initiation Protocol
1807	         (SIP)", RFC 3840, August 2004.

1809	   [21]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
1810	         Preferences for the Session Initiation Protocol (SIP)",
1811	         RFC 3841, August 2004.

1813	8.2  Informational References

1815	   [22]  Campbell, B. and R. Sparks, "Control of Service Context using
1816	         SIP Request-URI", RFC 3087, April 2001.

1818	   [23]  Mahy, R., "Remote Call Control in SIP using the REFER method
1819	         and the session-oriented  dialog package",
1820	         draft-mahy-sip-remote-cc-01 (work in progress), February 2004.

1822	   [24]  Burger, E., "Basic Network Media Services with SIP",
1823	         draft-burger-sipping-netann-11 (work in progress),
1824	         February 2005.

1826	Authors' Addresses

1828	   Rohan Mahy
1829	   SIP Edge LLC

1831	   Email: rohan@ekabal.com
1832	   Ben Campbell
1833	   Estacado Systems

1835	   Email: ben@nostrum.com

1837	   Robert Sparks
1838	   Estacado Systems

1840	   Email: rjsparks@nostrum.com

1842	   Jonathan Rosenberg
1843	   Cisco Systems

1845	   Email: jdrosen@cisco.com

1847	   Dan Petrie
1848	   SIP EZ

1850	   Email: dpetrie@sipez.com

1852	   Alan Johnston
1853	   MCI

1855	   Email: alan.johnston@mci.com

1857	Intellectual Property Statement

1859	   The IETF takes no position regarding the validity or scope of any
1860	   Intellectual Property Rights or other rights that might be claimed to
1861	   pertain to the implementation or use of the technology described in
1862	   this document or the extent to which any license under such rights
1863	   might or might not be available; nor does it represent that it has
1864	   made any independent effort to identify any such rights.  Information
1865	   on the procedures with respect to rights in RFC documents can be
1866	   found in BCP 78 and BCP 79.

1868	   Copies of IPR disclosures made to the IETF Secretariat and any
1869	   assurances of licenses to be made available, or the result of an
1870	   attempt made to obtain a general license or permission for the use of
1871	   such proprietary rights by implementers or users of this
1872	   specification can be obtained from the IETF on-line IPR repository at
1873	   http://www.ietf.org/ipr.

1875	   The IETF invites any interested party to bring to its attention any
1876	   copyrights, patents or patent applications, or other proprietary
1877	   rights that may cover technology that may be required to implement
1878	   this standard.  Please address the information to the IETF at
1879	   ietf-ipr@ietf.org.

1881	Disclaimer of Validity

1883	   This document and the information contained herein are provided on an
1884	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1885	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1886	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1887	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1888	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1889	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1891	Copyright Statement

1893	   Copyright (C) The Internet Society (2005).  This document is subject
1894	   to the rights, licenses and restrictions contained in BCP 78, and
1895	   except as set forth therein, the authors retain all their rights.

1897	Acknowledgment

1899	   Funding for the RFC Editor function is currently provided by the
1900	   Internet Society.