idnits 2.17.1

draft-ietf-sipping-cc-framework-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 36 instances of lines with control characters in the document.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 1000 has weird spacing: '...with on sip...'

  == Line 1012 has weird spacing: '... prompt sip:s...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 7, 2003) is 7714 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '3pcc' on line 172

  -- Looks like a reference, but probably isn't: 'JTAPI' on line 240

  -- Looks like a reference, but probably isn't: 'CSTA' on line 241

  -- Looks like a reference, but probably isn't: 'SDP' on line 479

  -- Looks like a reference, but probably isn't: 'VoiceXML' on line 718

  -- Looks like a reference, but probably isn't: 'CPL' on line 1471

  ** Obsolete normative reference: RFC 3265 (ref. '4') (Obsoleted by RFC 6665)

  ** Obsolete normative reference: RFC 2327 (ref. '5') (Obsoleted by RFC 4566)

  == Outdated reference: A later version (-15) exists of
     draft-ietf-sipping-service-examples-04

  == Outdated reference: A later version (-06) exists of
     draft-ietf-sipping-3pcc-03

  == Outdated reference: A later version (-05) exists of
     draft-ietf-sip-replaces-03

  == Outdated reference: A later version (-03) exists of
     draft-ietf-sip-join-01

  == Outdated reference: A later version (-06) exists of
     draft-ietf-sipping-dialog-package-01

  == Outdated reference: A later version (-12) exists of
     draft-ietf-sipping-conference-package-00

  -- Possible downref: Normative reference to a draft: ref. '15'

  == Outdated reference: A later version (-01) exists of
     draft-rosenberg-sipping-app-interaction-framework-00

  -- Possible downref: Normative reference to a draft: ref. '16'

  -- Possible downref: Normative reference to a draft: ref. '17'

  -- Possible downref: Normative reference to a draft: ref. '18'

  == Outdated reference: A later version (-12) exists of
     draft-ietf-sipping-cc-transfer-01

  -- Possible downref: Normative reference to a draft: ref. '20'

  == Outdated reference: A later version (-10) exists of
     draft-ietf-sip-callerprefs-08

     Summary: 6 errors (**), 0 flaws (~~), 13 warnings (==), 13 comments (--).
     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   SIPPING WG                                                       R. Mahy
3   Internet-Draft                                             Cisco Systems
4   Expires: September 5, 2003                                   B. Campbell
5                                                                  R. Sparks
6                                                               J. Rosenberg
7                                                                dynamicsoft
8                                                                  D. Petrie
9                                                                    Pingtel
10                                                               A. Johnston
11                                                                  WorldCom
12                                                             March 7, 2003

14       A Call Control and Multi-party usage framework for the Session
15                         Initiation Protocol (SIP)
16                   draft-ietf-sipping-cc-framework-02.txt

18  Status of this Memo

20     This document is an Internet-Draft and is in full conformance with
21     all provisions of Section 10 of RFC2026.

23     Internet-Drafts are working documents of the Internet Engineering
24     Task Force (IETF), its areas, and its working groups. Note that other
25     groups may also distribute working documents as Internet-Drafts.

27     Internet-Drafts are draft documents valid for a maximum of six months
28     and may be updated, replaced, or obsoleted by other documents at any
29     time. It is inappropriate to use Internet-Drafts as reference
30     material or to cite them other than as "work in progress."

32     The list of current Internet-Drafts can be accessed at http://
33     www.ietf.org/ietf/1id-abstracts.txt.

35     The list of Internet-Draft Shadow Directories can be accessed at
36     http://www.ietf.org/shadow.html.

38     This Internet-Draft will expire on September 5, 2003.

40  Copyright Notice

42     Copyright (C) The Internet Society (2003). All Rights Reserved.

44  Abstract

46     This document defines a framework and requirements for multi-party
47     usage of SIP. To enable discussion of multi-party features and
48     applications we define an abstract call model for describing the
49     media relationships required by many of these. The model and actions
50     described here are specifically chosen to be independent of the SIP
51     signaling and/or mixing approach chosen to actually set up the media
52     relationships. In addition to its dialog manipulation aspect, this
53     framework includes requirements for communicating related information
54     and events such as conference and session state, and session history.
55     This framework also describes other goals which embody the spirit of
56     SIP applications as used on the Internet.

58  Table of Contents

60     1.     Conventions . . . . . . . . . . . . . . . . . . . . . . . 4
61     2.     Motivation and Background . . . . . . . . . . . . . . . . 4
62     3.     Key Concepts . . . . . . . . . . . . . . . . . . . . . . . 6
63     3.1    "Conversation Space" Model . . . . . . . . . . . . . . . . 6
64     3.2    Comparison with Related Definitions . . . . . . . . . . . 7
65     3.3    Signaling Models . . . . . . . . . . . . . . . . . . . . . 8
66     3.4    Mixing Models . . . . . . . . . . . . . . . . . . . . . . 9
67     3.4.1  Tightly Coupled . . . . . . . . . . . . . . . . . . . . . 10
68     3.4.2  Loosely Coupled . . . . . . . . . . . . . . . . . . . . . 11
69     3.5    Conveying Information and Events . . . . . . . . . . . . . 12
70     3.6    Componentization and Decomposition . . . . . . . . . . . . 13
71     3.6.1  Media Intermediaries . . . . . . . . . . . . . . . . . . . 14
72     3.6.2  Mixer . . . . . . . . . . . . . . . . . . . . . . . . . . 14
73     3.6.3  Transcoder . . . . . . . . . . . . . . . . . . . . . . . . 14
74     3.6.4  Media Relay . . . . . . . . . . . . . . . . . . . . . . . 15
75     3.6.5  Queue Server . . . . . . . . . . . . . . . . . . . . . . . 15
76     3.6.6  Parking Place . . . . . . . . . . . . . . . . . . . . . . 15
77     3.6.7  Announcements and Voice Dialogs . . . . . . . . . . . . . 15
78     3.7    Use of URIs . . . . . . . . . . . . . . . . . . . . . . . 17
79     3.7.1  Naming Users in SIP . . . . . . . . . . . . . . . . . . . 18
80     3.7.2  Naming Services with SIP URIs . . . . . . . . . . . . . . 19
81     3.8    Invoker Independence . . . . . . . . . . . . . . . . . . . 22
82     3.9    Billing issues . . . . . . . . . . . . . . . . . . . . . . 23
83     4.     Catalog of call control actions and sample features . . . 23
84     4.1    Early Dialog Actions . . . . . . . . . . . . . . . . . . . 24
85     4.1.1  Remote Answer . . . . . . . . . . . . . . . . . . . . . . 24
86     4.1.2  Remote Forward or Put . . . . . . . . . . . . . . . . . . 24
87     4.1.3  Remote Busy or Error Out . . . . . . . . . . . . . . . . . 24
88     4.2    Single Dialog Actions . . . . . . . . . . . . . . . . . . 25
89     4.2.1  Remote Dial . . . . . . . . . . . . . . . . . . . . . . . 25
90     4.2.2  Remote On and Off Hold . . . . . . . . . . . . . . . . . . 25
91     4.2.3  Remote Hangup . . . . . . . . . . . . . . . . . . . . . . 25
92     4.3    Multi-dialog actions . . . . . . . . . . . . . . . . . . . 25
93     4.3.1  Transfer . . . . . . . . . . . . . . . . . . . . . . . . . 25
94     4.3.2  Take . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
95     4.3.3  Add . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
96     4.3.4  Local Join . . . . . . . . . . . . . . . . . . . . . . . . 27
97     4.3.5  Insert . . . . . . . . . . . . . . . . . . . . . . . . . . 27
98     4.3.6  Split . . . . . . . . . . . . . . . . . . . . . . . . . . 27
99     4.3.7  Near-fork . . . . . . . . . . . . . . . . . . . . . . . . 28
100    4.3.8  Far fork . . . . . . . . . . . . . . . . . . . . . . . . . 28
101    5.     Security Considerations . . . . . . . . . . . . . . . . . 28
102    6.     Appendix A: Example Features . . . . . . . . . . . . . . . 29
103    6.1    Implementation of these features . . . . . . . . . . . . . 33
104    6.1.1  Call Park . . . . . . . . . . . . . . . . . . . . . . . . 34
105    6.1.2  Call Pickup . . . . . . . . . . . . . . . . . . . . . . . 35
106    6.1.3  Music on Hold . . . . . . . . . . . . . . . . . . . . . . 35
107    6.1.4  Call Monitoring . . . . . . . . . . . . . . . . . . . . . 35
108    6.1.5  Barge-in . . . . . . . . . . . . . . . . . . . . . . . . . 36
109    6.1.6  Intercom . . . . . . . . . . . . . . . . . . . . . . . . . 36
110    6.1.7  Speakerphone paging . . . . . . . . . . . . . . . . . . . 36
111    6.1.8  Distinctive ring . . . . . . . . . . . . . . . . . . . . . 36
112    6.1.9  Voice message screening . . . . . . . . . . . . . . . . . 37
113    6.1.10 Single Line Extension . . . . . . . . . . . . . . . . . . 37
114    6.1.11 Click-to-dial . . . . . . . . . . . . . . . . . . . . . . 37
115    6.1.12 Pre-paid calling . . . . . . . . . . . . . . . . . . . . . 37
116    6.1.13 Voice Portal . . . . . . . . . . . . . . . . . . . . . . . 38
117           Normative References . . . . . . . . . . . . . . . . . . . 38
118           Informational References . . . . . . . . . . . . . . . . . 40
119           Authors' Addresses . . . . . . . . . . . . . . . . . . . . 40
120           Intellectual Property and Copyright Statements . . . . . . 42

122 1. Conventions

124    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
125    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
126    document are to be interpreted as described in RFC-2119 [2].

128 2. Motivation and Background

130    The Session Initiation Protocol [1] (SIP) was defined for the
131    initiation, maintenance, and termination of sessions or calls between
132    one or more users. However, despite its origins as a large-scale
133    multiparty conferencing protocol, SIP is used today primarily for
134    point-to-point calls. This two-party configuration is the focus of
135    the SIP specification and most of its extensions.

137    This document defines a framework and requirements for multi-party
138    usage of SIP. Most multi-party operations manipulate SIP session
139    dialogs (also known as call legs) or SIP conference media policy to
140    cause participants in a conversation to perceive specific media
141    relationships. In other protocols that deal with the concept of
142    calls, this manipulation is known as call control. In addition to
143    its dialog or policy manipulation aspect, "call control" also
144    includes communicating information and events related to manipulating
145    calls, including information and events dealing with session state
146    and history, conference state, user state, and even message state.

148    Based on input from the SIP community, the authors compiled the
149    following set of goals for SIP call control and multiparty
150    applications:

152    o  Define Primitives, Not Services. Allow for a handful of robust
153       yet simple mechanisms which can be combined to deliver features
154       and services. Throughout this document we refer to these simple
155       mechanisms as "primitives". Primitives should be sufficiently
156       robust that when they are combined they can be used to build lots
157       of services. However, the goal is not to define a provably
158       complete set of primitives. Note that while the IETF will NOT
159       standardize behavior or services, it may define example services
160       for informational purposes, as in service examples [6].

162    o  Participant oriented. The primitives should be designed to
163       provide services which are oriented around the experience of the
164       participants. The authors observe that end users of features and
165       services usually don't care how a media relationship is set up.
166       Their ultimate experience is based only on the resulting media and
167       other externally visible characteristics.

169    o  Signaling Model independent: Support both a central control and a
170       peer-to-peer feature invocation model (and combinations of the
171       two). Baseline SIP already supports a centralized control model
172       described in [3pcc], and the SIP community has expressed a great
173       deal of interest in peer-to-peer or distributed call control using
174       primitives such as those defined in REFER [8], Replaces [9], and
175       Join [10].

177    o  Mixing Model independent: The bulk of interesting multiparty
178       applications involve mixing or combining media from multiple
179       participants. This mixing can be performed by one or more of the
180       participants, or by a centralized mixing resource. The experience
181       of the participants should not depend on the mixing model used.
182       While most examples in this document refer to audio mixing, the
183       framework applies to any media type. In this context a "mixer"
184       refers to combining media in an appropriate, media-specific way.
185       This is consistent with the model described in the SIP conferencing
186       framework.

188    o  Invoker oriented. Only the user who invokes a feature or a service
189       needs to know exactly which service is invoked or why. This is
190       good because it allows new services to be created without
191       requiring new primitives from all the participants; and it allows
192       for much simpler feature authorization policies, for example, when
193       participation spans organizational boundaries. As discussed in
194       section 3.8, this also avoids exponential state explosion when
195       combining features. The invoker only has to manage a user
196       interface or API to prevent local feature interactions. All the
197       other participants simply need to manage the feature interactions
198       of a much smaller number of primitives.

200    o  Primitives make full use of URIs. URIs are a very powerful
201       mechanism for describing users and services. They represent a
202       plentiful resource which can be extremely expressive and easily
203       routed, translated, and manipulated--even across organizational
204       boundaries. URIs can contain special parameters and informational
205       headers which need only be relevant to the owner of the namespace
206       (domain) of the URI. Just as a user who selects an http: URL need
207       not understand the significance and organization of the web site
208       it references, a user may encounter a SIP URL which translates
209       into an email-style group alias, which plays a pre-recorded
210       message, or runs some complex call-handling logic. Note that
211       while this may seem to conflict with the previous goal, both goals
212       can be satisfied by the same model.

214    o  Make use of SIP headers and SIP event packages to provide SIP
215       entities with information about their environment. These should
216       include information about the status/handling of dialogs on
217       other user agents, information about the history of other contacts
218       attempted prior to the current contact, the status of
219       participants, the status of conferences, user presence
220       information, and the status of messages.

222    o  Encourage service decomposition, and design to make use of
223       standard components using well-defined, simple interfaces. Sample
224       components include a SIP mixer, recording service, announcement
225       server, and voice dialog server. (This is not an exhaustive
226       list).

228    o  Include authentication, authorization, policy, logging, and
229       accounting mechanisms to allow these primitives to be used safely
230       among mutually untrusted participants. Some of these mechanisms
231       may be used to assist in billing, but no specific billing system
232       will be endorsed.

234    o  Permit graceful fallback to baseline SIP. Definitions for new SIP
235       call control extensions/primitives MUST describe a graceful way to
236       fall back to baseline SIP behavior. Support for one primitive MUST
237       NOT imply support for another primitive.

239    o  There is no desire or goal to reinvent traditional models, such as
240       the model used by the [H.450] family of protocols, [JTAPI], or the
241       [CSTA] call model, as these other models do not share the design
242       goals presented in this document.

244 3. Key Concepts

246 3.1 "Conversation Space" Model

248    This document introduces the concept of an abstract "conversation
249    space" (essentially as a set of participants who believe they are all
250    communicating among one another). Each conversation space contains
251    one or more participants.

253    Participants are SIP User Agents which send original media to or
254    terminate and receive media from other members of the conversation
255    space. Logically, every participant in the conversation space has
256    access to all the media generated in that space (this is strictly
257    true if all participants share a common media type). A SIP User
258    Agent which does not contribute or consume any media is NOT a
259    participant; nor is a user agent which merely forwards, transcodes,
260    mixes, or selects media originating elsewhere in the conversation
261    space. [Note that a conversation space consists of zero or more SIP
262    calls or SIP conferences. A conversation space is similar to the
263    definition of a "call" in some other call models.]
264    Participants may represent human users or non-human users (referred
265    to as robots or automatons in this document). Some participants may
266    be hidden within a conversation space. Some examples of hidden
267    participants include: robots which generate tones, images, or
268    announcements during a conference to announce users arriving and
269    departing, a human call center supervisor monitoring a conversation
270    between a trainee and a customer, and robots which record media for
271    training or archival purposes.

273    Participants may also be active or passive. Active participants are
274    expected to be intelligent enough to leave a conversation space when
275    they no longer desire to participate. (An attentive human
276    participant is obviously active.) Some robotic participants (such as
277    a voice messaging system, an instant messaging agent, or a voice
278    dialog system) may be active participants if they can leave the
279    conversation space when there is no human interaction. Other robots
280    (for example our tone generating robot from the previous example) are
281    passive participants. A human participant "on-hold" is passive.

283    An example diagram of a conversation space can be shown as a "bubble"
284    or ovals, or as a "set" in curly or square brace notation. Each set,
285    oval, or "bubble" represents a conversation space. Hidden
286    participants are shown in lowercase letters.

288       { A , B }           [ A , B ]

290          .-.                 .---.
291         /   \               /     \
292        /  A  \             / A   b \
293       (       )           (         )
294        \  B  /             \ C   D /
295         \   /               \     /
296          '-'                 '---'

298 3.2 Comparison with Related Definitions

300    In SIP, a call is "an informal term that refers to some communication
301    between peers, generally set up for the purposes of a multimedia
302    conversation." Obviously we cannot discuss normative behavior based
303    on such an intentionally vague definition. The concept of a
304    conversation space is needed because the SIP definition of call is
305    not sufficiently precise for the purpose of describing the user
306    experience of multiparty features.

308    Do any other definitions convey the correct meaning? SIP and SDP
309    [5] both define a conference as "a multimedia session identified by a
310    common session description." A session is defined as "a set of
311    multimedia senders and receivers and the data streams flowing from
312    senders to receivers." Both of these definitions are heavily
313    oriented toward multicast sessions with little differentiation among
314    participants. As such, neither is particularly useful for our
315    purposes. In fact, the definition of "call" in some call models is
316    more similar to our definition of a conversation space.

318    Some examples of the relationship between conversation spaces, SIP
319    call legs, and SIP sessions are listed below. In each example, a
320    human user will perceive that there is a single call.

322    o  A simple two-party call is a single conversation space, a single
323       session, and a single call-leg.

325    o  A locally mixed three-way call is two sessions and two call-legs.
326       It is also a single conversation space.

328    o  A simple dial-in audio conference is a single conversation space,
329       but is represented by as many call-legs and sessions as there are
330       human participants.

332    o  A multicast conference is a single conversation space, a single
333       session, and as many call-legs as participants.

335 3.3 Signaling Models

337    Obviously to make changes to a conversation space, you must be able
338    to use SIP signaling to cause these changes. Specifically there must
339    be a way to manipulate SIP dialogs (call legs) to move participants
340    into and out of conversation spaces. Although this is not as
341    obvious, there also must be a way to manipulate SIP dialogs to
342    include non-participant user agents which are otherwise involved in a
343    conversation space (e.g., B2BUAs, 3pcc controllers, mixers,
344    transcoders, translators, or relays).

346    Implementations may set up the media relationships described in the
347    conversation space model using the approach described in 3pcc [7].
348    The 3pcc approach relies on only the following 3 primitive
349    operations:

351    o  Create a new call-leg (INVITE)

353    o  Modify a call-leg (reINVITE)

355    o  Destroy a call-leg (BYE)

357    The main advantage of the 3pcc approach is that it only requires very
358    basic SIP support from end systems to support call control features.
359    As such, third-party call control is a natural way to handle protocol
360    conversion and mid-call features. It also has the advantage and
361    disadvantage that new features can/must be implemented in one place
362    only (the controller), and neither requires enhanced client
363    functionality, nor takes advantage of it.

365    In addition, a peer-to-peer approach is discussed at length in this
366    draft. The primary drawback of the peer-to-peer model is additional
367    end system complexity. The benefits of the peer-to-peer model
368    include:

370    o  state remains at the edges

372    o  call signaling need only go through participants involved (there
373       are no additional points of failure)

375    o  peers can take advantage of end-to-end message integrity or
376       encryption

378    o  setup time is shorter (fewer messages and round trips are
379       required)

381    The peer-to-peer approach relies on additional "primitive"
382    operations, some of which are identified here.

384    o  Replace an existing dialog

386    o  Join a new dialog with an existing dialog

388    o  Support SIP conference policy control

390    o  Locally perform media forking (multi-unicast)

392    o  Ask another UA to send a request on your behalf

394    Many of the features, primitives, and actions described in this
395    document also require some type of media mixing, combining, or
396    selection as described in the next section.

398 3.4 Mixing Models

400    SIP permits a variety of mixing models, which are discussed here
401    briefly. This topic is discussed more thoroughly in the SIP
402    conferencing framework [15] and cc-conferencing [20]. SIP supports
403    both tightly-coupled and loosely-coupled conferencing, although more
404    sophisticated behavior is available in tightly-coupled conferences.
405    In a tightly-coupled conference, a single SIP user agent (called the
406    focus) has a direct dialog relationship with each participant (and
407    may control non-participant user agents as well). In a
408    loosely-coupled conference there are no coordinated signaling
409    relationships among the participants.

411    For brevity, only the two most popular conferencing models are
412    significantly discussed in this document (local and centralized
413    mixing). Applications of the conversation spaces model to
414    loosely-coupled multicast and distributed full unicast mesh
415    conferences are left as an exercise for the reader. Note that a
416    distributed full mesh conference can be used for basic conferences,
417    but does not easily allow for more complex conferencing actions like
418    splitting, merging, and sidebars.

420    Call control features should be designed to allow a mixer (local or
421    centralized) to decide when to reduce a conference back to a 2-party
422    call, or drop all the participants (for example if only two
423    automatons are communicating). The actual heuristics used to release
424    calls are beyond the scope of this document, but may depend on
425    properties in the conversation space, such as the number of active,
426    passive, or hidden participants; and the send-only, receive-only, or
427    send-and-receive orientation of various participants.

429 3.4.1 Tightly Coupled

431 3.4.1.1 (Single) End System Mixing

433    The first model we call "end system mixing". In this model, user A
434    calls user B, and they have a conversation. At some point later, A
435    decides to conference in user C. To do this, A calls C, using a
436    completely separate SIP call. This call uses a different Call-ID,
437    different tags, etc. There is no call set up directly between B and
438    C. No SIP extension or external signaling is needed. A merely
439    decides to locally join two call-legs.

441          B     C
442           \   /
443            \ /
444             A

446    A receives media streams from both B and C, and mixes them. A sends a
447    stream containing A's and C's streams to B, and a stream containing
448    A's and B's streams to C. Basically, user A handles both signaling
449    and media mixing.

451 3.4.1.2 Centralized Mixing

453    In a centralized mixing model, all participants have a pairwise SIP
454    and media relationship with the mixer. Common applications of
455    centralized mixing include ad-hoc conferences and scheduled dial-in
456    or dial-out conferences. [need diagram]

458 3.4.1.3 Centralized Signaling, Distributed Media

460    In this conferencing model, there is a centralized controller, as in
461    the dial-in and dial-out cases. However, the centralized server
462    handles signaling only. The media is still sent directly between
463    participants, using either multicast or multi-unicast. Multi-unicast
464    is when a user sends multiple packets (one for each recipient,
465    addressed to that recipient). This is referred to as a "Decentralized
466    Multipoint Conference" in [H.323].

468 3.4.2 Loosely Coupled

470    In these models, there is no point of central control of SIP
471    signaling. As in the "Centralized Signaling, Distributed Media" case
472    above, all endpoints send media to all other endpoints. Consequently
473    every endpoint mixes their own media from all the other sources, and
474    sends their own media to every other participant. [add diagrams]

476 3.4.2.1 Large-Scale Multicast Conferences

478    Large-scale multicast conferences were the original motivation for
479    both the Session Description Protocol [SDP] and SIP. In a large-
480    scale multicast conference, one or more multicast addresses are
481    allocated to the conference. Each participant joins those multicast
482    groups, and sends their media to those groups. Signaling is not sent
483    to the multicast groups. The sole purpose of the signaling is to
484    inform participants of which multicast groups to join. Large-scale
485    multicast conferences are usually pre-arranged, with specific start
486    and stop times. However, multicast conferences do not need to be
487    pre-arranged, so long as a mechanism exists to dynamically obtain a
488    multicast address.

490 3.4.2.2 Full Distributed Unicast Conferencing

492    In this conferencing model, each participant has both a pairwise
493    media relationship and a pairwise SIP relationship with every other
494    participant (a full mesh). This model requires a mechanism to
495    maintain a consistent view of distributed state across the group.
496    This is a classic hard problem in computer science. Also, this model
497    does not scale well for large numbers of participants, because for n
498    participants the number of media and SIP relationships is
499    approximately n-squared. As a result, this model is not generally
500    available in commercial implementations; to the contrary it is
501    primarily the topic of research or experimental implementations.

503    Note that this model assumes peer-to-peer signaling.

505 3.5 Conveying Information and Events

507    Participants should have access to information about the other
508    participants in a conversation space, so that this information can be
509    rendered to a human user or processed by an automaton. Although some
510    of this information may be available from the Request-URI or To,
511    From, Contact, or other SIP headers, another mechanism of reporting
512    this information is necessary.

514    Many applications are driven by knowledge about the progress of calls
515    and conferences. In general these types of events allow for the
516    construction of distributed applications, where the application
517    requires information on session dialog and conference state, but is
518    not necessarily co-resident with an endpoint user agent or conference
519    server. For example, a focus involved in a conversation space may
520    wish to provide URLs for conference status, and/or conference/floor
521    control.

523    The SIP Events [4] architecture defines general mechanisms for
524    subscription to and notification of events within SIP networks. It
525    introduces the notion of a package which is a specific
526    "instantiation" of the events mechanism for a well-defined set of
527    events.

529    Event packages are needed to provide the status of a user's session
530    dialogs, provide the status of conferences and their participants,
531    provide user presence information, provide the status of
532    registrations, and provide the status of a user's messages. While this
533    is not an exhaustive list, these are sufficient to enable the sample
534    features described in this document.
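
As a rough illustration of the subscription mechanism described above, the following Python sketch builds the text of a SUBSCRIBE request for one event package. It is a minimal sketch under stated assumptions, not a SIP stack: the function name `build_subscribe`, the example URIs and the host `client.example.org`, and the package tokens listed in `EVENT_PACKAGES` are all chosen here for illustration (this draft describes the packages but does not itself fix their token names). No transport, authentication, or NOTIFY handling is shown.

```python
import uuid

# Assumption: package tokens chosen for illustration; this draft names the
# packages it needs but does not define their Event header tokens.
EVENT_PACKAGES = {"dialog", "conference", "reg", "presence", "message-summary"}


def build_subscribe(target_uri, subscriber_uri, contact_uri, event, expires=3600):
    """Return the text of a minimal SUBSCRIBE request for one event package."""
    if event not in EVENT_PACKAGES:
        raise ValueError(f"unknown event package: {event}")
    lines = [
        f"SUBSCRIBE {target_uri} SIP/2.0",
        # Hypothetical Via host; a real user agent inserts its own address.
        f"Via: SIP/2.0/UDP client.example.org;branch=z9hG4bK{uuid.uuid4().hex[:16]}",
        "Max-Forwards: 70",
        f"To: <{target_uri}>",
        f"From: <{subscriber_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"Call-ID: {uuid.uuid4().hex}@client.example.org",
        "CSeq: 1 SUBSCRIBE",
        f"Contact: <{contact_uri}>",
        # The Event header selects which package the subscription is for.
        f"Event: {event}",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and a blank line after the headers.
    return "\r\n".join(lines) + "\r\n\r\n"
```

A subscriber interested in the state of a conference might call `build_subscribe("sip:conf1@example.com", "sip:alice@example.org", "sip:alice@client.example.org", "conference")`; the same helper with a different `event` value would cover the dialog or registration cases.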

536    The conference event package [12] allows users to subscribe to
537    information about an entire tightly-coupled SIP conference.
538    Notifications convey information about the participants such as: the
539    SIP URL identifying each user, their status in the space (active,
540    declined, departed), URLs to invoke other features (such as sidebar
541    conversations), links to other relevant information (such as floor
542    control policies), and if floor control policies are in place, the
543    user's floor control status. For conversation spaces created from
544    cascaded conferences, conversation state can be gathered from
545    relevant foci and merged into a cohesive set of state.

547    The session dialog package [11] provides information about all the
548    dialogs the target user is maintaining, what conversations the user
549    is participating in, and how these are correlated. Likewise the
550    registration package [13] provides notifications when contacts have
551    changed for a specific address-of-record. The combination of these
552    allows a user agent to learn about all conversations occurring for
553    the entire registered contact set for an address-of-record.

555    Note that user presence in SIP [14] has a close relationship with
556    these latter two event packages. It is fundamental to the presence
557    model that the information used to obtain user presence is
558    constructed from any number of different input sources. Examples of
559    other such sources include calendaring information and uploads of
560    presence documents. These two packages can be considered another
561    mechanism that allows a presence agent to determine the presence
562    state of the user. Specifically, a user presence server can act as a
563    subscriber for the session dialog and registration packages to obtain
564    additional information that can be used to construct a presence
565    document.
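
The idea that a presence server can combine dialog and registration notifications can be sketched as follows. This is only an illustration under stated assumptions: the `UserState` class, the `presence_status` function, the dialog state string "confirmed", and the three presence labels are hypothetical names introduced here, and a real presence agent would merge many more input sources (calendaring, uploaded presence documents) than this sketch does.

```python
from dataclasses import dataclass, field


@dataclass
class UserState:
    # Contacts currently registered for the address-of-record, as a
    # presence server might learn them from registration notifications.
    contacts: set = field(default_factory=set)
    # Dialog identifier -> dialog state string, as learned from session
    # dialog notifications. The state names are assumptions.
    dialogs: dict = field(default_factory=dict)


def presence_status(state):
    """Derive a coarse presence label (hypothetical labels) from both inputs."""
    if not state.contacts:
        # No registered contact: the user cannot currently be reached.
        return "offline"
    if any(s == "confirmed" for s in state.dialogs.values()):
        # At least one established dialog: the user is in a conversation.
        return "busy"
    return "available"
```

For example, a user with one registered contact and no dialogs would be reported as "available", and the label would change to "busy" once a dialog for that user is reported as established.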

   The multi-party architecture may also need to provide a mechanism to
   get information about the status/handling of a dialog (for example,
   information about the history of other contacts attempted prior to
   the current contact). Finally, the architecture should provide ample
   opportunities to present informational URIs which relate to calls,
   conversations, or dialogs in some way. For example, consider the SIP
   Call-Info header, or Contact headers returned in a 300-class
   response. Frequently additional information about a call or dialog
   can be fetched via non-SIP URIs. For example, consider a web page
   for package tracking when calling a delivery company, or a web page
   with related documentation when joining a dial-in conference. The
   use of URIs in the multiparty framework is discussed in more detail
   in Section 3.7.

   Finally, the interaction of SIP with stimulus-signaling-based
   applications, which allow a user agent to interact with an
   application without knowledge of the semantics of that application,
   is discussed in the SIP application interaction framework [16].
   Stimulus signaling can occur to a user interface running locally with
   the client, or to a remote user interface, through media streams.
   Stimulus signaling encompasses a wide range of mechanisms, ranging
   from clicking on hyperlinks, to pressing buttons, to traditional Dual
   Tone Multi Frequency (DTMF) input. In all cases, stimulus signaling
   is supported through the use of markup languages, which play a key
   role in that framework.

3.6 Componentization and Decomposition

   This framework proposes a decomposed component architecture with a
   very loose coupling of services and components. This means that a
   service (such as a conferencing server or an auto-attendant) need not
   be implemented as an actual server.
   Rather, these services can be
   built by combining a few basic components in straightforward or
   arbitrarily complex ways.

   Since the components are easily deployed on separate boxes, by
   separate vendors, or even with separate providers, we achieve a
   separation of function that allows each piece to be developed in
   complete isolation. We can also reuse existing components for new
   applications. This allows rapid service creation, and the ability
   for services to be distributed across organizational domains anywhere
   in the Internet.

   For many of these components it is also desirable to discover their
   capabilities, for example querying the ability of a mixer to host a
   10-dialog conference, or to reserve resources for a specific time.
   These actions could be provided in the form of URLs, provided there
   is an a priori means of understanding their semantics. For example,
   if there is a published dictionary of operations, and a way to query
   the service for the available operations and the associated URLs, the
   URL can be the interface for providing these service operations. This
   concept is described in more detail in the context of dialog
   operations in section

3.6.1 Media Intermediaries

   Media Intermediaries are not participants in any conversation space,
   although an entity which is also a media translator may also have a
   colocated participant component (for example a mixer which also
   announces the arrival of a new participant; the announcement portion
   is a participant, but the mixer itself is not). Media intermediaries
   should be as transparent as possible to the end users, offering a
   useful, fundamental service without getting in the way of new
   features implemented by participants. Some common media
   intermediaries are described below.

3.6.2 Mixer

   A SIP mixer is a component that combines media from all dialogs in
   the same conversation in a media specific way.
   For example, the
   default combining for an audio conference might be an N-1
   configuration, while a text mixer might interleave text messages on a
   per-line basis. More details about the media policy used by mixers
   are described in media policy manipulation in the conference policy
   control protocol [17].

3.6.3 Transcoder

   A transcoder translates media from one encoding or format to another
   (for example, GSM voice to G.711, MPEG2 to H.261, or text/html to
   text/plain), or from one media type to another (for example text to
   speech). A more thorough discussion of transcoding can be found in
   SIP transcoding services invocation [18].

3.6.4 Media Relay

   A media relay terminates media and simply forwards it to a new
   destination without changing the content in any way. Sometimes media
   relays are used to provide source IP address anonymity, to facilitate
   middlebox traversal, or to provide a trusted entity where media can
   be forcefully disconnected.

3.6.5 Queue Server

   A queue server is a location where calls can be entered into one of
   several FIFO (first-in, first-out) queues. A queue server would
   subscribe to the presence of groups or individuals who are interested
   in its queues. When detecting that a user is available to service a
   queue, the server redirects or transfers the last call in the
   relevant queue to the available user. On a queue-by-queue basis,
   authorized users could also subscribe to the call state (dialog
   information) of calls within a queue. Authorized users could use
   this information to effectively pluck (take) a call out of the queue
   (for example by sending an INVITE with a Replaces header to one of
   the user agents in the queue).

3.6.6 Parking Place

   A parking place is a location where calls can be terminated
   temporarily and then retrieved later.
   While a call is "parked", it
   can receive media "on-hold" such as music, announcements, or
   advertisements. Such a service could be further decomposed such that
   announcements or music are handled by a separate component.

3.6.7 Announcements and Voice Dialogs

   An announcement server is a server which can play digitized media
   (frequently audio), such as music or recorded speech. These servers
   are typically accessible via SIP, HTTP, or RTSP. An analogous
   service is a recording service which stores digitized media. A
   convention for specifying announcements in SIP URIs is described in
   [netann]. Likewise the same server could easily provide a service
   which records digitized media.

   A "voice dialog" is a model of spoken interactive behavior between a
   human and an automaton which can include synthesized speech,
   digitized audio, recognition of spoken and DTMF key input, recording
   of spoken input, and interaction with call control. Voice dialogs
   frequently consist of forms or menus. Forms present information and
   gather input; menus offer choices of what to do next.

   Spoken dialogs are a basic building block of applications which use
   voice. Consider for example that a voice mail system, the
   conference-id and passcode collection system for a conferencing
   system, and complicated voice portal applications all require a voice
   dialog component.

3.6.7.1 Text-to-Speech and Automatic Speech Recognition

   Text-to-Speech (TTS) is a service which converts text into digitized
   audio. TTS is frequently integrated into other applications, but
   when separated as a component, it provides greater opportunity for
   broad reuse. Automatic Speech Recognition (ASR) is a service which
   attempts to decipher digitized speech based on a proposed grammar.
   Like TTS, ASR services can be embedded, or exposed so that many
   applications can take advantage of such services.
   A standardized
   (decomposed) interface to access standalone TTS and ASR services is
   currently being developed in the SPEECHSC Working Group.

3.6.7.2 VoiceXML

   [VoiceXML] is a W3C recommendation that was designed to give authors
   control over the spoken dialog between users and applications. The
   application and user take turns speaking: the application prompts the
   user, and the user in turn responds. Its major goal is to bring the
   advantages of web-based development and content delivery to
   interactive voice response applications. We believe that VoiceXML
   represents the ideal partner for SIP in the development of
   distributed IVR servers. VoiceXML is an XML-based scripting language
   for describing IVR services at an abstract level. VoiceXML supports
   DTMF recognition, speech recognition, text-to-speech, and playing out
   of recorded media files. The results of the data collected from the
   user are passed to a controlling entity through an HTTP POST
   operation. The controller can then return another script, or
   terminate the interaction with the IVR server.

   A VoiceXML server also need not be implemented as a monolithic
   server. Below is a diagram of a VoiceXML browser which is split into
   media and non-media handling parts. The VoiceXML interpreter handles
   SIP dialog state and state within a VoiceXML document, and sends
   requests to the media component over another protocol.

                  +-------------+
                  |             |
                  |  VoiceXML   |
                  | Interpreter |
                  | (signaling) |
                  +-------------+
                   ^           ^
                   |           |
               SIP |           | RTSP
                   |           |
                   |           |
                   v           v
      +-------------+        +-------------+
      |             |        |             |
      |   SIP UA    |  RTP   | RTSP Server |
      |             |<------>|   (media)   |
      |             |        |             |
      +-------------+        +-------------+

               Figure : Decomposed VoiceXML Server

3.7 Use of URIs

   All naming in SIP uses URIs.
   URIs in SIP are used in a plethora of
   contexts: the Request-URI; Contact, To, From, and *-Info headers;
   application/uri bodies; and embedded in email, web pages, instant
   messages, and ENUM records. The Request-URI identifies the user or
   service that the call is destined for.

   SIP URIs embedded in informational SIP headers, SIP bodies, and
   non-SIP content can also specify methods, special parameters,
   headers, and even bodies. For example:

      sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098
        &To=;tag=879738
        &From=;tag=023214

      sip:bob@babylon.biloxi.com;method=REFER?
        Refer-To=

   Throughout this draft we discuss call control primitive operations.
   One of the biggest problems is defining how these operations may be
   invoked. There are a number of ways to do this. One way is to
   define the primitives in the protocol itself such that SIP methods
   (for example REFER) or SIP headers (for example Replaces) indicate a
   specific call control action. Another way to invoke call control
   primitives is to define a specific Request-URI naming convention.
   Either these conventions must be shared between the client (the
   invoker) and the server, or published by or on behalf of the server.
   The former involves defining URL construction techniques (e.g. URL
   parameters and/or token conventions) as proposed in [netannc]. The
   latter technique usually involves discovering the URI via a SIP event
   package, a web page, a business card, or an Instant Message. Yet
   another means to acquire the URLs is to define a dictionary of
   primitives with well-defined semantics and provide a means to query
   the named primitives and corresponding URLs that may be invoked on
   the service or dialogs.
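
   To make the structure of such URIs concrete, the sketch below splits
   a SIP URI into its address, its URI parameters (such as "method"),
   and its embedded headers (such as "Call-ID"). It is a deliberately
   simplified illustration and not a complete RFC 3261 parser: it
   assumes that ";" and "?" do not appear inside the user part, and the
   function name parse_sip_uri is hypothetical.

```python
from urllib.parse import unquote


def parse_sip_uri(uri: str) -> dict:
    """Split a SIP URI into address, URI parameters, and embedded headers.

    Simplified: assumes no ';' or '?' inside the user part.
    """
    scheme, rest = uri.split(":", 1)
    # Everything after the first '?' is the embedded header list.
    rest, _, header_part = rest.partition("?")
    # The first ';'-separated token is user@host[:port]; the rest are parameters.
    address, *param_items = rest.split(";")
    params = dict(item.partition("=")[::2] for item in param_items)
    headers = {}
    for item in filter(None, header_part.split("&")):
        name, _, value = item.partition("=")
        headers[name] = unquote(value)  # header values are percent-escaped
    return {"scheme": scheme, "address": address,
            "params": params, "headers": headers}


parsed = parse_sip_uri("sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098")
print(parsed["params"]["method"], parsed["headers"]["Call-ID"])
```

   A recipient that honors the "method" parameter would build a request
   of that method toward the address and copy the embedded headers into
   it, subject to local policy.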

3.7.1 Naming Users in SIP

   An address-of-record, or public SIP address, is a SIP (or SIPS) URI
   that points to a domain with a location server that can map the URI
   to a set of Contact URIs where the user might be available.
   Typically the Contact URIs are populated via registration.

      Address of Record         Contacts

      sip:bob@biloxi.com   ->   sip:bob@babylon.biloxi.com:5060
                                sip:bbrown@mailbox.provider.net
                                sip:+1.408.555.6789@mobile.net

   Caller Preferences and Callee Capabilities [21] defines a set of
   additional parameters to the Contact header that define the
   characteristics of the user agent at the specified URI. For example,
   there is a mobility parameter which indicates whether the UA is fixed
   or mobile. When a user agent registers, it places these parameters
   in the Contact headers to characterize the URIs it is registering.
   This allows a proxy for that domain to have information about the
   contact addresses for that user.

   When a caller sends a request, it can optionally include the
   Accept-Contact and Reject-Contact headers which request certain
   handling by the proxy in the target domain. These headers contain
   preferences that describe the set of desired URIs to which the caller
   would like their request routed. The proxy in the target domain
   matches these preferences with the Contact characteristics originally
   registered by the target user. The target user can also choose to
   run arbitrarily complex "Find-me" feature logic on a proxy in the
   target domain.

   There is a strong asymmetry in how preferences for callers and
   callees can be presented to the network. While a caller takes an
   active role by initiating the request, the callee takes a passive
   role in waiting for requests. This motivates the use of
   callee-supplied scripts and caller preferences included in the call
   request.
   This asymmetry is also reflected in the appropriate
   relationship between caller and callee preferences. A server for a
   callee should respect the wishes of the caller to avoid certain
   locations, while the preference among locations has to be the
   callee's choice, as it determines where, for example, the phone rings
   and whether the callee incurs mobile telephone charges for incoming
   calls.

   SIP User Agent implementations are encouraged to make intelligent
   decisions based on the type of participants (active/passive, hidden,
   human/robot) in a conversation space. This information is conveyed
   via the session dialog package or in a SIP header parameter
   communicated using an appropriate SIP header. For example, a music
   on hold service may take the sensible approach that if there are two
   or more unhidden participants, it should not provide hold music; or
   that it will not send hold music to robots.

   Multiple participants in the same conversation space may represent
   the same human user. For example, the user may use one participant
   for video, chat, and whiteboard media on a PC and another for audio
   media on a SIP phone. In this case, the address-of-record is the
   same for both user agents, but the Contacts are different. In
   addition, human users may add robot participants which act on their
   behalf (for example a call recording service, or a calendar
   reminder). Call Control features in SIP should continue to function
   as expected in such an environment.

3.7.2 Naming Services with SIP URIs

   [Editor's Note: this section needs to be pared down considerably, and
   the examples replaced with example.{com|org|net} domain names.]

   A critical piece of defining a session level service that can be
   accessed by SIP is defining the naming of the resources within that
   service. This point cannot be overstated.

   In the context of SIP control of application components, we take
   advantage of the fact that the standard SIP URI has a user part.
   Most services may be thought of as user automatons that participate
   in SIP sessions. It naturally follows that the user address, or the
   left-hand-side of the URI, should be utilized as a service indicator.

   For example, media servers commonly offer multiple services at a
   single host address. Use of the user part as a service indicator
   enables service consumers to direct their requests without ambiguity.
   It has the added benefit of enabling media services to register their
   availability with SIP Registrars just as any "real" SIP user would.
   This maintains consistency and provides enhanced flexibility in the
   deployment of media services in the network.

   There has been much discussion about the potential for confusion if
   media services URIs are not readily distinguishable from other types
   of SIP UAs. The use of a service namespace provides a mechanism to
   unambiguously identify standard interfaces while not constraining
   the development of private or experimental services.

   In SIP, the Request-URI identifies the user or service that the call
   is destined for. The great advantage of using URIs (specifically,
   the SIP Request-URI) as a service identifier comes from the
   combination of two facts. First, unlike in the PSTN, where the
   namespace (dialable telephone numbers) is limited, URIs come from an
   infinite space. They are plentiful, and they are free. Secondly, the
   primary function of SIP is call routing through manipulations of the
   Request-URI. In the traditional SIP application, this URI represents
   people. However, the URI can also represent services, as we propose
   here. This means we can apply the routing services SIP provides to
   routing of calls to services.
   The result: the problem of service
   invocation and service location becomes a routing problem, for which
   SIP provides a scalable and flexible solution. Since there is such a
   vast namespace of services, we can explicitly name each service in a
   finely granular way. This allows the distribution of services across
   the network.

   Consider a conferencing service. If we have separated the names of
   ad-hoc conferences from scheduled conferences, we can program proxies
   to route calls for ad-hoc conferences to one set of servers, and
   calls for scheduled ones to another, possibly even in a different
   provider. In fact, since each conference itself is given a URI, we
   can distribute conferences across servers, and easily guarantee that
   calls for the same conference always get routed to the same server.
   This is in stark contrast to conferences in the telephone network,
   where the equivalent of the URI (the phone number) is scarce. An
   entire conferencing provider generally has one or two numbers.
   Conference IDs must be obtained through IVR interactions with the
   caller, or through a human attendant. This makes it difficult to
   distribute conferences across servers all over the network, since the
   PSTN routing only knows about the dialed number.

   In the case of a dialog server, the voice dialog itself is the target
   for the call. As such, the Request-URI should contain the identifier
   for this spoken dialog. This is consistent with the Request-URI
   service invocation model of RFC 3087. This URL can be in one of two
   formats. In the first, the VoiceXML script is identified directly by
   an HTTP URL. In the second, the script is not specified. Rather, the
   dialog server uses its configuration to map the incoming request to a
   specific script.
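
   The two formats just described can be sketched as a small helper
   that builds the Request-URI. The "dialog.vxml" service identifier is
   the one used in this document's examples; the function name
   vxml_request_uri is hypothetical. The sketch relies on Python's
   urllib.parse.quote, which emits upper-case hex digits ("%3A"); these
   are equivalent to the lower-case form under percent-encoding rules.

```python
from typing import Optional
from urllib.parse import quote


def vxml_request_uri(host: str, script_url: Optional[str] = None) -> str:
    """Build a Request-URI that invokes a VoiceXML dialog on `host`.

    With script_url, the script is named directly (first format).
    Without it, the dialog server chooses the script from its own
    configuration (second format).
    """
    user = "dialog.vxml"  # service identifier from this document's examples
    if script_url is not None:
        # ':' may not appear in a SIP user part, so it is percent-escaped;
        # '/' is permitted there and is left as-is.
        user += "." + quote(script_url, safe="/")
    return f"sip:{user}@{host}"


print(vxml_request_uri("vxmlservers.com",
                       "http://dialogs.server.com/script32.vxml"))
print(vxml_request_uri("vxmlservers.com"))
```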

   Since the Request-URI could indicate a request for a variety of
   different services, of which a dialog server is only one type, this
   example Request-URI begins with a service identifier that indicates
   the basic service required. For VoiceXML scripts, this
   identification information is a URL-encoded version of the URL which
   references the script to execute, or if not present, the dialog
   server uses server-specific configuration to determine which script
   to execute.

   Examples of URLs that invoke VoiceXML dialogs are: (line folding for
   clarity only)

      sip:dialog.vxml.http%3a//dialogs.server.com/script32.vxml
        @vxmlservers.com

      sip:dialog.vxml@vxmlservers.com

   The first of these indicates that the dialog server (located at
   vxmlservers.com) should invoke a VoiceXML script fetched from
   http://dialogs.server.com/script32.vxml. Since the user part of the
   SIP URL cannot contain the : character, this must be escaped to %3a.

   These types of conventions are not limited to application component
   servers. An ordinary SIP User Agent can have special URIs as well,
   for example, one which is automatically answered by a speakerphone.
   Since URIs are so plentiful, using a separate URI for this service
   does not exhaust a valuable resource. The requested service is clear
   to the user agent receiving the request. This URI can also be
   included as part of another feature (for example, the Intercom
   feature described in Section 6.1.6). This feature can be specified
   with a SIP user parameter, since user parameters are part of the
   userpart of a SIP URI.

   Likewise a Request-URI can fully describe an announcement service
   through the use of the user part of the address and additional URI
   parameters. In our example, the user portion of the address, "annc",
   specifies the announcement service on the media server.
   The two URI
   parameters "play=" and "early=" specify the audio resource to play
   and whether early media is desired.

      sip:annc@ms2.carrier.net;
        play=http://audio.carrier.net/allcircuitsbusy.au;early=yes

      sip:annc@ms2.carrier.net;
        play=file://fileserver.carrier.net/geminii/yourHoroscope.wav

   In practical applications, it is important that an invoker does not
   necessarily apply semantic rules to various URIs it did not create.
   Instead, it should allow any arbitrary string to be provisioned, and
   map the string to the desired behavior. The administrator of a
   service may choose to provision specific conventions or mnemonic
   strings, but the application should not require it. In any large
   installation, the system owner is likely to have pre-existing rules
   for mnemonic URIs, and any attempt by an application to define its
   own rules may create a conflict. Implementations should allow an
   arbitrary mix of URLs from these schemes, or any other scheme that
   renders valid SIP URIs to be provisioned, rather than enforce only
   one particular scheme.
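
   One way to read the recommendation above is that the application
   treats each provisioned URI as an opaque key and looks up the
   behavior an administrator attached to it, instead of parsing meaning
   out of the string. The sketch below illustrates this with a plain
   dictionary. The URIs (under the reserved example.com domain), the
   behavior labels, and the names PROVISIONED and behavior_for are all
   hypothetical. Exact string matching is used for brevity; a real
   implementation would apply SIP URI comparison rules.

```python
from typing import Optional

# Administrator-provisioned table: opaque URI string -> behavior label.
# Entries from different naming conventions can coexist freely.
PROVISIONED = {
    "sip:sub-rjs-deposit@vm.example.com": "deposit-standard-greeting",
    "sip:677283@vm.example.com": "deposit-standard-greeting",
    "sip:rjs@vm.example.com;mode=retrieve": "retrieve-sip-auth",
}


def behavior_for(request_uri: str,
                 table: Optional[dict] = None) -> Optional[str]:
    """Return the provisioned behavior for a URI, or None if unknown.

    The URI is never parsed for meaning; it is only used as a lookup key.
    """
    table = PROVISIONED if table is None else table
    return table.get(request_uri)


print(behavior_for("sip:677283@vm.example.com"))
```

   Because the table is the only place where meaning is assigned, an
   installation with its own mnemonic rules can keep them without any
   change to the application.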

   For example, a voicemail application can be built using very
   different sets of URI conventions, as illustrated below:

      URI Identity          Example Scheme 1
                            Example Scheme 2
                            Example Scheme 3

      Deposit with          sip:sub-rjs-deposit@vm.wcom.com
      standard greeting     sip:677283@vm.wcom.com
                            sip:rjs@vm.wcom.com;mode=deposit

      Deposit with on       sip:sub-rjs-deposit-busy.vm.wcom.com
      phone greeting        sip:677372@vm.wcom.com
                            sip:rjs@vm.wcom.com;mode=3991243

      Deposit with          sip:sub-rjs-deposit-sg@vm.wcom.com
      special greeting      sip:677384@vm.wcom.com
                            sip:rjs@vm.wcom.com;mode=sg

      Retrieve - SIP        sip:sub-rjs-retrieve@vm.wcom.com
      authentication        sip:677405@vm.wcom.com
                            sip:rjs@vm.wcom.com;mode=retrieve

      Retrieve - prompt     sip:sub-rjs-retrieve-inpin.vm.wcom.com
      for PIN in-band       sip:677415@vm.wcom.com
                            sip:rjs@vm.wcom.com;mode=inpin

   As we have shown, SIP URIs represent an ideal, flexible mechanism for
   describing and naming service resources, be they queues, conferences,
   voice dialogs, announcements, voicemail treatments, or phone
   features.

3.8 Invoker Independence

   With functional signaling, only the invoker of features in SIP needs
   to know exactly which feature they are invoking. One of the primary
   benefits of this approach is that combinations of functional features
   work in SIP call control without requiring complex feature
   interaction matrices. For example, let us examine the combination of
   a "transfer" of a call which is "conferenced".

   Alice calls Bob. Alice silently "conferences in" her robotic
   assistant Albert as a hidden party. Bob transfers Alice to Carol.
   If Bob asks Alice to Replace her leg with a new one to Carol, then
   both Alice and Albert should be communicating with Carol
   (transparently).

   Using the peer-to-peer model, this combination of features works fine
   if A is doing local mixing (Alice replaces Bob's call-leg with
   Carol's), or if A is using a central mixer (the mixer replaces Bob's
   call leg with Carol's). A clever implementation using the 3pcc model
   can generate similar results.

   New extensions to the SIP Call Control Framework should attempt to
   preserve this property.

3.9 Billing issues

   Billing in the PSTN is typically based on who initiated a call. At
   the moment billing in a SIP network is neither consistent with
   itself, nor with the PSTN. (A billing model for SIP should allow for
   both PSTN-style billing, and non-PSTN billing.) The example below
   demonstrates one such inconsistency.

   Alice places a call to Bob. Alice then blind transfers Bob to Carol
   through a PSTN gateway. In current usage of REFER, Bob may be billed
   for a call he did not initiate (although his UA did originate the
   outgoing call leg). This is not necessarily a terrible thing, but it
   demonstrates a security concern (Bob must have appropriate local
   policy to prevent fraud). Also, Alice may wish to pay for Bob's
   session with Carol. There should be a way to signal this in SIP.

   Likewise a Replacement call may maintain the same billing
   relationship as a Replaced call, so if Alice first calls Carol, then
   asks Bob to Replace this call, Alice may continue to receive a bill.

   Further work in SIP billing should define a way to set or discover
   the direction of billing.

4. Catalog of call control actions and sample features

   Call control actions can be categorized by the dialogs upon which
   they operate. The actions may involve a single or multiple dialogs.

   These dialogs can be early or established. Multiple dialogs may be
   related in a conversation space to form a conference or other
   interesting media topologies.

   It should be noted that it is desirable to provide a means by which a
   party can discover the actions which may be performed on a dialog.
   The interested party may be independent or related to the dialogs.
   One means of accomplishing this is through the ability to define and
   obtain URLs for these actions as described in section .

   Below are listed several call control "actions" which establish or
   modify dialogs and relate the participants in a conversation space.
   The names of the actions listed are for descriptive purposes only
   (they are not normative). This list of actions is not meant to be
   exhaustive.

   In the examples, all actions are initiated by the user "Alice"
   represented by UA "A".

4.1 Early Dialog Actions

   The following are a set of actions that may be performed on a single
   early dialog. These actions can be thought of as a set of remote
   control operations. For example, an automaton might perform the
   operation on behalf of a user. Alternatively, a user might use the
   remote control in the form of an application to perform the action on
   the early dialog of a UA which may be out of reach. All of these
   actions correspond to telling the UA how to respond to a request to
   establish an early dialog. These actions provide useful functionality
   for PDA, PC, and server-based applications which desire the ability
   to control a UA.

4.1.1 Remote Answer

   A dialog is in some early dialog state such as 180 Ringing. It may
   be desirable to tell the UA to answer the dialog, that is, to tell it
   to send a 200 OK response to establish the dialog.

4.1.2 Remote Forward or Put

   It may be desirable to tell the UA to respond with a 3xx class
   response to forward an early dialog to another UA.

4.1.3 Remote Busy or Error Out

   It may be desirable to instruct the UA to send an error response such
   as 486 Busy Here.
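
   The three early dialog actions above each amount to choosing the
   final response the controlled UA sends. The sketch below captures
   that mapping as a lookup table. The action names, the table name
   EARLY_DIALOG_RESPONSES, and the helper status_line are hypothetical,
   and 302 Moved Temporarily is only one possible choice of 3xx class
   response; this document does not say how a controller conveys the
   chosen action to the UA.

```python
# Illustrative mapping from an early dialog action to the final response
# the controlled UA would send. The action names are hypothetical labels.
EARLY_DIALOG_RESPONSES = {
    "remote_answer": (200, "OK"),                   # establish the dialog
    "remote_forward": (302, "Moved Temporarily"),   # one possible 3xx choice
    "remote_busy": (486, "Busy Here"),              # reject with an error
}


def status_line(action: str) -> str:
    """Return the SIP status line for a named early dialog action."""
    code, reason = EARLY_DIALOG_RESPONSES[action]
    return f"SIP/2.0 {code} {reason}"


for name in EARLY_DIALOG_RESPONSES:
    print(name, "->", status_line(name))
```

   A forwarding response would also carry a Contact header naming the
   new target, which this sketch omits.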

4.2 Single Dialog Actions

   There is another useful set of actions which operate on a single
   established dialog. These operations are useful in building
   productivity applications that help users control their phones, for
   example a CRM application which sets up calls for a user, eliminating
   the need for the user to actually enter an address. These operations
   can also be thought of as remote control actions.

4.2.1 Remote Dial

   This action instructs the UA to initiate a dialog. This action can
   be performed using the REFER method.

4.2.2 Remote On and Off Hold

   This action instructs the UA to put an established dialog on hold.
   Though this operation can conceptually be performed with the REFER
   method, there are no semantics defined as to what the referred party
   should do with the SDP. There is no way to distinguish between the
   desire to go on or off hold.

4.2.3 Remote Hangup

   This action instructs the UA to terminate an early or established
   dialog. A REFER request with the following Refer-To URI performs this
   action. Note: this URL is not properly escaped.

      sip:bob@babylon.biloxi.example.com;method=BYE?Call-ID=13413098
        &To=;tag=879738
        &From=;tag=023214

4.3 Multi-dialog actions

   These actions apply to a set of related dialogs.

4.3.1 Transfer

   The conversation space changes as follows:

         before          after
      { A , B }   -->   { C , B }

   A replaces itself with C.

   To make this happen using the peer-to-peer approach, "A" would send
   two SIP requests.
   A shorthand for those requests is shown below:

      REFER B Refer-To:C
      BYE B

   To make this happen instead using the 3pcc approach, the controller
   sends requests represented by the shorthand below:

      INVITE C (w/SDP of B)
      reINVITE B (w/SDP of C)
      BYE A

   Features enabled by this action:
   - blind transfer
   - transfer to a central mixer (some type of conference or forking)
   - transfer to park server (park)
   - transfer to music on hold or announcement server
   - transfer to a "queue"
   - transfer to a service (such as Voice Dialogs service)
   - transition from local mixer to central mixer

   This action is frequently referred to as "completing an attended
   transfer". It is described in more detail in cc-transfer [19].

4.3.2 Take

   The conversation space changes as follows:

      { B , C } --> { B , A }

   A forcibly replaces C with itself. In most uses of this primitive, A
   is just "un-replacing" itself.

   Using the peer-to-peer approach, "A" sends:

      INVITE B Replaces:

   Using the 3pcc approach (all requests sent from controller):

      INVITE A (w/SDP of B)
      reINVITE B (w/SDP of A)
      BYE C

   Features enabled by this action:
   - transferee completes an attended transfer
   - retrieve from central mixer (not recommended)
   - retrieve from music on hold or park
   - retrieve from queue
   - call center take
   - voice portal resuming ownership of a call it originated
   - answering-machine style screening (pickup)
   - pickup of a ringing call (i.e. early dialog)

   Note that pickup of a ringing call has some interesting additional
   requirements. First of all, it is an early dialog as opposed to an
   established dialog. Secondly, the party which is to pick up the call
   may wish to do so only while it is an early dialog.
   That is, in the race condition where the ringing UA accepts
   just before it receives signaling from the party wishing to take the
   call, the taking party wishes to yield or cancel the take. The goal
   is to avoid yanking an answered call from the called party.

   This action is described in Replaces [9] and in cc-transfer [19].

4.3.3 Add

   Note that the following 4 actions are described in cc-conferencing
   [20].

   This is merely adding a participant to a SIP conference. The
   conversation space changes as follows:

      { A , B } --> { A, B, C }

   A adds C to the conversation.

   Using the peer-to-peer approach, adding a party using local mixing
   requires no signaling. To transition from a 2-party call or a
   locally mixed conference to central mixing, A could send the
   following requests:

      REFER B Refer-To: conference-URI
      INVITE conference-URI
      BYE B

   To add a party to a conference:

      REFER C Refer-To: conference-URI
   or
      REFER conference-URI Refer-To: C

   Using the 3pcc approach to transition to centrally mixed, the
   controller would send:

      INVITE mixer leg 1 (w/SDP of A)
      INVITE mixer leg 2 (w/SDP of B)
      INVITE C (late SDP)
      reINVITE A (w/SDP of mixer leg 1)
      reINVITE B (w/SDP of mixer leg 2)
      INVITE mixer leg 3 (w/SDP of C)

   To add a party to a SIP conference:

      INVITE C (late SDP)
      INVITE conference-URI (w/SDP of C)

   Features enabled:
   - standard conference feature
   - call recording
   - answering-machine style screening (screening)

4.3.4 Local Join

   The conversation space changes like this:

      { A, B} , {A, C} --> {A, B, C}

   or like this

      { A, B} , {C, D} --> {A, B, C, D}

   A takes two conversation spaces and joins them together into a
   single space.
   Using the peer-to-peer approach, A can mix locally, or REFER the
   participants of both conversation spaces to the same central mixer
   (as in 5.3).  For the 3pcc approach, the call flows for inserting
   participants, and joining and splitting conversation spaces, are
   tedious yet straightforward, so these are left as an exercise for the
   reader.

   Features enabled:

   -  standard conference feature
   -  leaving a sidebar to rejoin a larger conference

4.3.5 Insert

   The conversation space changes like this:

      { B , C } --> {A, B, C }

   A inserts itself into a conversation space.  A proposed mechanism for
   signaling this using the peer-to-peer approach is to send a new
   header in an INVITE with "joining" semantics.  For example:

      INVITE B Join:

   If B accepted the INVITE, B would accept responsibility to set up the
   call legs and mixing necessary (for example: to mix locally or to
   transfer the participants to a central mixer).

   Features enabled:

   -  barge-in
   -  call center monitoring
   -  call recording

4.3.6 Split

      { A, B, C, D } --> { A, B } , { C, D }

   If using a central conference with peer-to-peer:

      REFER C Refer-To: conference-URI (new URI)
      REFER D Refer-To: conference-URI (new URI)
      BYE C
      BYE D

   Features enabled:

   -  sidebar conversations during a larger conference

4.3.7 Near-fork

   A participates in two conversation spaces simultaneously:

      { A, B } --> { B , A } & { A , C }

   A is a participant in two conversation spaces such that A sends the
   same media to both spaces, and renders media from both spaces,
   presumably by mixing or rendering the media from both.  We can define
   that A is the "anchor" point for both forks, each of which is a
   separate conversation space.  This action is purely a local
   implementation matter (it requires no special signaling).
   Local features such as switching calls between the background and
   foreground are possible using this media relationship.

4.3.8 Far fork

   The conversation space diagram...

      { A, B } --> { A , B } & { B , C }

   A requests B to be the "anchor" of two conversation spaces.  This is
   easily set up by creating a conference with two subconferences and
   setting the media policy appropriately such that B is a participant
   in both.  Media forking can also be set up using 3pcc as described in
   Section 5.1 of RFC3264 [3] (an offer/answer model for SDP).  The
   session descriptions for forking are quite complex.  Controllers
   should verify that endpoints can handle forked media, for example
   using prior configuration.

   Features enabled:

   o  barge-in

   o  voice portal services

   o  whisper

   o  hotword detection

   o  sending DTMF somewhere else

5. Security Considerations

   Call Control primitives provide a powerful set of features that can
   be dangerous in the hands of an attacker.  To complicate matters,
   call control primitives are likely to be automatically authorized
   without direct human oversight.

   The class of attacks which are possible using these tools includes
   the ability to eavesdrop on calls, disconnect calls, redirect calls,
   render irritating content (including ringing) at a user agent, cause
   an action that has billing consequences, subvert billing
   (theft-of-service), and obtain private information.  Call control
   extensions must take extra care to describe how these attacks will be
   prevented.

   We can also make some general observations about authorization and
   trust with respect to call control.  The security model depends
   heavily on the signaling model chosen (see section 3.2).

   Let us first examine the security model used in the 3pcc approach.
   All signaling goes through the controller, which is a trusted entity.
   Traditional SIP authentication and hop-by-hop encryption and message
   integrity work fine in this environment, but end-to-end encryption
   and message integrity may not be possible.

   When using the peer-to-peer approach, call control actions and
   primitives can be legitimately initiated by a) an existing
   participant in the conversation space, b) a former participant in the
   conversation space, or c) an entity trusted by one of the
   participants.  For example, a participant always initiates a
   transfer; a retrieve from Park (a take) is initiated on behalf of a
   former participant; and a barge-in (insert or far-fork) is initiated
   by a trusted entity (an operator for example).

   Authenticating requests by an existing participant or a trusted
   entity can be done with baseline SIP mechanisms.  In the case of
   features initiated by a former participant, these should be protected
   against replay attacks by using a unique name or identifier per
   invocation.  The Replaces header exhibits this behavior as a
   by-product of its operation (once a Replaces operation is successful,
   the call-leg being Replaced no longer exists).  For other requests, a
   "one-time" Request-URI may be provided to the feature invoker.

   To authorize call control primitives that trigger special behavior
   (such as an INVITE with Replaces or Join semantics), the receiving
   user agent may have trouble finding appropriate credentials with
   which to challenge or authorize the request, as the sender may be
   completely unknown to the receiver, except through the introduction
   of a third party.  These credentials need to be passed transitively
   in some way or fetched in an event body, for example.

6. Appendix A: Example Features

   Primitives are defined in terms of their ability to provide features.
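
   The conversation-space notation used for the primitives in Section
   4.3 can be read as plain set operations, which may help when
   following how the features below decompose into primitives.  The
   Python sketch that follows is illustrative only and is not part of
   the framework.  The names space, take, add, local_join, and split are
   hypothetical, and the sketch models only which parties are in a
   conversation space, not the SIP signaling (REFER, INVITE with
   Replaces or Join, 3pcc) that would bring the change about.

```python
from typing import FrozenSet, Tuple

# A conversation space is modeled as an immutable set of party names.
Space = FrozenSet[str]


def space(*parties: str) -> Space:
    """Build a conversation space such as { A, B }."""
    return frozenset(parties)


def take(s: Space, taker: str, replaced: str) -> Space:
    """Take: { B, C } --> { B, A }. The taker replaces one party."""
    if replaced not in s:
        raise ValueError(f"{replaced} is not in the conversation space")
    return (s - {replaced}) | {taker}


def add(s: Space, new_party: str) -> Space:
    """Add or Insert: { A, B } --> { A, B, C }.

    Both primitives yield the same membership; they differ only in
    who initiates the change, which this sketch does not model.
    """
    return s | {new_party}


def local_join(s1: Space, s2: Space) -> Space:
    """Local Join: { A, B }, { C, D } --> { A, B, C, D }."""
    return s1 | s2


def split(s: Space, sidebar: Space) -> Tuple[Space, Space]:
    """Split: { A, B, C, D } --> { A, B }, { C, D }."""
    if not sidebar or not sidebar < s:
        raise ValueError("sidebar must be a non-empty proper subset")
    return s - sidebar, sidebar


if __name__ == "__main__":
    print(sorted(take(space("B", "C"), taker="A", replaced="C")))
    print(sorted(local_join(space("A", "B"), space("A", "C"))))
```

   Under this reading, a feature such as retrieving a parked call is a
   take applied to the space that contains the park server, and a
   sidebar is a split followed later by a local_join.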
   These example features should require an amply robust set of services
   to demonstrate a useful set of primitives.  They are described here
   briefly.  Note that the descriptions of these features are
   non-normative.  Some of these features are used as examples in
   section 6 to demonstrate how some features may require certain media
   relationships.  Note also that this document describes a mixture of
   both features originating in the world of telephones, and features
   which are clearly Internet oriented.

   Example Feature Definitions:

   Call Waiting - Alice is in a call, then receives another call.  Alice
   can place the first call on hold, and talk with the other caller.
   She can typically switch back and forth between the callers.

   Blind Transfer - Alice is in a conversation with Bob.  Alice asks Bob
   to contact Carol, but makes no attempt to contact Carol
   independently.  In many implementations, Alice does not verify Bob's
   success or failure in contacting Carol.

   Attended Transfer - The transferring party establishes a session with
   the transfer target before completing the transfer.

   Consultative transfer - The transferring party establishes a session
   with the target and mixes both sessions together so that all three
   parties can participate, then disconnects, leaving the transferee and
   transfer target with an active session.

   Conference Call - Three or more active, visible participants in the
   same conversation space.

   Call Park - A call participant parks a call (essentially puts the
   call on hold), and then retrieves it at a later time (typically from
   another location).

   Call Pickup - A party picks up a call that was ringing at another
   location.  One variation allows the caller to choose which location,
   another variation just picks up any call in that user's "pickup
   group".
   Music on Hold - When Alice places a call with Bob on hold, her audio
   is replaced with streaming content such as music, announcements, or
   advertisements.

   Call Monitoring - A call center supervisor joins an in-progress call
   for monitoring purposes.

   Barge-in - Carol interrupts Alice, who has a call in progress with
   Bob.  In some variations, Alice forcibly joins a new conversation
   with Carol, in other variations, all three parties are placed in the
   same conversation (basically a 3-way conference).

   Hotline - Alice picks up a phone and is immediately connected to the
   technical support hotline, for example.

   Autoanswer - Calls to a certain address or location answer
   immediately via a speakerphone.

   Intercom - Alice typically presses a button on a phone which
   immediately connects to another user or phone and causes that phone
   to play her voice over its speaker.  Some variations immediately set
   up two-way communications, other variations require another button to
   be pressed to enable a two-way conversation.

   Speakerphone paging - Alice calls the paging address and speaks.  Her
   voice is played on the speaker of every idle phone in a preconfigured
   group of phones.

   Speed dial - Alice dials an abbreviated number, or enters an alias,
   or presses a special speed dial button representing Bob.  Her action
   is interpreted as if she specified the full address of Bob.

   Call Return - Alice calls Bob.  Bob misses the call or is
   disconnected before he is finished talking to Alice.  Bob invokes
   Call Return, which calls Alice, even if Alice did not provide her
   real identity or location to Bob.

   Inbound Call Screening - Alice doesn't want to receive calls from
   Matt.  Inbound Screening prevents Matt from disturbing Alice.  In
   some variations this works even if Matt hides his identity.
   Outbound Call Screening - Alice is paged and unknowingly calls a PSTN
   pay-service telephone number in the Caribbean, but local policy
   blocks her call, and possibly informs her why.

   Call Forwarding - Before a call-leg is accepted it is redirected to
   another location, for example, because the originally intended
   recipient is busy, does not answer, is disconnected from the network,
   or has configured all requests to go somewhere else.

   Message Waiting - Bob calls Alice when she steps away from her phone;
   when she returns, a visible or audible indicator conveys that someone
   has left her a voicemail message.  The message waiting indication may
   also convey how many messages are waiting, from whom, what time, and
   other useful pieces of information.

   Do Not Disturb - Alice selects the Do Not Disturb option.  Calls to
   her either ring briefly or not at all and are forwarded elsewhere.
   Some variations allow specially authorized callers to override this
   feature and ring Alice anyway.

   Distinctive ring - Incoming calls have different ring cadences or
   sample sounds depending on the From party, the To party, or other
   factors.

   Automatic Callback - Alice calls Bob, but Bob is busy.  Alice would
   like Bob to call her automatically when he is available.  When Bob
   hangs up, Alice's phone rings.  When Alice answers, Bob's phone
   rings.  Bob answers and they talk.

   Find-Me - Alice sets up complicated rules for how she can be reached
   (possibly using [CPL], [presence] or other factors).  When Bob calls
   Alice, his call is eventually routed to a temporary Contact where
   Alice happens to be available.

   Whispered call waiting - Alice is in a conversation with Bob.  Carol
   calls Alice.  Either Carol can "whisper" to Alice directly ("Can you
   get lunch in 15 minutes?"), or an automaton whispers to Alice
   informing her that Carol is trying to reach her.
   Voice message screening - Bob calls Alice.  Alice is screening her
   calls, so Bob hears Alice's voicemail greeting.  Alice can hear Bob
   leave his message.  If she decides to talk to Bob, she can take the
   call back from the voicemail system, otherwise she can let Bob leave
   a message.  This emulates the behavior of a home telephone answering
   machine.

   Presence-Enabled Conferencing - Alice wants to set up a conference
   call with Bob and Cathy when they all happen to be available (rather
   than scheduling a predefined time).  The server providing the
   application monitors their status, and calls all three when they are
   all "online", not idle, and not in another call.

   IM Conference Alerts - A user receives a notification as an Instant
   Message whenever someone joins a conference they are also in.

   Single Line Extension - A group of phones are all treated as
   "extensions" of a single line.  A call for one rings them all.  As
   soon as one answers, the others stop ringing.  If any extension is
   actively in a conversation, another extension can "pick up" and
   immediately join the conversation.  This emulates the behavior of a
   home telephone line with multiple phones.

   Click-to-dial - Alice looks in her company directory for Bob.  When
   she finds Bob, she clicks on a URL to call him.  Her phone rings (or
   possibly answers automatically), and when she answers, Bob's phone
   rings.

   Pre-paid calling - Alice pays for a certain currency or unit amount
   of calling value.  When she places a call, she provides her account
   number somehow.  If her account runs out of calling value during a
   call, her call is disconnected or redirected to a service where she
   can purchase more calling value.

   Voice Portal - A service that allows users to access a portal site
   using spoken dialog interaction.  For example, Alice needs to
   schedule a working dinner with her co-worker Carol.
   Alice uses a voice portal to check Carol's flight schedule, find a
   restaurant near her hotel, make a reservation, get directions there,
   and page Carol with this information.

6.1 Implementation of these features

   Example Features:

   Call Hold                     [Offer/Answer] for SIP
   Call Waiting                  Local Implementation
   Blind Transfer                [cc-transfer]
   Attended Transfer             [cc-transfer]
   Consultative transfer         [cc-transfer]
   Conference Call               [conf-models]
   Call Park                     *[examples]
   Call Pickup                   *[examples]
   Music on Hold                 *[examples]
   Call Monitoring               *Insert
   Barge-in                      *Insert or Far-Fork
   Hotline                       Local Implementation
   Autoanswer                    Local URI convention
   Speed dial                    Local Implementation
   Intercom                      *Speed dial + autoanswer
   Speakerphone paging           *Speed dial + autoanswer
   Call Return                   Proxy feature
   Inbound Call Screening        Proxy or Local implementation
   Outbound Call Screening       Proxy feature
   Call Forwarding               Proxy or Local implementation
   Message Waiting               [msg-waiting]
   Do Not Disturb                [presence]
   Distinctive ring              *Proxy or Local implementation
   Automatic Callback            2 person presence-based conference
   Find-Me                       Proxy service based on presence
   Whispered call waiting        Local implementation
   Voice message screening       *
   Presence-based Conferencing   *call when presence = available
   IM Conference Alerts          subscribe to conference status
   Single Line Extension         *
   Click-to-dial                 *
   Pre-paid calling              *
   Voice Portal                  *

6.1.1 Call Park

   Call park requires the ability to put a dialog some place, advertise
   it to users in a pickup group, and uniquely identify it by a means
   that can be communicated (including by human voice).  The dialog can
   be held locally on the UA parking the dialog or alternatively
   transferred to the park service for the pickup group.  The parked
   dialog then needs to be labeled (e.g.
   orbit 12) in a way that can be communicated to the party that is to
   pick up the call.  The UAs in the pickup group discover the parked
   dialog(s) via the dialog package from the park service.  If the
   dialog is parked locally, the park service merely aggregates the
   parked call states from the set of UAs in the pickup group.

6.1.2 Call Pickup

   There are two different features which are called call pickup.  The
   first is the pickup of a parked dialog.  The UA from which the dialog
   is to be picked up subscribes to the session dialog state of the park
   service or the UA which has locally parked the dialog.  Dialogs which
   are parked should be labeled with an identifier.  The labels are used
   by the UA to allow the user to indicate which dialog is to be picked
   up.  The UA picking up the call invokes the URL in the call state
   which is labeled as replace-remote.

   The other call pickup feature involves picking up an early dialog
   (typically ringing).  This feature uses some of the same primitives
   as the pickup of a parked call.  The call state of the ringing UA is
   advertised using the dialog package.  The UA which is to pick up the
   early dialog subscribes either directly to the ringing UA or to a
   service aggregating the states for UAs in the pickup group.  The call
   state identifies early dialogs.  The UA uses the call state(s) to
   help the user choose which early dialog is to be picked up.  The UA
   then invokes the URL in the call state labeled as replace-remote.

6.1.3 Music on Hold

   Music on hold can be implemented a number of ways.  One way is to
   transfer the held call to a holding service.  When the UA wishes to
   take the call off hold, it basically performs a take on the call from
   the holding service.
   This involves subscribing to call state on the holding service and
   then invoking the URL in the call state labeled as replace-remote.

   Alternatively, music on hold can be performed as a local mixing
   operation.  The UA holding the call can mix in the music from the
   music service via RTP (i.e. an additional dialog) or RTSP or other
   streaming media source.  This approach is simpler from a protocol
   perspective (i.e. the held dialog does not move, so there is less
   chance of losing it); however, it does use more LAN bandwidth and
   resources on the UA.

6.1.4 Call Monitoring

   Call monitoring is a Join operation.  The monitoring UA sends a Join
   to the dialog it wants to listen to.  It is able to discover the
   dialog via the dialog state on the monitored UA.  The monitoring UA
   sends SDP in the INVITE which indicates receive-only media.  As the
   UA is monitoring only, it does not matter whether the UA indicates it
   wishes the send stream to be mixed or point-to-point.

6.1.5 Barge-in

   Barge-in works the same as call monitoring except that it must
   indicate that the send media stream is to be mixed so that all of the
   other parties can hear the stream from the UA barging in.

6.1.6 Intercom

   The UA initiates a dialog using INVITE in the ordinary way.  The
   calling UA then signals the paged UA to answer the call.  The calling
   UA may discover the URL to answer the call via the session dialog
   package of the called UA.  The called UA accepts the INVITE with a
   200 OK and automatically enables the speakerphone.

   Alternatively, this can be a local decision for the UA to answer
   based upon called party identification.

6.1.7 Speakerphone paging

   Speakerphone paging can be implemented using either multicast or
   through a simple multipoint mixer.  In the multicast solution the
   paging UA sends a multicast INVITE with send-only media in the SDP
   (see also RFC3264).
   The automatic answer and enabling of the speakerphone is a locally
   configured decision on the paged UAs.  The paging UA sends RTP via
   the multicast address indicated in the SDP.

   The multipoint solution is accomplished by sending an INVITE to the
   multipoint mixer.  The mixer is configured to automatically answer
   the dialog.  The paging UA then sends REFER requests for each of the
   UAs that are to become paging speakers (the UA is likely to send out
   a single REFER which is parallel forked by the proxy server).  The
   UAs performing as paging speakers are configured to automatically
   answer based upon caller identification (e.g. To field, URI or
   Referred-To headers).

   Finally, as a third option, the user agent can send a mass-invitation
   request to a conference server, which would create a conference and
   send invitations to the conference to all user agents in the paging
   group.

6.1.8 Distinctive ring

   The target UA either makes a local decision based on information in
   an incoming INVITE (To, From, Contact, Request-URI) or trusts an
   Alert-Info header provided by the caller or inserted by a trusted
   proxy.  In the latter case, the UA fetches the content described in
   the URI (typically via HTTP) and renders it to the user.

6.1.9 Voice message screening

   At first, this is the same as call monitoring.  In this case the
   voicemail service is one of the UAs.  The UA screening the message
   monitors the call on the voicemail service, and also subscribes to
   call-leg information.  If the user screening their messages decides
   to answer, they perform a Take from the voicemail system (for
   example, send an INVITE with Replaces to the UA leaving the message).

6.1.10 Single Line Extension

   Incoming calls ring all the extensions through basic parallel forking
   [bis].  Each extension subscribes to call-leg events from each other
   extension.
   While one user has an active call, any other UA extension can insert
   itself into that conversation (it already knows the call-leg
   information) in the same way as barge-in.

6.1.11 Click-to-dial

   The application or server which hosts the click-to-dial application
   captures the URL to be dialed and can set up the call using 3pcc or
   can send a REFER request to the UA which is to dial the address.  As
   users sometimes change their mind or wish to give up listening to a
   ringing or voicemail-answered phone, this application illustrates the
   need to also have the ability to remotely hang up a call.

6.1.12 Pre-paid calling

   For prepaid calling, the user's media always passes through a device
   which is trusted by the pre-paid provider.  This may be the other
   endpoint (for example a PSTN gateway).  In either case, an
   intermediary proxy or B2BUA can periodically verify the amount of
   time available on the pre-paid account, and use the session-timer
   extension to cause the trusted endpoint (gateway) or intermediary
   (media relay) to send a reINVITE before that time runs out.  During
   the reINVITE, the SIP intermediary can reverify the account and
   insert another session-timer header.

   Note that while most pre-paid systems on the PSTN use an IVR to
   collect the account number and destination, this isn't strictly
   necessary for a SIP-originated prepaid call.  SIP requests and SIP
   URIs are sufficiently expressive to convey the final destination, the
   provider of the prepaid service, the location from which the user is
   calling, and the prepaid account they want to use.  If a pre-paid IVR
   is used, the mechanism described below (Voice Portals) can be
   combined as well.

6.1.13 Voice Portal

   A voice portal is essentially a complex collection of voice dialogs
   used to access interesting content.
   One of the most desirable call control features of a Voice Portal is
   the ability to start a new outgoing call from within the context of
   the Portal (to make a restaurant reservation, or return a voicemail
   message for example).  Once the new call is over, the user should be
   able to return to the Portal by pressing a special key, using some
   DTMF sequence (e.g. a very long pound or hash tone), or by speaking a
   hotword (e.g. "Main Menu").

   In order to accomplish this, the Voice Portal starts with the
   following media relationship:

      { User , Voice Portal }

   The user then asks to make an outgoing call.  The Voice Portal asks
   the User to perform a Far-Fork.  In other words the Voice Portal
   wants the following media relationship:

      { Target , User } & { User , Voice Portal }

   The Voice Portal is now just listening for a hotword or the
   appropriate DTMF.  As soon as the user indicates they are done, the
   Voice Portal Takes the call from the old Target, and we are back to
   the original media relationship.

   This feature can also be used by the account number and phone number
   collection menu in a pre-paid calling service.  A user can press a
   DTMF sequence which presents them with the appropriate menu again.

Normative References

   [1]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
         Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
         Session Initiation Protocol", RFC 3261, June 2002.

   [2]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [3]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
         Session Description Protocol (SDP)", RFC 3264, June 2002.

   [4]   Roach, A., "Session Initiation Protocol (SIP)-Specific Event
         Notification", RFC 3265, June 2002.

   [5]   Handley, M. and V. Jacobson, "SDP: Session Description
         Protocol", RFC 2327, April 1998.
   [6]   Johnston, A. and S. Donovan, "Session Initiation Protocol
         Service Examples", draft-ietf-sipping-service-examples-04 (work
         in progress), March 2003.

   [7]   Rosenberg, J., Schulzrinne, H., Camarillo, G. and J. Peterson,
         "Best Current Practices for Third Party Call Control in the
         Session Initiation Protocol", draft-ietf-sipping-3pcc-03 (work
         in progress), March 2003.

   [8]   Sparks, R., "The SIP Refer Method", draft-ietf-sip-refer-07
         (work in progress), December 2002.

   [9]   Dean, R., Biggs, B. and R. Mahy, "The Session Initiation
         Protocol (SIP) 'Replaces' Header", draft-ietf-sip-replaces-03
         (work in progress), March 2003.

   [10]  Mahy, R. and D. Petrie, "The Session Initiation Protocol (SIP)
         'Join' Header", draft-ietf-sip-join-01 (work in progress),
         March 2003.

   [11]  Rosenberg, J. and H. Schulzrinne, "An INVITE Initiated Dialog
         Event Package for the Session Initiation Protocol (SIP)",
         draft-ietf-sipping-dialog-package-01 (work in progress), March
         2003.

   [12]  Rosenberg, J. and H. Schulzrinne, "A Session Initiation
         Protocol (SIP) Event Package for Conference State",
         draft-ietf-sipping-conference-package-00 (work in progress),
         June 2002.

   [13]  Rosenberg, J., "A Session Initiation Protocol (SIP) Event
         Package for Registrations", draft-ietf-sipping-reg-event-00
         (work in progress), October 2002.

   [14]  Rosenberg, J., "A Presence Event Package for the Session
         Initiation Protocol (SIP)", draft-ietf-simple-presence-10 (work
         in progress), January 2003.

   [15]  Rosenberg, J., "A Framework for Conferencing with the Session
         Initiation Protocol",
         draft-rosenberg-sipping-conferencing-framework-01 (work in
         progress), February 2003.

   [16]  Rosenberg, J., "A Framework and Requirements for Application
         Interaction in SIP",
         draft-rosenberg-sipping-app-interaction-framework-00 (work in
         progress), November 2002.
   [17]  Mahy, R. and N. Ismail, "Media Policy Manipulation in the
         Conference Policy Control Protocol",
         draft-mahy-sipping-media-policy-control-00 (work in progress),
         February 2003.

   [18]  Camarillo, G., "Transcoding Services Invocation in the Session
         Initiation Protocol", draft-camarillo-sip-deaf-02 (work in
         progress), February 2003.

   [19]  Sparks, R. and A. Johnston, "Session Initiation Protocol Call
         Control - Transfer", draft-ietf-sipping-cc-transfer-01 (work in
         progress), February 2003.

   [20]  Johnston, A. and O. Levin, "Session Initiation Protocol Call
         Control - Conferencing for User Agents",
         draft-johnston-sipping-cc-conferencing-01 (work in progress),
         February 2003.

   [21]  Rosenberg, J., Schulzrinne, H. and P. Kyzivat, "Caller
         Preferences and Callee Capabilities for the Session Initiation
         Protocol (SIP)", draft-ietf-sip-callerprefs-08 (work in
         progress), March 2003.

Informational References

Authors' Addresses

   Rohan Mahy
   Cisco Systems

   EMail: rohan@cisco.com

   Ben Campbell
   dynamicsoft

   EMail: bcampbell@dynamicsoft.com

   Robert Sparks
   dynamicsoft

   EMail: rsparks@dynamicsoft.com

   Jonathan Rosenberg
   dynamicsoft

   EMail: jdrosen@dynamicsoft.com

   Dan Petrie
   Pingtel

   EMail: dpetrie@pingtel.com

   Alan Johnston
   WorldCom

   EMail: alan.johnston@wcom.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.
   Information on the IETF's procedures with respect to rights in
   standards-track and standards-related documentation can be found in
   BCP-11.  Copies of claims of rights made available for publication
   and any assurances of licenses to be made available, or the result of
   an attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementors or users of this
   specification can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.
   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.