SIPPING WG                                                       R. Mahy
Internet-Draft                                               Plantronics
Intended status: Informational                               B. Campbell
Expires: May 31, 2008                                          R. Sparks
                                                        Estacado Systems
                                                            J. Rosenberg
                                                           Cisco Systems
                                                               D. Petrie
                                                                  SIP EZ
                                                        A. Johnston, Ed.
                                                                   Avaya
                                                       November 28, 2007


     A Call Control and Multi-party usage framework for the Session
                       Initiation Protocol (SIP)
                   draft-ietf-sipping-cc-framework-08

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.
   Note that other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 31, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document defines a framework and requirements for multi-party
   usage of SIP.  To enable discussion of multi-party features and
   applications, we define an abstract call model for describing the
   media relationships required by many of these.  The model and actions
   described here are specifically chosen to be independent of the SIP
   signaling and/or mixing approach chosen to actually set up the media
   relationships.  In addition to its dialog manipulation aspect, this
   framework includes requirements for communicating related information
   and events such as conference and session state, and session history.
   This framework also describes other goals that embody the spirit of
   SIP applications as used on the Internet.

Table of Contents

   1.  Motivation and Background
   2.  Key Concepts
     2.1.  "Conversation Space" Model
     2.2.  Relationship Between Conversation Space, SIP Dialogs,
           and SIP Sessions
     2.3.  Signaling Models
     2.4.  Mixing Models
       2.4.1.  Tightly Coupled
       2.4.2.  Loosely Coupled
     2.5.  Conveying Information and Events
     2.6.  Componentization and Decomposition
       2.6.1.  Media Intermediaries
       2.6.2.  Mixer
       2.6.3.  Transcoder
       2.6.4.  Media Relay
       2.6.5.  Queue Server
       2.6.6.  Parking Place
       2.6.7.  Announcements and Voice Dialogs
     2.7.  Use of URIs
       2.7.1.  Naming Users in SIP
       2.7.2.  Naming Services with SIP URIs
     2.8.  Invoker Independence
     2.9.  Billing issues
   3.  Catalog of call control actions and sample features
     3.1.  Remote Call Control Actions on Early Dialogs
       3.1.1.  Remote Answer
       3.1.2.  Remote Forward or Put
       3.1.3.  Remote Busy or Error Out
     3.2.  Remote Call Control Actions on Single Dialogs
       3.2.1.  Remote Dial
       3.2.2.  Remote On and Off Hold
       3.2.3.  Remote Hangup
     3.3.  Call Control Actions on Multiple Dialogs
       3.3.1.  Transfer
       3.3.2.  Take
       3.3.3.  Add
       3.3.4.  Local Join
       3.3.5.  Insert
       3.3.6.  Split
       3.3.7.  Near-fork
       3.3.8.  Far fork
   4.  Security Considerations
   5.  IANA Considerations
   6.  Appendix A: Example Features
     6.1.  Implementation of these features
       6.1.1.  Barge-in
       6.1.2.  Call Monitoring
       6.1.3.  Call Park
       6.1.4.  Call Pickup
       6.1.5.  Click-to-dial
       6.1.6.  Distinctive ring
       6.1.7.  Intercom
       6.1.8.  Music on Hold
       6.1.9.  Pre-paid calling
       6.1.10. Single Line Extension/Multiple Line Appearance
       6.1.11. Speakerphone paging
       6.1.12. Voice message screening
       6.1.13. Voice Portal
   7.  Acknowledgements
   8.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Motivation and Background

   The Session Initiation Protocol [1] (SIP) was defined for the
   initiation, maintenance, and termination of sessions or calls between
   one or more users.
   However, despite its origins as a large-scale
   multiparty conferencing protocol, SIP is used today primarily for
   point-to-point calls.  This two-party configuration is the focus of
   the SIP specification and most of its extensions.

   This document defines a framework and requirements for multi-party
   usage of SIP.  Most multi-party operations manipulate SIP dialogs
   (also known as call legs) or SIP conference media policy to cause
   participants in a conversation to perceive specific media
   relationships.  In other protocols that deal with the concept of
   calls, this manipulation is known as call control.  In addition to
   its dialog or policy manipulation aspect, "call control" also
   includes communicating information and events related to manipulating
   calls, including information and events dealing with session state
   and history, conference state, user state, and even message state.

   Based on input from the SIP community, the authors compiled the
   following set of goals for SIP call control and multiparty
   applications:
   o  Define Primitives, Not Services.  Allow for a handful of robust
      yet simple mechanisms that can be combined to deliver features and
      services.  Throughout this document we refer to these simple
      mechanisms as "primitives".  Primitives should be sufficiently
      robust that, when they are combined with each other, they can be
      used to build many services.  However, the goal is not to
      define a provably complete set of primitives.  Note that while the
      IETF will NOT standardize behavior or services, it may define
      example services for informational purposes, as in service
      examples [6].
   o  Participant oriented.  The primitives should be designed to
      provide services that are oriented around the experience of the
      participants.  The authors observe that end users of features and
      services usually don't care how a media relationship is set up.
      Their ultimate experience is based only on the resulting media and
      other externally visible characteristics.
   o  Signaling Model independent: Support both a central control and a
      peer-to-peer feature invocation model (and combinations of the
      two).  Baseline SIP already supports a centralized control model
      described in 3pcc [7], and the SIP community has expressed a great
      deal of interest in peer-to-peer or distributed call control using
      primitives such as those defined in REFER [8], Replaces [9], and
      Join [10].
   o  Mixing Model independent: The bulk of interesting multiparty
      applications involve mixing or combining media from multiple
      participants.  This mixing can be performed by one or more of the
      participants, or by a centralized mixing resource.  The experience
      of the participants should not depend on the mixing model used.
      While most examples in this document refer to audio mixing, the
      framework applies to any media type.  In this context a "mixer"
      refers to combining media of the same type in an appropriate,
      media-specific way.  This is consistent with the model described
      in the SIP conferencing framework.
   o  Invoker oriented.  Only the user who invokes a feature or a
      service needs to know exactly which service is invoked or why.
      This is good because it allows new services to be created without
      requiring new primitives from all the participants; and it allows
      for much simpler feature authorization policies, for example, when
      participation spans organizational boundaries.  As discussed in
      Section 2.8, this also avoids exponential state explosion when
      combining features.  The invoker only has to manage a user
      interface or API to prevent local feature interactions.  All the
      other participants simply need to manage the feature interactions
      of a much smaller number of primitives.
   o  Primitives make full use of URIs.
      URIs are a very powerful
      mechanism for describing users and services.  They represent a
      plentiful resource that can be extremely expressive and easily
      routed, translated, and manipulated--even across organizational
      boundaries.  URIs can contain special parameters and informational
      headers that need only be relevant to the owner of the namespace
      (domain) of the URI.  Just as a user who selects an http: URL need
      not understand the significance and organization of the web site
      it references, a user may encounter a SIP URI that translates into
      an email-style group alias, that plays a pre-recorded message, or
      runs some complex call-handling logic.  Note that while this may
      seem to conflict with the previous goal, both goals can be
      satisfied by the same model.
   o  Make use of SIP headers and SIP event packages to provide SIP
      entities with information about their environment.  These should
      include information about the status/handling of dialogs on
      other user agents, information about the history of other contacts
      attempted prior to the current contact, the status of
      participants, the status of conferences, user presence
      information, and the status of messages.
   o  Encourage service decomposition, and design to make use of
      standard components using well-defined, simple interfaces.  Sample
      components include a SIP mixer, recording service, announcement
      server, and voice dialog server.  (This is not an exhaustive
      list.)
   o  Include authentication, authorization, policy, logging, and
      accounting mechanisms to allow these primitives to be used safely
      among mutually untrusted participants.  Some of these mechanisms
      may be used to assist in billing, but no specific billing system
      will be endorsed.
   o  Permit graceful fallback to baseline SIP.  Definitions for new SIP
      call control extensions/primitives must describe a graceful way to
      fall back to baseline SIP behavior.
      Support for one primitive must
      not imply support for another primitive.
   o  There is no desire or goal to reinvent traditional models, such as
      the model used by the [H.450] family of protocols, [JTAPI], or the
      [CSTA] call model, as these other models do not share the design
      goals presented in this document.

2.  Key Concepts

   This section introduces a number of key concepts which will be used
   to describe and explain various call control operations and services
   in the remainder of this document.  This includes the conversation
   space model, signaling and mixing models, common components, and the
   use of URIs.

2.1.  "Conversation Space" Model

   This document introduces the concept of an abstract "conversation
   space" as a set of participants who believe they are all
   communicating among one another.  Each conversation space contains
   one or more participants.

   Participants are SIP User Agents that send original media to or
   terminate and receive media from other members of the conversation
   space.  Logically, every participant in the conversation space has
   access to all the media generated in that space (this is strictly
   true if all participants share a common media type).  A SIP User
   Agent that does not contribute or consume any media is NOT a
   participant; nor is a user agent that merely forwards, transcodes,
   mixes, or selects media originating elsewhere in the conversation
   space.  [Note that a conversation space consists of zero or more SIP
   calls or SIP conferences.  A conversation space is similar to the
   definition of a "call" in some other call models.]

   Participants may represent human users or non-human users (referred
   to as robots or automatons in this document).  Some participants may
   be hidden within a conversation space.
   Some examples of hidden
   participants include: robots that generate tones, images, or
   announcements during a conference to announce users arriving and
   departing, a human call center supervisor monitoring a conversation
   between a trainee and a customer, and robots that record media for
   training or archival purposes.

   Participants may also be active or passive.  Active participants are
   expected to be intelligent enough to leave a conversation space when
   they no longer desire to participate.  (An attentive human
   participant is obviously active.)  Some robotic participants (such as
   a voice messaging system, an instant messaging agent, or a voice
   dialog system) may be active participants if they can leave the
   conversation space when there is no human interaction.  Other robots
   (for example our tone generating robot from the previous example) are
   passive participants.  A human participant "on-hold" is passive.

   A conversation space can be drawn as a "bubble" or oval, or written
   as a "set" in curly or square brace notation.  Each set, oval, or
   "bubble" represents a conversation space.  Hidden participants are
   shown in lowercase letters.

   Note that while the term "conversation" usually applies to oral
   exchange of information, we apply the conversation space model to any
   media exchange between participants.

             { A , B }                  [ A , b, C, D ]

                .-.                         .---.
               /   \                       /     \
              /  A  \                     /  A  b \
             (       )                   (         )
              \  B  /                     \  C  D /
               \   /                       \     /
                '-'                         '---'

2.2.  Relationship Between Conversation Space, SIP Dialogs, and SIP
      Sessions

   In SIP, a call is "an informal term that refers to some communication
   between peers, generally set up for the purposes of a multimedia
   conversation."  Obviously we cannot discuss normative behavior based
   on such an intentionally vague definition.
   The concept of a
   conversation space is needed because the SIP definition of call is
   not sufficiently precise for the purpose of describing the user
   experience of multiparty features.

   Do any other definitions convey the correct meaning?  SIP and SDP
   [5] both define a conference as "a multimedia session identified by a
   common session description."  A session is defined as "a set of
   multimedia senders and receivers and the data streams flowing from
   senders to receivers."  Both of these definitions are heavily
   oriented toward multicast sessions with little differentiation among
   participants.  As such, neither is particularly useful for our
   purposes.  In fact, the definition of "call" in some call models is
   more similar to our definition of a conversation space.

   Some examples of the relationship between conversation spaces, SIP
   dialogs, and SIP sessions are listed below.  In each example, a human
   user will perceive that there is a single call.
   o  A simple two-party call is a single conversation space, a single
      session, and a single dialog.
   o  A locally mixed three-way call is two sessions and two dialogs.
      It is also a single conversation space.
   o  A simple dial-in audio conference is a single conversation space,
      but is represented by as many dialogs and sessions as there are
      human participants.
   o  A multicast conference is a single conversation space, a single
      session, and as many dialogs as participants.

2.3.  Signaling Models

   Obviously, to make changes to a conversation space, you must be able
   to use SIP signaling to cause these changes.  Specifically there must
   be a way to manipulate SIP dialogs (call legs) to move participants
   into and out of conversation spaces.
   Although this is not as
   obvious, there also must be a way to manipulate SIP dialogs to
   include non-participant user agents that are otherwise involved in a
   conversation space (e.g., B2BUAs, 3pcc controllers, mixers,
   transcoders, translators, or relays).

   Implementations may set up the media relationships described in the
   conversation space model using a centralized control model.  One
   common way to implement this using SIP is known as 3rd Party Call
   Control (3pcc) and is described in 3pcc [7].  The 3pcc approach
   relies on only the following three primitive operations:
   o  Create a new dialog (INVITE)
   o  Modify a dialog (reINVITE)
   o  Destroy a dialog (BYE)

   The main advantage of the 3pcc approach is that it only requires very
   basic SIP support from end systems to support call control features.
   As such, third-party call control is a natural way to handle protocol
   conversion and mid-call features.  It also has the advantage and
   disadvantage that new features can/must be implemented in one place
   only (the controller), and neither requires enhanced client
   functionality, nor takes advantage of it.

   In addition, a peer-to-peer approach is discussed at length in this
   draft.  The primary drawback of the peer-to-peer model is additional
   complexity in the end system and in the authentication and management
   models.  The benefits of the peer-to-peer model include:
   o  state remains at the edges
   o  call signaling need only go through participants involved (there
      are no additional points of failure)
   o  peers can take advantage of end-to-end message integrity or
      encryption
   o  setup time is shorter (fewer messages are required to be sent by
      the initiator of the action)

   The peer-to-peer approach relies on additional "primitive"
   operations, some of which are identified here.
   o  Replace an existing dialog
   o  Join a new dialog with an existing dialog
   o  Locally perform media forking (multi-unicast)
   o  Ask another UA to send a request on your behalf

   The peer-to-peer approach also results in only a single SIP dialog,
   directly between the two UAs.  The 3pcc approach results in two SIP
   dialogs, one between each UA and the controller.  As a result, with
   3pcc the SIP features and extensions that can be used during the
   dialogs are limited to those understood by the controller.
   Consequently, where both UAs support an advanced SIP feature but the
   controller does not, the feature cannot be used.

   Many of the features, primitives, and actions described in this
   document also require some type of media mixing, combining, or
   selection as described in the next section.

2.4.  Mixing Models

   SIP permits a variety of mixing models, which are discussed here
   briefly.  This topic is discussed more thoroughly in the SIP
   conferencing framework [15] and cc-conferencing [19].  SIP supports
   both tightly-coupled and loosely-coupled conferencing, although more
   sophisticated behavior is available in tightly-coupled conferences.
   In a tightly-coupled conference, a single SIP user agent (called the
   focus) has a direct dialog relationship with each participant (and
   may control non-participant user agents as well).  In a
   loosely-coupled conference there are no coordinated signaling
   relationships among the participants.

   For brevity, only the two most popular conferencing models (local
   and centralized mixing) are discussed in detail in this document.
   Applications of the conversation spaces model to loosely-coupled
   multicast and distributed full unicast mesh conferences are
   left as an exercise for the reader.
   Note that a distributed full
   mesh conference can be used for basic conferences, but does not
   easily allow for more complex conferencing actions like splitting,
   merging, and sidebars.

   Call control features should be designed to allow a mixer (local or
   centralized) to decide when to reduce a conference back to a 2-party
   call, or drop all the participants (for example if only two
   automatons are communicating).  The actual heuristics used to release
   calls are beyond the scope of this document, but may depend on
   properties in the conversation space, such as the number of active,
   passive, or hidden participants; and the send-only, receive-only, or
   send-and-receive orientation of various participants.

2.4.1.  Tightly Coupled

   Tightly coupled conferences utilize a central point for signaling and
   authentication known as a focus [15].  The actual media can be
   centrally mixed or distributed.

2.4.1.1.  (Single) End System Mixing

   The first model we call "end system mixing".  In this model, user A
   calls user B, and they have a conversation.  At some point later, A
   decides to conference in user C.  To do this, A calls C, using a
   completely separate SIP call.  This call uses a different Call-ID,
   different tags, etc.  There is no call set up directly between B and
   C.  No SIP extension or external signaling is needed.  A merely
   decides to locally join two dialogs.

                        B     C
                         \   /
                          \ /
                           A

   A receives media streams from both B and C, and mixes them.  A sends
   a stream containing A's and C's streams to B, and a stream containing
   A's and B's streams to C.  Basically, user A handles both signaling
   and media mixing.

2.4.1.2.  Centralized Mixing

   In a centralized mixing model, all participants have a pairwise SIP
   and media relationship with the mixer.  Common applications of
   centralized mixing include ad-hoc conferences and scheduled dial-in
   or dial-out conferences.
   In the figure below, the mixer M receives
   media from and sends media to participants A, B, C, D, and E.

                        B     C
                         \   /
                          \ /
                           M --- A
                          / \
                         /   \
                        D     E

2.4.1.3.  Centralized Signaling, Distributed Media

   In this conferencing model, there is a centralized controller, as in
   the dial-in and dial-out cases.  However, the centralized server
   handles signaling only.  The media is still sent directly between
   participants, using either multicast or multi-unicast.  Participants
   perform their own mixing.  Multi-unicast is when a user sends
   multiple packets (one for each recipient, addressed to that
   recipient).  This is referred to as a "Decentralized Multipoint
   Conference" in [H.323].  Full mesh media with centralized mixing is
   another approach.

2.4.2.  Loosely Coupled

   In these models, there is no point of central control of SIP
   signaling.  As in the "Centralized Signaling, Distributed Media" case
   above, all endpoints send media to all other endpoints.  Consequently
   every endpoint mixes their own media from all the other sources, and
   sends their own media to every other participant.

2.4.2.1.  Large-Scale Multicast Conferences

   Large-scale multicast conferences were the original motivation for
   both the Session Description Protocol (SDP) [5] and SIP.  In a
   large-scale multicast conference, one or more multicast addresses are
   allocated to the conference.  Each participant joins those multicast
   groups, and sends their media to those groups.  Signaling is not sent
   to the multicast groups.  The sole purpose of the signaling is to
   inform participants of which multicast groups to join.  Large-scale
   multicast conferences are usually pre-arranged, with specific start
   and stop times.  However, multicast conferences do not need to be
   pre-arranged, so long as a mechanism exists to dynamically obtain a
   multicast address.

2.4.2.2.  Full Distributed Unicast Conferencing

   In this conferencing model, each participant has both a pairwise
   media relationship and a pairwise signaling relationship with every
   other participant (a full mesh).  This model requires a mechanism to
   maintain a consistent view of distributed state across the group.
   This is a classic hard problem in computer science.  Also, this model
   does not scale well for large numbers of participants, because for
   n participants the number of media and signaling relationships is
   approximately n-squared.  As a result, this model is not generally
   available in commercial implementations; rather, it is
   primarily the topic of research or experimental implementations.
   Note that this model assumes peer-to-peer signaling.

2.5.  Conveying Information and Events

   Participants should have access to information about the other
   participants in a conversation space, so that this information can be
   rendered to a human user or processed by an automaton.  Although some
   of this information may be available from the Request-URI or To,
   From, Contact, or other SIP headers, another mechanism of reporting
   this information is necessary.

   Many applications are driven by knowledge about the progress of calls
   and conferences.  In general these types of events allow for the
   construction of distributed applications, where the application
   requires information on dialog and conference state, but is not
   necessarily co-resident with an endpoint user agent or conference
   server.  For example, a focus involved in a conversation space may
   wish to provide URIs for conference status, and/or conference/floor
   control.

   The SIP Events [4] architecture defines general mechanisms for
   subscription to and notification of events within SIP networks.
   It
   introduces the notion of a package that is a specific "instantiation"
   of the events mechanism for a well-defined set of events.

   Event packages are needed to provide the status of a user's dialogs,
   provide the status of conferences and their participants, provide
   user presence information, provide the status of registrations, and
   provide the status of a user's messages.  While this is not an
   exhaustive list, these are sufficient to enable the sample features
   described in this document.

   The conference event package [12] allows users to subscribe to
   information about an entire tightly-coupled SIP conference.
   Notifications convey information about the participants such as: the
   SIP URI identifying each user, their status in the space (active,
   declined, departed), URIs to invoke other features (such as sidebar
   conversations), links to other relevant information (such as floor
   control policies), and if floor control policies are in place, the
   user's floor control status.  For conversation spaces created from
   cascaded conferences, conversation state can be gathered from
   relevant foci and merged into a cohesive set of state.

   The dialog package [11] provides information about all the dialogs
   the target user is maintaining, what conversations the user is
   participating in, and how these are correlated.  Likewise the
   registration package [13] provides notifications when contacts have
   changed for a specific address-of-record.  The combination of these
   allows a user agent to learn about all conversations occurring for
   the entire registered contact set for an address-of-record.

   Note that user presence in SIP [14] has a close relationship with
   these latter two event packages.  It is fundamental to the presence
   model that the information used to obtain user presence is
   constructed from any number of different input sources.
   Examples of other such sources include calendaring information and
   uploads of presence documents. These two packages can be considered
   another mechanism that allows a presence agent to determine the
   presence state of the user. Specifically, a user presence server can
   act as a subscriber for the dialog and registration packages to
   obtain additional information that can be used to construct a
   presence document.

   The multi-party architecture may also need to provide a mechanism to
   get information about the status/handling of a dialog (for example,
   information about the history of other contacts attempted prior to
   the current contact). Finally, the architecture should provide ample
   opportunities to present informational URIs that relate to calls,
   conversations, or dialogs in some way. For example, consider the SIP
   Call-Info header, or Contact headers returned in a 300-class
   response. Frequently, additional information about a call or dialog
   can be fetched via non-SIP URIs. For example, consider a web page
   for package tracking when calling a delivery company, or a web page
   with related documentation when joining a dial-in conference. The
   use of URIs in the multiparty framework is discussed in more detail
   in Section 2.7.

   Finally, the interaction of SIP with stimulus-signaling-based
   applications, which allow a user agent to interact with an
   application without knowledge of the semantics of that application,
   is discussed in the SIP application interaction framework [16].
   Stimulus signaling can occur to a user interface running locally with
   the client, or to a remote user interface, through media streams.
   Stimulus signaling encompasses a wide range of mechanisms, ranging
   from clicking on hyperlinks, to pressing buttons, to traditional Dual
   Tone Multi Frequency (DTMF) input.
   In all cases, stimulus signaling is supported through the use of
   markup languages, which play a key role in that framework.

2.6. Componentization and Decomposition

   This framework proposes a decomposed component architecture with a
   very loose coupling of services and components. This means that a
   service (such as a conferencing server or an auto-attendant) need not
   be implemented as an actual server. Rather, these services can be
   built by combining a few basic components in straightforward or
   arbitrarily complex ways.

   Since the components are easily deployed on separate boxes, by
   separate vendors, or even with separate providers, we achieve a
   separation of function that allows each piece to be developed in
   complete isolation. We can also reuse existing components for new
   applications. This allows rapid service creation, and the ability
   for services to be distributed across organizational domains anywhere
   in the Internet.

   For many of these components it is also desirable to discover their
   capabilities, for example querying the ability of a mixer to host a
   10-dialog conference, or to reserve resources for a specific time.
   These actions could be provided in the form of URIs, provided there
   is an a priori means of understanding their semantics. For example,
   if there is a published dictionary of operations and a way to query
   the service for the available operations and the associated URIs, the
   URI can be the interface for providing these service operations.
   This concept is described in more detail in the context of dialog
   operations in Section 3.

2.6.1. Media Intermediaries

   Media Intermediaries are not participants in any conversation space,
   although an entity that is also a media translator may also have a
   co-located participant component (for example a mixer that also
   announces the arrival of a new participant; the announcement portion
   is a participant, but the mixer itself is not). Media intermediaries
   should be as transparent as possible to the end users, offering a
   useful, fundamental service without getting in the way of new
   features implemented by participants. Some common media
   intermediaries are described below.

2.6.2. Mixer

   A SIP mixer is a component that combines media from all dialogs in
   the same conversation in a media-specific way. For example, the
   default combining for an audio conference might be an N-1
   configuration, while a text mixer might interleave text messages on a
   per-line basis. More details about how to manipulate the media
   policy used by mixers are being discussed in the XCON Working Group.

2.6.3. Transcoder

   A transcoder translates media from one encoding or format to another
   (for example, GSM voice to G.711, MPEG2 to H.261, or text/html to
   text/plain), or from one media type to another (for example text to
   speech). A more thorough discussion of transcoding is provided in
   SIP transcoding services invocation [17].

2.6.4. Media Relay

   A media relay terminates media and simply forwards it to a new
   destination without changing the content in any way. Sometimes media
   relays are used to provide source IP address anonymity, to facilitate
   middlebox traversal, or to provide a trusted entity where media can
   be forcibly disconnected.

2.6.5. Queue Server

   A queue server is a location where calls can be entered into one of
   several FIFO (first-in, first-out) queues. A queue server would
   subscribe to the presence of groups or individuals who are interested
   in its queues.
   When detecting that a user is available to service a queue, the
   server redirects or transfers the last call in the relevant queue to
   the available user. On a queue-by-queue basis, authorized users
   could also subscribe to the call state (dialog information) of calls
   within a queue. Authorized users could use this information to
   effectively pluck (take) a call out of the queue (for example by
   sending an INVITE with a Replaces header to one of the user agents in
   the queue).

2.6.6. Parking Place

   A parking place is a location where calls can be terminated
   temporarily and then retrieved later. While a call is "parked", it
   can receive media "on-hold" such as music, announcements, or
   advertisements. Such a service could be further decomposed such that
   announcements or music are handled by a separate component.

2.6.7. Announcements and Voice Dialogs

   An announcement server is a server that can play digitized media
   (frequently audio), such as music or recorded speech. These servers
   are typically accessible via SIP, HTTP, or RTSP. An analogous
   service is a recording service that stores digitized media. A
   convention for specifying announcements in SIP URIs is described in
   [24]. Likewise the same server could easily provide a service that
   records digitized media.

   A "voice dialog" is a model of spoken interactive behavior between a
   human and an automaton that can include synthesized speech, digitized
   audio, recognition of spoken and DTMF key input, recording of spoken
   input, and interaction with call control. Voice dialogs frequently
   consist of forms or menus. Forms present information and gather
   input; menus offer choices of what to do next.

   Spoken dialogs are a basic building block of applications that use
   voice.
   Consider, for example, that a voice mail system, the conference-id
   and passcode collection system for a conferencing system, and
   complicated voice portal applications all require a voice dialog
   component.

2.6.7.1. Text-to-Speech and Automatic Speech Recognition

   Text-to-Speech (TTS) is a service that converts text into digitized
   audio. TTS is frequently integrated into other applications, but
   when separated as a component, it provides greater opportunity for
   broad reuse. Automatic Speech Recognition (ASR) is a service that
   attempts to decipher digitized speech based on a proposed grammar.
   Like TTS, ASR services can be embedded, or exposed so that many
   applications can take advantage of such services. A standardized
   (decomposed) interface to access standalone TTS and ASR services is
   currently being developed in the SPEECHSC Working Group.

2.6.7.2. VoiceXML

   [VoiceXML] is a W3C recommendation that was designed to give authors
   control over the spoken dialog between users and applications. The
   application and user take turns speaking: the application prompts the
   user, and the user in turn responds. Its major goal is to bring the
   advantages of web-based development and content delivery to
   interactive voice response applications. We believe that VoiceXML
   represents the ideal partner for SIP in the development of
   distributed IVR servers. VoiceXML is an XML-based scripting language
   for describing IVR services at an abstract level. VoiceXML supports
   DTMF recognition, speech recognition, text-to-speech, and playing out
   of recorded media files. The results of the data collected from the
   user are passed to a controlling entity through an HTTP POST
   operation. The controller can then return another script, or
   terminate the interaction with the IVR server.

   A VoiceXML server also need not be implemented as a monolithic
   server.
   Below is a diagram of a VoiceXML browser that is split into media and
   non-media handling parts. The VoiceXML interpreter handles SIP
   dialog state and state within a VoiceXML document, and sends requests
   to the media component over another protocol.

                       +-------------+
                       |             |
                       |  VoiceXML   |
                       | Interpreter |
                       | (signaling) |
                       +-------------+
                         ^         ^
                         |         |
                     SIP |         | RTSP
                         |         |
                         |         |
                         v         v
            +-------------+        +-------------+
            |             |        |             |
            |   SIP UA    |  RTP   | RTSP Server |
            |             |<------>|   (media)   |
            |             |        |             |
            +-------------+        +-------------+

                  Figure: Decomposed VoiceXML Server

2.7. Use of URIs

   All naming in SIP uses URIs. URIs in SIP are used in a plethora of
   contexts: the Request-URI; Contact, To, From, and *-Info headers;
   application/uri bodies; and embedded in email, web pages, instant
   messages, and ENUM records. The Request-URI identifies the user or
   service that the call is destined for.

   SIP URIs embedded in informational SIP headers, SIP bodies, and
   non-SIP content can also specify methods, special parameters,
   headers, and even bodies. For example:

   sip:bob@b.example.com;method=REFER?Refer-To=http://example.com/~alice

   Throughout this draft we discuss call control primitive operations.
   One of the biggest problems is defining how these operations may be
   invoked. There are a number of ways to do this. One way is to
   define the primitives in the protocol itself such that SIP methods
   (for example REFER) or SIP headers (for example Replaces) indicate a
   specific call control action. Another way to invoke call control
   primitives is to define a specific Request-URI naming convention.
   Either these conventions must be shared between the client (the
   invoker) and the server, or published by or on behalf of the server.
   The former involves defining URI construction techniques (e.g. URI
   parameters and/or token conventions) as proposed in [24].
   The latter technique usually involves discovering the URI via a SIP
   event package, a web page, a business card, or an Instant Message.
   Yet another means to acquire the URIs is to define a dictionary of
   primitives with well-defined semantics and provide a means to query
   the named primitives and corresponding URIs that may be invoked on
   the service or dialogs.

2.7.1. Naming Users in SIP

   An address-of-record, or public SIP address, is a SIP (or SIPS) URI
   that points to a domain with a location server that can map the URI
   to a set of Contact URIs where the user might be available.
   Typically the Contact URIs are populated via registration.

   Address of Record              Contacts

   sip:bob@biloxi.example.com ->  sip:bob@babylon.biloxi.example.com:5060
                                  sip:bbrown@mailbox.provider.example.net
                                  sip:+1.408.555.6789@mobile.example.net

   Callee Capabilities [20] defines a set of additional parameters to
   the Contact header that define the characteristics of the user agent
   at the specified URI. For example, there is a mobility parameter
   that indicates whether the UA is fixed or mobile. When a user agent
   registers, it places these parameters in the Contact headers to
   characterize the URIs it is registering. This allows a proxy for
   that domain to have information about the contact addresses for that
   user.

   When a caller sends a request, it can optionally request Caller
   Preferences [21], by including the Accept-Contact,
   Request-Disposition, and Reject-Contact headers that request certain
   handling by the proxy in the target domain. These headers contain
   preferences that describe the set of desired URIs to which the caller
   would like their request routed. The proxy in the target domain
   matches these preferences with the Contact characteristics originally
   registered by the target user.
   The target user can also choose to run arbitrarily complex "Find-me"
   feature logic on a proxy in the target domain.

   There is a strong asymmetry in how preferences for callers and
   callees can be presented to the network. While a caller takes an
   active role by initiating the request, the callee takes a passive
   role in waiting for requests. This motivates the use of
   callee-supplied scripts and caller preferences included in the call
   request. This asymmetry is also reflected in the appropriate
   relationship between caller and callee preferences. A server for a
   callee should respect the wishes of the caller to avoid certain
   locations, while the preference among locations has to be the
   callee's choice, as it determines where, for example, the phone rings
   and whether the callee incurs mobile telephone charges for incoming
   calls.

   SIP User Agent implementations are encouraged to make intelligent
   decisions based on the type of participants (active/passive, hidden,
   human/robot) in a conversation space. This information is conveyed
   via the dialog package or in a SIP header parameter communicated
   using an appropriate SIP header. For example, a music on hold
   service may take the sensible approach that if there are two or more
   unhidden participants, it should not provide hold music; or that it
   will not send hold music to robots.

   Multiple participants in the same conversation space may represent
   the same human user. For example, the user may use one participant
   for video, chat, and whiteboard media on a PC and another for audio
   media on a SIP phone. In this case, the address-of-record is the
   same for both user agents, but the Contacts are different. In
   addition, human users may add robot participants that act on their
   behalf (for example a call recording service, or a calendar
   announcement reminder).
   Call Control features in SIP should continue to function as expected
   in such an environment.

2.7.2. Naming Services with SIP URIs

   A critical piece of defining a session level service that can be
   accessed by SIP is defining the naming of the resources within that
   service. This point cannot be overstated.

   In the context of SIP control of application components, we take
   advantage of the fact that the left-hand-side of a standard SIP URI
   is a user part. Most services may be thought of as user automatons
   that participate in SIP sessions. It naturally follows that the user
   part should be utilized as a service indicator.

   For example, media servers commonly offer multiple services at a
   single host address. Use of the user part as a service indicator
   enables service consumers to direct their requests without ambiguity.
   It has the added benefit of enabling media services to register their
   availability with SIP Registrars just as any "real" SIP user would.
   This maintains consistency and provides enhanced flexibility in the
   deployment of media services in the network.

   There has been much discussion about the potential for confusion if
   media services URIs are not readily distinguishable from other types
   of SIP UAs. The use of a service namespace provides a mechanism to
   unambiguously identify standard interfaces while not constraining the
   development of private or experimental services.

   In SIP, the Request-URI identifies the user or service that the call
   is destined for. The great advantage of using URIs (specifically,
   the SIP Request-URI) as a service identifier comes from the
   combination of two facts. First, unlike in the PSTN, where the
   namespace (dialable telephone numbers) is limited, URIs come from an
   infinite space. They are plentiful, and they are free.
   Secondly, the primary function of SIP is call routing through
   manipulations of the Request-URI. In the traditional SIP
   application, this URI represents a person. However, the URI can also
   represent a service, as we propose here. This means we can apply the
   routing services SIP provides to routing of calls to services. As a
   result, the problem of service invocation and service location
   becomes a routing problem, for which SIP provides a scalable and
   flexible solution. Since there is such a vast namespace of services,
   we can explicitly name each service in a finely granular way. This
   allows the distribution of services across the network. For further
   discussion about services and SIP URIs, see RFC 3087 [22].

   Consider a conferencing service in which we have separated the names
   of ad-hoc conferences from scheduled conferences. We can program
   proxies to route calls for ad-hoc conferences to one set of servers,
   and calls for scheduled ones to another, possibly even in a different
   provider. In fact, since each conference itself is given a URI, we
   can distribute conferences across servers, and easily guarantee that
   calls for the same conference always get routed to the same server.
   This is in stark contrast to conferences in the telephone network,
   where the equivalent of the URI (the phone number) is scarce. An
   entire conferencing provider generally has one or two numbers.
   Conference IDs must be obtained through IVR interactions with the
   caller, or through a human attendant. This makes it difficult to
   distribute conferences across servers all over the network, since the
   PSTN routing only knows about the dialed number.

   For more examples, consider the URI conventions of RFC 4240 [24] for
   media servers and RFC 4458 [25] for voicemail and IVR systems.
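
   The conference routing idea above can be illustrated with a minimal
   sketch. It assumes a hypothetical naming convention, provisioned by
   the service operator, in which the user part of the Request-URI
   starts with "adhoc-" or "sched-"; the prefixes, server names, and
   function names are assumptions made for this example and do not come
   from this document or from RFC 4240. A stable digest of the user
   part selects one server from the pool, so calls for the same
   conference URI always reach the same server.

```python
import hashlib

# Hypothetical server pools, keyed by a hypothetical user-part prefix.
POOLS = {
    "adhoc-": ["adhoc1.example.net", "adhoc2.example.net"],
    "sched-": ["sched1.example.com", "sched2.example.com", "sched3.example.com"],
}


def user_part(request_uri: str) -> str:
    """Return the user part of a simple sip: or sips: URI (no parameter parsing)."""
    scheme, _, rest = request_uri.partition(":")
    if scheme not in ("sip", "sips") or "@" not in rest:
        raise ValueError(f"not a SIP URI with a user part: {request_uri!r}")
    return rest.split("@", 1)[0]


def route_conference(request_uri: str) -> str:
    """Pick the server that should receive a call for this conference URI."""
    user = user_part(request_uri)
    for prefix, servers in POOLS.items():
        if user.startswith(prefix):
            # A stable digest (not Python's randomized hash()) keeps the
            # mapping identical across processes and restarts.
            digest = hashlib.sha256(user.encode("utf-8")).digest()
            return servers[int.from_bytes(digest[:8], "big") % len(servers)]
    raise LookupError(f"no service pool provisioned for {user!r}")
```

   A proxy could call route_conference("sip:adhoc-1234@conf.example.net")
   for each incoming INVITE. Note that a simple modulo mapping changes
   when the pool size changes; a real deployment would need a strategy
   for that case.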

   In practical applications, it is important that an invoker does not
   necessarily apply semantic rules to various URIs it did not create.
   Instead, it should allow any arbitrary string to be provisioned, and
   map the string to the desired behavior. The administrator of a
   service may choose to provision specific conventions or mnemonic
   strings, but the application should not require it. In any large
   installation, the system owner is likely to have pre-existing rules
   for mnemonic URIs, and any attempt by an application to define its
   own rules may create a conflict. Implementations should allow an
   arbitrary mix of URIs from these schemes, or any other scheme that
   renders valid SIP URIs, to be provisioned, rather than enforce only
   one particular scheme.

   As we have shown, SIP URIs represent an ideal, flexible mechanism for
   describing and naming service resources, regardless of whether the
   resources are queues, conferences, voice dialogs, announcements,
   voicemail treatments, or phone features.

2.8. Invoker Independence

   With functional signaling, only the invoker of features in SIP needs
   to know exactly which feature they are invoking. One of the primary
   benefits of this approach is that combinations of functional features
   work in SIP call control without requiring complex feature
   interaction matrices. For example, let us examine the combination of
   a "transfer" of a call that is "conferenced".

   Alice calls Bob. Alice silently "conferences in" her robotic
   assistant Albert as a hidden party. Bob transfers Alice to Carol.
   If Bob asks Alice to Replace her leg with a new one to Carol, then
   both Alice and Albert should be communicating with Carol
   (transparently).

   Using the peer-to-peer model, this combination of features works fine
   if A is doing local mixing (Alice replaces Bob's dialog with
   Carol's), or if A is using a central mixer (the mixer replaces Bob's
   dialog with Carol's). A clever implementation using the 3pcc model
   can generate similar results.

   New extensions to the SIP Call Control Framework should attempt to
   preserve this property.

2.9. Billing issues

   Billing in the PSTN is typically based on who initiated a call. At
   the moment, billing in a SIP network is consistent neither with
   itself nor with the PSTN. (A billing model for SIP should allow for
   both PSTN-style billing and non-PSTN billing.) The example below
   demonstrates one such inconsistency.

   Alice places a call to Bob. Alice then blind transfers Bob to Carol
   through a PSTN gateway. In current usage of REFER, Bob may be billed
   for a call he did not initiate (although his UA originated the
   outgoing dialog). This is not necessarily a terrible thing, but it
   demonstrates a security concern (Bob must have appropriate local
   policy to prevent fraud). Also, Alice may wish to pay for Bob's
   session with Carol. There should be a way to signal this in SIP.

   Likewise, a Replacement call may maintain the same billing
   relationship as a Replaced call, so if Alice first calls Carol, then
   asks Bob to Replace this call, Alice may continue to receive a bill.

   Further work in SIP billing should define a way to set or discover
   the direction of billing.

3. Catalog of call control actions and sample features

   Call control actions can be categorized by the dialogs upon which
   they operate. The actions may involve a single or multiple dialogs.
   These dialogs can be early or established. Multiple dialogs may be
   related in a conversation space to form a conference or other
   interesting media topologies.

   It should be noted that it is desirable to provide a means by which a
   party can discover the actions that may be performed on a dialog.
   The interested party may be independent or related to the dialogs.
   One means of accomplishing this is through the ability to define and
   obtain URIs for these actions as described in section .

   Below are listed several call control "actions" that establish or
   modify dialogs and relate the participants in a conversation space.
   The names of the actions listed are for descriptive purposes only
   (they are not normative). This list of actions is not meant to be
   exhaustive.

   In the examples, all actions are initiated by the user "Alice"
   represented by UA "A".

3.1. Remote Call Control Actions on Early Dialogs

   The following are a set of actions that may be performed on a single
   early dialog. These actions can be thought of as a set of remote
   control operations. For example, an automaton might perform the
   operation on behalf of a user. Alternatively, a user might use the
   remote control in the form of an application to perform the action on
   the early dialog of a UA that may be out of reach. All of these
   actions correspond to telling the UA how to respond to a request to
   establish an early dialog. These actions provide useful
   functionality for PDA, PC, and server-based applications that desire
   the ability to control a UA. A proposed mechanism for this type of
   functionality is described in Remote Call Control [23].

3.1.1. Remote Answer

   A dialog is in some early dialog state such as 180 Ringing. It may
   be desirable to tell the UA to answer the dialog. That is, tell it
   to send a 200 OK response to establish the dialog.

3.1.2. Remote Forward or Put

   It may be desirable to tell the UA to respond with a 3xx class
   response to forward an early dialog to another UA.

3.1.3. Remote Busy or Error Out

   It may be desirable to instruct the UA to send an error response such
   as 486 Busy Here.

3.2. Remote Call Control Actions on Single Dialogs

   There is another useful set of actions that operate on a single
   established dialog. These operations are useful in building
   productivity applications for aiding users to control their phone.
   For example, a Customer Relationship Management (CRM) application can
   set up calls for a user, eliminating the need for the user to
   actually enter an address. These operations can also be thought of
   as remote control actions. A proposed mechanism for this type of
   functionality is described in Remote Call Control [23].

3.2.1. Remote Dial

   This action instructs the UA to initiate a dialog. This action can
   be performed using the REFER method.

3.2.2. Remote On and Off Hold

   This action instructs the UA to put an established dialog on hold.
   Though this operation can conceptually be performed with the REFER
   method, there are no semantics defined as to what the referred party
   should do with the SDP. There is no way to distinguish between the
   desire to go on or off hold on a per media stream basis.

3.2.3. Remote Hangup

   This action instructs the UA to terminate an early or established
   dialog. A REFER request with the following Refer-To URI and
   Target-Dialog header field [26] performs this action. Note: this
   example does not show the full set of header fields.

      REFER sip:carol@client.chicago.net SIP/2.0
      Refer-To: sip:bob@babylon.biloxi.example.com;method=BYE
      Target-Dialog: 13413098;local-tag=879738;remote-tag=023214

3.3. Call Control Actions on Multiple Dialogs

   These actions apply to a set of related dialogs.

3.3.1. Transfer

   This section describes how call transfer can be achieved using
   centralized (3pcc) and peer-to-peer (REFER) approaches.

   The conversation space changes as follows:

        before               after
      { A , B }     -->    { C , B }

   A replaces itself with C.

   To make this happen using the peer-to-peer approach, "A" would send
   two SIP requests. A shorthand for those requests is shown below:

      REFER B Refer-To:C
      BYE B

   To make this happen instead using the 3pcc approach, the controller
   sends requests represented by the shorthand below:

      INVITE C (w/SDP of B)
      reINVITE B (w/SDP of C)
      BYE A

   Features enabled by this action:

   - blind transfer
   - transfer to a central mixer (some type of conference or forking)
   - transfer to park server (park)
   - transfer to music on hold or announcement server
   - transfer to a "queue"
   - transfer to a service (such as Voice Dialogs service)
   - transition from local mixer to central mixer

   This action is frequently referred to as "completing an attended
   transfer". It is described in more detail in cc-transfer [18].

   Note that if a transfer requires URI hiding or privacy, then the 3pcc
   approach can more easily implement this. For example, if the URI of
   C needs to be hidden from B, then the use of 3pcc helps accomplish
   this.

3.3.2. Take

   The conversation space changes as follows:

      { B , C }     -->    { B , A }

   A forcibly replaces C with itself. In most uses of this primitive, A
   is just "un-replacing" itself.
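
   The Transfer and Take changes to the conversation space can be
   modeled as simple set operations. The sketch below only illustrates
   the { A , B } notation; it is not a signaling implementation, and the
   function names transfer and take are assumptions made for this
   example.

```python
def transfer(space: frozenset, a: str, c: str) -> frozenset:
    """Transfer: participant a replaces itself with c, { A, B } --> { C, B }."""
    if a not in space:
        raise ValueError(f"{a} is not in the conversation space")
    return (space - {a}) | {c}


def take(space: frozenset, a: str, c: str) -> frozenset:
    """Take: a forcibly replaces participant c with itself, { B, C } --> { B, A }."""
    if c not in space:
        raise ValueError(f"{c} is not in the conversation space")
    return (space - {c}) | {a}


before = frozenset({"A", "B"})
after_transfer = transfer(before, a="A", c="C")   # { C, B }
after_take = take(after_transfer, a="A", c="C")   # { A, B } again
```

   Applying take after transfer restores the original space, which is
   the "un-replacing" case mentioned above.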

   Using the peer-to-peer approach, "A" sends:

      INVITE B Replaces:

   Using the 3pcc approach (all requests sent from controller):

      INVITE A (w/SDP of B)
      reINVITE B (w/SDP of A)
      BYE C

   Features enabled by this action:

   - transferee completes an attended transfer
   - retrieve from central mixer (not recommended)
   - retrieve from music on hold or park
   - retrieve from queue
   - call center take
   - voice portal resuming ownership of a call it originated
   - answering-machine style screening (pickup)
   - pickup of a ringing call (i.e. early dialog)

   Note that pickup of a ringing call has some additional requirements.
   First, it involves an early dialog as opposed to an established
   dialog. Second, the party that is to pick up the call may wish to do
   so only while it is an early dialog. That is, in the race condition
   where the ringing UA accepts just before it receives signaling from
   the party wishing to take the call, the taking party wishes to yield
   or cancel the take. The goal is to avoid yanking an answered call
   from the called party.

   This action is described in Replaces [9] and in cc-transfer [18].

3.3.3. Add

   Note that the following 4 actions are described in cc-conferencing
   [19].

   This is merely adding a participant to a SIP conference. The
   conversation space changes as follows:

      { A , B }     -->    { A , B , C }

   A adds C to the conversation.

   Using the peer-to-peer approach, adding a party using local mixing
   requires no signaling.
   To transition from a 2-party call or a locally mixed conference to
   central mixing, A could send the following requests:

      REFER B Refer-To: conference-URI
      INVITE conference-URI
      BYE B

   To add a party to a conference:

      REFER C Refer-To: conference-URI
   or
      REFER conference-URI Refer-To: C

   Using the 3pcc approach to transition to centrally mixed, the
   controller would send:

      INVITE mixer leg 1 (w/SDP of A)
      INVITE mixer leg 2 (w/SDP of B)
      INVITE C (late SDP)
      reINVITE A (w/SDP of mixer leg 1)
      reINVITE B (w/SDP of mixer leg 2)
      INVITE mixer leg 3 (w/SDP of C)

   To add a party to a SIP conference:

      INVITE C (late SDP)
      INVITE conference-URI (w/SDP of C)

   Features enabled:

   - standard conference feature
   - call recording
   - answering-machine style screening (screening)

3.3.4. Local Join

   The conversation space changes like this:

      { A , B } , { A , C }   -->   { A , B , C }

   or like this

      { A , B } , { C , D }   -->   { A , B , C , D }

   A takes two conversation spaces and joins them together into a single
   space.

   Using the peer-to-peer approach, A can mix locally, or REFER the
   participants of both conversation spaces to the same central mixer
   (as in 3.3.5).

   For the 3pcc approach, the call flows for inserting participants, and
   joining and splitting conversation spaces are tedious yet
   straightforward, so these are left as an exercise for the reader.

   Features enabled:

   - standard conference feature
   - leaving a sidebar to rejoin a larger conference

3.3.5. Insert

   The conversation space changes like this:

      { B , C }     -->    { A , B , C }

   A inserts itself into a conversation space.

   A proposed mechanism for signaling this using the peer-to-peer
   approach is to send a new header in an INVITE with "joining" [10]
   semantics.
   For example:

      INVITE B Join:

   If B accepted the INVITE, B would accept responsibility to set up the
   dialogs and mixing necessary (for example: to mix locally or to
   transfer the participants to a central mixer).

   Features enabled:

   - barge-in
   - call center monitoring
   - call recording

3.3.6. Split

      { A , B , C , D }   -->   { A , B } , { C , D }

   If using a central conference with peer-to-peer:

      REFER C Refer-To: conference-URI (new URI)
      REFER D Refer-To: conference-URI (new URI)
      BYE C
      BYE D

   Features enabled:

   - sidebar conversations during a larger conference

3.3.7. Near-fork

   A participates in two conversation spaces simultaneously:

      { A, B }   -->   { B , A } & { A , C }

   A is a participant in two conversation spaces such that A sends the
   same media to both spaces, and renders media from both spaces,
   presumably by mixing or rendering the media from both. We can define
   that A is the "anchor" point for both forks, each of which is a
   separate conversation space.

   This action is a purely local implementation matter (it requires no
   special signaling). Local features such as switching calls between
   the background and foreground are possible using this media
   relationship.

3.3.8. Far fork

   The conversation space diagram...

      { A, B }   -->   { A , B } & { B , C }

   A requests B to be the "anchor" of two conversation spaces.

   This is easily set up by creating a conference with two
   sub-conferences and setting the media policy appropriately such that
   B is a participant in both. Media forking can also be set up using
   3pcc as described in Section 5.1 of RFC3264 [3] (an offer/answer
   model for SDP). The session descriptions for forking are quite
   complex. Controllers should verify that endpoints can handle
   forked-media, for example using prior configuration.

   Features enabled:

   - barge-in
   - voice portal services
   - whisper
   - hotword detection
   - sending DTMF somewhere else

4.  Security Considerations

   Call control primitives provide a powerful set of features that can
   be dangerous in the hands of an attacker. To complicate matters,
   call control primitives are likely to be automatically authorized
   without direct human oversight.

   The class of attacks that are possible using these tools includes the
   ability to eavesdrop on calls, disconnect calls, redirect calls,
   render irritating content (including ringing) at a user agent, cause
   an action that has billing consequences, subvert billing
   (theft-of-service), and obtain private information. Call control
   extensions must take extra care to describe how these attacks will be
   prevented.

   We can also make some general observations about authorization and
   trust with respect to call control. The security model depends
   heavily on the signaling model chosen (see Section 3.2).

   Let us first examine the security model used in the 3pcc approach.
   All signaling goes through the controller, which is a trusted entity.
   Traditional SIP authentication and hop-by-hop encryption and message
   integrity work fine in this environment, but end-to-end encryption
   and message integrity may not be possible.

   When using the peer-to-peer approach, call control actions and
   primitives can be legitimately initiated by a) an existing
   participant in the conversation space, b) a former participant in the
   conversation space, or c) an entity trusted by one of the
   participants. For example, a participant always initiates a
   transfer; a retrieve from Park (a take) is initiated on behalf of a
   former participant; and a barge-in (insert or far-fork) is initiated
   by a trusted entity (an operator, for example).
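
   The three categories of legitimate initiator listed above can be
   expressed as a small policy check. The following Python sketch is
   illustrative only. The ConversationSpace class and the
   initiator_role function are hypothetical names introduced here, and
   the sketch assumes that the requester's identity has already been
   authenticated by some SIP mechanism before the check is applied.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ConversationSpace:
    """Hypothetical record of who is, or was, in a conversation space."""
    participants: Set[str] = field(default_factory=set)
    former_participants: Set[str] = field(default_factory=set)
    trusted_entities: Set[str] = field(default_factory=set)

    def leave(self, party: str) -> None:
        """Move a current participant to the former-participant set."""
        if party in self.participants:
            self.participants.remove(party)
            self.former_participants.add(party)


def initiator_role(space: ConversationSpace, requester: str) -> Optional[str]:
    """Return the category under which an (already authenticated)
    requester may initiate a call control action, or None if the
    requester fits none of the three categories."""
    if requester in space.participants:
        return "existing participant"
    if requester in space.former_participants:
        return "former participant"
    if requester in space.trusted_entities:
        return "trusted entity"
    return None


if __name__ == "__main__":
    space = ConversationSpace(participants={"alice", "bob"},
                              trusted_entities={"operator"})
    space.leave("bob")  # for example, Bob parks the call
    for who in ("alice", "bob", "operator", "mallory"):
        print(who, "->", initiator_role(space, who))
```

   A check like this answers only who may ask. It does not address
   replay protection for requests from former participants.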

   Authenticating requests by an existing participant or a trusted
   entity can be done with baseline SIP mechanisms. In the case of
   features initiated by a former participant, these should be protected
   against replay attacks by using a unique name or identifier per
   invocation. The Replaces header exhibits this behavior as a
   by-product of its operation (once a Replaces operation is successful,
   the dialog being replaced no longer exists). For other requests, a
   "one-time" Request-URI may be provided to the feature invoker.

   To authorize call control primitives that trigger special behavior
   (such as an INVITE with Replaces or Join semantics), the receiving
   user agent may have trouble finding appropriate credentials with
   which to challenge or authorize the request, as the sender may be
   completely unknown to the receiver, except through the introduction
   of a third party. These credentials need to be passed transitively
   in some way or fetched in an event body, for example.

5.  IANA Considerations

   This document requires no action by IANA.

6.  Appendix A: Example Features

   Primitives are defined in terms of their ability to provide features.
   These example features should require an amply robust set of services
   to demonstrate a useful set of primitives. They are described here
   briefly. Note that the descriptions of these features are
   non-normative. Some of these features are used as examples in
   section 6 to demonstrate how some features may require certain media
   relationships. Note also that this document describes a mixture of
   features originating in the world of telephones and features that
   are clearly Internet oriented.

   Example Feature Definitions:

   Attended Transfer - The transferring party establishes a session with
   the transfer target before completing the transfer.

   Auto Answer - Calls to a certain address or location answer
   immediately via a speakerphone.

   Automatic Callback - Alice calls Bob, but Bob is busy. Alice would
   like Bob to call her automatically when he is available. When Bob
   hangs up, Alice's phone rings. When Alice answers, Bob's phone
   rings. Bob answers and they talk.

   Barge-in - Carol interrupts Alice, who has a call in progress with
   Bob. In some variations, Alice forcibly joins a new conversation
   with Carol; in other variations, all three parties are placed in the
   same conversation (basically a 3-way conference).

   Blind Transfer - Alice is in a conversation with Bob. Alice asks Bob
   to contact Carol, but makes no attempt to contact Carol
   independently. In many implementations, Alice does not verify Bob's
   success or failure in contacting Carol.

   Call Forwarding - Before a dialog is accepted, it is redirected to
   another location, for example, because the originally intended
   recipient is busy, does not answer, is disconnected from the network,
   or has configured all requests to go somewhere else.

   Call Monitoring - A call center supervisor joins an in-progress call
   for monitoring purposes.

   Call Park - A call participant parks a call (essentially puts the
   call on hold), and then retrieves it at a later time (typically from
   another location).

   Call Pickup - A party picks up a call that was ringing at another
   location. One variation allows the caller to choose which location;
   another variation just picks up any call in that user's "pickup
   group".

   Call Return - Alice calls Bob. Bob misses the call or is disconnected
   before he is finished talking to Alice. Bob invokes Call Return,
   which calls Alice, even if Alice did not provide her real identity or
   location to Bob.

   Call Waiting - Alice is in a call, then receives another call.
   Alice can place the first call on hold, and talk with the other
   caller. She can typically switch back and forth between the callers.

   Click-to-dial - Alice looks in her company directory for Bob. When
   she finds Bob, she clicks on a URI to call him. Her phone rings (or
   possibly answers automatically), and when she answers, Bob's phone
   rings.

   Conference Call - Three or more active, visible participants in the
   same conversation space.

   Consultative Transfer - The transferring party establishes a session
   with the target and mixes both sessions together so that all three
   parties can participate, then disconnects, leaving the transferee and
   transfer target with an active session.

   Distinctive Ring - Incoming calls have different ring cadences or
   sample sounds depending on the From party, the To party, or other
   factors.

   Do Not Disturb - Alice selects the Do Not Disturb option. Calls to
   her either ring briefly or not at all and are forwarded elsewhere.
   Some variations allow specially authorized callers to override this
   feature and ring Alice anyway.

   Find-Me - Alice sets up complicated rules for how she can be reached
   (possibly using CPL [27], presence [14], or other factors). When Bob
   calls Alice, his call is eventually routed to a temporary Contact
   where Alice happens to be available.

   Hotline - Alice picks up a phone and is immediately connected to the
   technical support hotline, for example.

   IM Conference Alerts - A user receives a notification as an Instant
   Message whenever someone joins a conference they are also in.

   Inbound Call Screening - Alice doesn't want to receive calls from
   Matt. Inbound Screening prevents Matt from disturbing Alice. In
   some variations this works even if Matt hides his identity.

   Intercom - Alice typically presses a button on a phone that
   immediately connects to another user or phone and causes that phone
   to play her voice over its speaker. Some variations immediately
   set up two-way communications; other variations require another
   button to be pressed to enable a two-way conversation.

   Message Waiting - Bob calls Alice while she is away from her phone.
   When she returns, a visible or audible indicator conveys that someone
   has left her a voicemail message. The message waiting indication may
   also convey how many messages are waiting, from whom, at what time,
   and other useful pieces of information.

   Music on Hold - When Alice places a call with Bob on hold, her UA
   replaces its audio with streaming content such as music,
   announcements, or advertisements.

   Outbound Call Screening - Alice is paged and unknowingly calls a PSTN
   pay-service telephone number in the Caribbean, but local policy
   blocks her call, and possibly informs her why.

   Pre-paid Calling - Alice pays for a certain currency or unit amount
   of calling value. When she places a call, she provides her account
   number somehow. If her account runs out of calling value during a
   call, her call is disconnected or redirected to a service where she
   can purchase more calling value.

   Presence-Enabled Conferencing - Alice wants to set up a conference
   call with Bob and Cathy when they all happen to be available (rather
   than scheduling a predefined time). The server providing the
   application monitors their status, and calls all three when they are
   all "online", not idle, and not in another call.

   Single Line Extension/Multiple Line Appearance - A group of phones
   are all treated as "extensions" of a single line. A call for one
   rings them all. As soon as one answers, the others stop ringing. If
   any extension is actively in a conversation, another extension can
   "pick up" and immediately join the conversation. This emulates the
   behavior of a home telephone line with multiple phones.

   Speakerphone Paging - Alice calls the paging address and speaks. Her
   voice is played on the speaker of every idle phone in a preconfigured
   group of phones.

   Speed Dial - Alice dials an abbreviated number, or enters an alias,
   or presses a special speed dial button representing Bob. Her action
   is interpreted as if she specified the full address of Bob.

   Voice Message Screening - Bob calls Alice. Alice is screening her
   calls, so Bob hears Alice's voicemail greeting. Alice can hear Bob
   leave his message. If she decides to talk to Bob, she can take the
   call back from the voicemail system; otherwise, she can let Bob leave
   a message. This emulates the behavior of a home telephone answering
   machine.

   Voice Portal - A service that allows users to access a portal site
   using spoken dialog interaction. For example, Alice needs to
   schedule a working dinner with her co-worker Carol. Alice uses a
   voice portal to check Carol's flight schedule, find a restaurant near
   her hotel, make a reservation, get directions there, and page Carol
   with this information.

   Whispered Call Waiting - Alice is in a conversation with Bob. Carol
   calls Alice. Either Carol can "whisper" to Alice directly ("Can you
   get lunch in 15 minutes?"), or an automaton whispers to Alice
   informing her that Carol is trying to reach her.

6.1.  Implementation of these features

   Example Features:

   Attended Transfer            [18]
   Auto Answer                  [28]
   Automatic Callback           Two person presence-based conference
   Barge-in                     Section 6.1.1
   Blind Transfer               [18]
   Call Forwarding              Proxy or Local implementation
   Call Hold                    [6]
   Call Monitoring              Section 6.1.2
   Call Park                    Section 6.1.3, [6]
   Call Pickup                  Section 6.1.4, [6]
   Call Return                  Proxy feature
   Call Waiting                 Local implementation
   Click-to-dial                Section 6.1.5, [6]
   Conference Call              [19]
   Presence-based Conferencing  [19], [14]
   Consultative transfer        [18]
   Distinctive ring             Section 6.1.6, Proxy or Local
                                implementation
   Do Not Disturb               [14]
   Find-Me                      Proxy service based on presence
   Hotline                      Local implementation
   IM Conference Alerts         Subscribe to conference status
   Inbound Call Screening       Proxy or Local implementation
   Intercom                     Section 6.1.7, [28]
   Message Waiting              [29]
   Multiple Appearances         Section 6.1.10
   Music on Hold                Section 6.1.8, [6]
   Outbound Call Screening      Proxy feature
   Pre-Paid Calling             Section 6.1.9
   Single Line Extension        Section 6.1.10
   Speakerphone paging          Section 6.1.11, Speed dial + Auto Answer
   Speed dial                   Local implementation
   Voice Message Screening      Section 6.1.12
   Voice Portal                 Section 6.1.13
   Whispered call waiting       Local implementation

6.1.1.  Barge-in

   Barge-in works the same as call monitoring, except that the UA
   barging in must indicate that its send media stream is to be mixed,
   so that all of the other parties can hear the stream from the UA
   barging in.

6.1.2.  Call Monitoring

   Call monitoring is a Join operation. The monitoring UA sends an
   INVITE with a Join header to the dialog it wants to listen to. It is
   able to discover the dialog via the dialog state on the monitored UA.
   The monitoring UA sends SDP in the INVITE that indicates receive-only
   media.
   Because the UA is only monitoring, it does not matter whether the UA
   indicates that it wishes its send stream to be mixed or
   point-to-point.

6.1.3.  Call Park

   Call park requires the ability to put a dialog somewhere, advertise
   it to users in a pickup group, and uniquely identify it in a way
   that can be communicated (including by human voice). The dialog can
   be held locally on the UA parking the dialog or, alternatively,
   transferred to the park service for the pickup group. The parked
   dialog then needs to be labeled (e.g., orbit 12) in a way that can be
   communicated to the party that is to pick up the call. The UAs in
   the pickup group discover the parked dialog(s) via the dialog
   package from the park service. If the dialog is parked locally, the
   park service merely aggregates the parked call states from the set of
   UAs in the pickup group.

6.1.4.  Call Pickup

   There are two different features that are called call pickup. The
   first is the pickup of a parked dialog. The UA from which the dialog
   is to be picked up subscribes to the dialog state of the park service
   or the UA that has locally parked the dialog. Dialogs that are
   parked should be labeled with an identifier. The labels are used by
   the UA to allow the user to indicate which dialog is to be picked up.
   The UA picking up the call invokes the URI in the call state that is
   labeled as replace-remote.

   The other call pickup feature involves picking up an early dialog
   (typically ringing). This feature uses some of the same primitives
   as the pickup of a parked call. The call state of the ringing UA is
   advertised using the dialog package. The UA that is to pick up the
   early dialog subscribes either directly to the ringing UA or to a
   service aggregating the states for UAs in the pickup group. The call
   state identifies early dialogs.
   The UA uses the call state(s) to help the user choose which early
   dialog is to be picked up. The UA then invokes the URI in the call
   state labeled as replace-remote.

6.1.5.  Click-to-dial

   The application or server that hosts the click-to-dial application
   captures the URI to be dialed and can set up the call using 3pcc or
   can send a REFER request to the UA that is to dial the address. As
   users sometimes change their mind or wish to give up listening to a
   ringing or voicemail-answered phone, this application illustrates the
   need to also have the ability to remotely hang up a call.

6.1.6.  Distinctive ring

   The target UA either makes a local decision based on information in
   an incoming INVITE (To, From, Contact, Request-URI) or trusts an
   Alert-Info header provided by the caller or inserted by a trusted
   proxy. In the latter case, the UA fetches the content described in
   the URI (typically via HTTP) and renders it to the user.

6.1.7.  Intercom

   The UA initiates a dialog using INVITE and the Answer-Mode: Auto
   header field as described in [28]. The called UA accepts the INVITE
   with a 200 OK and automatically enables the speakerphone.

   Alternatively, this can be a local decision for the UA to answer
   based upon called party identification.

6.1.8.  Music on Hold

   Music on hold can be implemented in a number of ways. One way is to
   transfer the held call to a holding service. When the UA wishes to
   take the call off hold, it performs a take on the call from the
   holding service. This involves subscribing to call state on the
   holding service and then invoking the URI in the call state labeled
   as replace-remote.

   Alternatively, music on hold can be performed as a local mixing
   operation. The UA holding the call can mix in the music from the
   music service via RTP (i.e.
   an additional dialog) or RTSP or another streaming media source.
   This approach is simpler from a protocol perspective (i.e., the held
   dialog does not move, so there is less chance of losing it); however,
   it uses more LAN bandwidth and resources on the UA.

6.1.9.  Pre-paid calling

   For prepaid calling, the user's media always passes through a device
   that is trusted by the pre-paid provider. This may be the other
   endpoint (for example, a PSTN gateway) or an intermediary (for
   example, a media relay). In either case, an intermediary proxy or
   B2BUA can periodically verify the amount of time available on the
   pre-paid account, and use the session-timer extension to cause the
   trusted endpoint (gateway) or intermediary (media relay) to send a
   reINVITE before that time runs out. During the reINVITE, the SIP
   intermediary can re-verify the account and insert another
   session-timer header.

   Note that while most pre-paid systems on the PSTN use an IVR to
   collect the account number and destination, this isn't strictly
   necessary for a SIP-originated prepaid call. SIP requests and SIP
   URIs are sufficiently expressive to convey the final destination, the
   provider of the prepaid service, the location from which the user is
   calling, and the prepaid account they want to use. If a pre-paid IVR
   is used, the mechanism described below (Voice Portals) can be
   combined as well.

6.1.10.  Single Line Extension/Multiple Line Appearance

   Incoming calls ring all the extensions through basic parallel
   forking. Each extension subscribes to dialog events from each other
   extension. While one user has an active call, any other UA extension
   can insert itself into that conversation (it already knows the dialog
   information) in the same way as barge-in.

   Standardization work to allow line appearance numbers to be
   coordinated across a group of UAs is currently underway.

6.1.11.  Speakerphone paging

   Speakerphone paging can be implemented using either multicast or a
   simple multipoint mixer. In the multicast solution, the paging UA
   sends a multicast INVITE with send-only media in the SDP (see also
   RFC 3264 [3]). The automatic answer and enabling of the speakerphone
   is a locally configured decision on the paged UAs. The paging UA
   sends RTP via the multicast address indicated in the SDP.

   The multipoint solution is accomplished by sending an INVITE to the
   multipoint mixer. The mixer is configured to automatically answer
   the dialog. The paging UA then sends REFER requests for each of the
   UAs that are to become paging speakers (the UA is likely to send out
   a single REFER that is parallel forked by the proxy server). The UAs
   performing as paging speakers are configured to automatically answer
   based upon caller identification (e.g. To field, URI or Referred-To
   headers).

   Finally, as a third option, the user agent can send a mass-invitation
   request to a conference server, which would create a conference and
   send INVITEs containing the Answer-Mode: Auto header field to all
   user agents in the paging group.

6.1.12.  Voice message screening

   At first, this is the same as call monitoring. In this case, the
   voicemail service is one of the UAs. The UA screening the message
   monitors the call on the voicemail service, and also subscribes to
   dialog information. If the user screening their messages decides to
   answer, they perform a Take from the voicemail system (for example,
   send an INVITE with Replaces to the UA leaving the message).

6.1.13.  Voice Portal

   A voice portal is essentially a complex collection of voice dialogs
   used to access interesting content.
   One of the most desirable call control features of a Voice Portal is
   the ability to start a new outgoing call from within the context of
   the Portal (to make a restaurant reservation or return a voicemail
   message, for example). Once the new call is over, the user should be
   able to return to the Portal by pressing a special key, using some
   DTMF sequence (e.g., a very long pound or hash tone), or by speaking
   a hotword (e.g., "Main Menu").

   In order to accomplish this, the Voice Portal starts with the
   following media relationship:

      { User , Voice Portal }

   The user then asks to make an outgoing call. The Voice Portal asks
   the User to perform a Far-Fork. In other words, the Voice Portal
   wants the following media relationship:

      { Target , User } & { User , Voice Portal }

   The Voice Portal is now just listening for a hotword or the
   appropriate DTMF. As soon as the user indicates they are done, the
   Voice Portal takes the call from the old Target, and we are back to
   the original media relationship.

   This feature can also be used by the account number and phone number
   collection menu in a pre-paid calling service. A user can press a
   DTMF sequence that presents them with the appropriate menu again.

7.  Acknowledgements

   Thanks to AC Mahendran, John Elwell, and Xavier Marjou for their
   detailed Working Group review of the document.

8.  Informative References

   [1]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
         Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
         Session Initiation Protocol", RFC 3261, June 2002.

   [2]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [3]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
         Session Description Protocol (SDP)", RFC 3264, June 2002.

   [4]   Roach, A., "Session Initiation Protocol (SIP)-Specific Event
         Notification", RFC 3265, June 2002.

   [5]   Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
         Description Protocol", RFC 4566, July 2006.

   [6]   Johnston, A., "Session Initiation Protocol Service Examples",
         draft-ietf-sipping-service-examples-13 (work in progress),
         July 2007.

   [7]   Rosenberg, J., Peterson, J., Schulzrinne, H., and G. Camarillo,
         "Best Current Practices for Third Party Call Control (3pcc) in
         the Session Initiation Protocol (SIP)", BCP 85, RFC 3725,
         April 2004.

   [8]   Sparks, R., "The Session Initiation Protocol (SIP) Refer
         Method", RFC 3515, April 2003.

   [9]   Mahy, R., Biggs, B., and R. Dean, "The Session Initiation
         Protocol (SIP) "Replaces" Header", RFC 3891, September 2004.

   [10]  Mahy, R. and D. Petrie, "The Session Initiation Protocol (SIP)
         "Join" Header", RFC 3911, October 2004.

   [11]  Rosenberg, J., Schulzrinne, H., and R. Mahy, "An
         INVITE-Initiated Dialog Event Package for the Session
         Initiation Protocol (SIP)", RFC 4235, November 2005.

   [12]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
         Initiation Protocol (SIP) Event Package for Conference State",
         RFC 4575, August 2006.

   [13]  Rosenberg, J., "A Session Initiation Protocol (SIP) Event
         Package for Registrations", RFC 3680, March 2004.

   [14]  Rosenberg, J., "A Presence Event Package for the Session
         Initiation Protocol (SIP)", RFC 3856, August 2004.

   [15]  Rosenberg, J., "A Framework for Conferencing with the Session
         Initiation Protocol (SIP)", RFC 4353, February 2006.

   [16]  Rosenberg, J., "A Framework for Application Interaction in the
         Session Initiation Protocol (SIP)",
         draft-ietf-sipping-app-interaction-framework-05 (work in
         progress), July 2005.

   [17]  Camarillo, G., "Framework for Transcoding with the Session
         Initiation Protocol (SIP)",
         draft-ietf-sipping-transc-framework-05 (work in progress),
         December 2006.

   [18]  Sparks, R., "Session Initiation Protocol Call Control -
         Transfer", draft-ietf-sipping-cc-transfer-08 (work in
         progress), July 2007.

   [19]  Johnston, A. and O. Levin, "Session Initiation Protocol (SIP)
         Call Control - Conferencing for User Agents", BCP 119,
         RFC 4579, August 2006.

   [20]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Indicating
         User Agent Capabilities in the Session Initiation Protocol
         (SIP)", RFC 3840, August 2004.

   [21]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
         Preferences for the Session Initiation Protocol (SIP)",
         RFC 3841, August 2004.

   [22]  Campbell, B. and R. Sparks, "Control of Service Context using
         SIP Request-URI", RFC 3087, April 2001.

   [23]  Jennings, C. and R. Mahy, "Remote Call Control in the Session
         Initiation Protocol (SIP) using the REFER method and the
         session-oriented dialog package", draft-mahy-sip-remote-cc-05
         (work in progress), March 2007.

   [24]  Burger, E., Van Dyke, J., and A. Spitzer, "Basic Network Media
         Services with SIP", RFC 4240, December 2005.

   [25]  Jennings, C., Audet, F., and J. Elwell, "Session Initiation
         Protocol (SIP) URIs for Applications such as Voicemail and
         Interactive Voice Response (IVR)", RFC 4458, April 2006.

   [26]  Rosenberg, J., "Request Authorization through Dialog
         Identification in the Session Initiation Protocol (SIP)",
         RFC 4538, June 2006.

   [27]  Lennox, J., Wu, X., and H. Schulzrinne, "Call Processing
         Language (CPL): A Language for User Control of Internet
         Telephony Services", RFC 3880, October 2004.

   [28]  Willis, D. and A.
         Allen, "Requesting Answering Modes for the Session Initiation
         Protocol (SIP)", draft-ietf-sip-answermode-06 (work in
         progress), September 2007.

   [29]  Mahy, R., "A Message Summary and Message Waiting Indication
         Event Package for the Session Initiation Protocol (SIP)",
         RFC 3842, August 2004.

Authors' Addresses

   Rohan Mahy
   Plantronics
   345 Encinal Street
   Santa Cruz, CA
   USA

   Email: rohan@ekabal.com

   Ben Campbell
   Estacado Systems

   Email: ben@nostrum.com

   Robert Sparks
   Estacado Systems

   Email: rjsparks@nostrum.com

   Jonathan Rosenberg
   Cisco Systems

   Email: jdrosen@cisco.com

   Dan Petrie
   SIP EZ

   Email: dpetrie@sipez.com

   Alan Johnston (editor)
   Avaya

   Email: alan@sipstation.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.