SIPPING WG                                                       R. Mahy
Internet-Draft                                               Plantronics
Intended status: Informational                                 R. Sparks
Expires: May 31, 2008                                   Estacado Systems
                                                            J. Rosenberg
                                                           Cisco Systems
                                                               D. Petrie
                                                                  SIP EZ
                                                        A. Johnston, Ed.
                                                                   Avaya
                                                       November 28, 2007


     A Call Control and Multi-party usage framework for the Session
                       Initiation Protocol (SIP)
                   draft-ietf-sipping-cc-framework-09

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.
   Note that other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 31, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document defines a framework and requirements for multi-party
   usage of SIP.  To enable discussion of multi-party features and
   applications we define an abstract call model for describing the
   media relationships required by many of these.  The model and actions
   described here are specifically chosen to be independent of the SIP
   signaling and/or mixing approach chosen to actually set up the media
   relationships.  In addition to its dialog manipulation aspect, this
   framework includes requirements for communicating related information
   and events such as conference and session state, and session history.
   This framework also describes other goals that embody the spirit of
   SIP applications as used on the Internet.

Table of Contents

   1.  Motivation and Background
   2.  Key Concepts
     2.1.  "Conversation Space" Model
     2.2.  Relationship Between Conversation Space, SIP Dialogs,
           and SIP Sessions
     2.3.  Signaling Models
     2.4.  Mixing Models
       2.4.1.  Tightly Coupled
       2.4.2.  Loosely Coupled
     2.5.  Conveying Information and Events
     2.6.  Componentization and Decomposition
       2.6.1.  Media Intermediaries
       2.6.2.  Mixer
       2.6.3.  Transcoder
       2.6.4.  Media Relay
       2.6.5.  Queue Server
       2.6.6.  Parking Place
       2.6.7.  Announcements and Voice Dialogs
     2.7.  Use of URIs
       2.7.1.  Naming Users in SIP
       2.7.2.  Naming Services with SIP URIs
     2.8.  Invoker Independence
     2.9.  Billing issues
   3.  Catalog of call control actions and sample features
     3.1.  Remote Call Control Actions on Early Dialogs
       3.1.1.  Remote Answer
       3.1.2.  Remote Forward or Put
       3.1.3.  Remote Busy or Error Out
     3.2.  Remote Call Control Actions on Single Dialogs
       3.2.1.  Remote Dial
       3.2.2.  Remote On and Off Hold
       3.2.3.  Remote Hangup
     3.3.  Call Control Actions on Multiple Dialogs
       3.3.1.  Transfer
       3.3.2.  Take
       3.3.3.  Add
       3.3.4.  Local Join
       3.3.5.  Insert
       3.3.6.  Split
       3.3.7.  Near-fork
       3.3.8.  Far fork
   4.  Security Considerations
   5.  IANA Considerations
   6.  Appendix A: Example Features
     6.1.  Implementation of these features
       6.1.1.  Barge-in
       6.1.2.  Call Monitoring
       6.1.3.  Call Park
       6.1.4.  Call Pickup
       6.1.5.  Click-to-dial
       6.1.6.  Distinctive ring
       6.1.7.  Intercom
       6.1.8.  Music on Hold
       6.1.9.  Pre-paid calling
       6.1.10. Single Line Extension/Multiple Line Appearance
       6.1.11. Speakerphone paging
       6.1.12. Voice message screening
       6.1.13. Voice Portal
   7.  Acknowledgements
   8.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Motivation and Background

   The Session Initiation Protocol [1] (SIP) was defined for the
   initiation, maintenance, and termination of sessions or calls between
   one or more users.
   However, despite its origins as a large-scale multiparty
   conferencing protocol, SIP is used today primarily for
   point-to-point calls.  This two-party configuration is the focus of
   the SIP specification and most of its extensions.

   This document defines a framework and requirements for multi-party
   usage of SIP.  Most multi-party operations manipulate SIP dialogs
   (also known as call legs) or SIP conference media policy to cause
   participants in a conversation to perceive specific media
   relationships.  In other protocols that deal with the concept of
   calls, this manipulation is known as call control.  In addition to
   its dialog or policy manipulation aspect, "call control" also
   includes communicating information and events related to manipulating
   calls, including information and events dealing with session state
   and history, conference state, user state, and even message state.

   Based on input from the SIP community, the authors compiled the
   following set of goals for SIP call control and multiparty
   applications:

   o  Define Primitives, Not Services.  Allow for a handful of robust
      yet simple mechanisms that can be combined to deliver features and
      services.  Throughout this document we refer to these simple
      mechanisms as "primitives".  Primitives should be sufficiently
      robust that, when they are combined with each other, they can be
      used to build a wide range of services.  However, the goal is not
      to define a provably complete set of primitives.  Note that while
      the IETF will NOT standardize behavior or services, it may define
      example services for informational purposes, as in service
      examples [6].

   o  Participant oriented.  The primitives should be designed to
      provide services that are oriented around the experience of the
      participants.  The authors observe that end users of features and
      services usually don't care how a media relationship is set up.
      Their ultimate experience is based only on the resulting media and
      other externally visible characteristics.

   o  Signaling Model independent: Support both a central control and a
      peer-to-peer feature invocation model (and combinations of the
      two).  Baseline SIP already supports a centralized control model
      described in 3pcc [7], and the SIP community has expressed a great
      deal of interest in peer-to-peer or distributed call control using
      primitives such as those defined in REFER [8], Replaces [9], and
      Join [10].

   o  Mixing Model independent: The bulk of interesting multiparty
      applications involve mixing or combining media from multiple
      participants.  This mixing can be performed by one or more of the
      participants, or by a centralized mixing resource.  The experience
      of the participants should not depend on the mixing model used.
      While most examples in this document refer to audio mixing, the
      framework applies to any media type.  In this context a "mixer"
      refers to combining media of the same type in an appropriate,
      media-specific way.  This is consistent with the model described
      in the SIP conferencing framework.

   o  Invoker oriented.  Only the user who invokes a feature or a
      service needs to know exactly which service is invoked or why.
      This is good because it allows new services to be created without
      requiring new primitives from all the participants; and it allows
      for much simpler feature authorization policies, for example, when
      participation spans organizational boundaries.  As discussed in
      Section 2.8, this also avoids exponential state explosion when
      combining features.  The invoker only has to manage a user
      interface or API to prevent local feature interactions.  All the
      other participants simply need to manage the feature interactions
      of a much smaller number of primitives.

   o  Primitives make full use of URIs.  URIs are a very powerful
      mechanism for describing users and services.  They represent a
      plentiful resource that can be extremely expressive and easily
      routed, translated, and manipulated--even across organizational
      boundaries.  URIs can contain special parameters and informational
      headers that need only be relevant to the owner of the namespace
      (domain) of the URI.  Just as a user who selects an http: URL need
      not understand the significance and organization of the web site
      it references, a user may encounter a SIP URI that translates into
      an email-style group alias, that plays a pre-recorded message, or
      runs some complex call-handling logic.  Note that while this may
      seem to conflict with the previous goal, both goals can be
      satisfied by the same model.

   o  Make use of SIP headers and SIP event packages to provide SIP
      entities with information about their environment.  These should
      include information about the status/handling of dialogs on
      other user agents, information about the history of other contacts
      attempted prior to the current contact, the status of
      participants, the status of conferences, user presence
      information, and the status of messages.

   o  Encourage service decomposition, and design to make use of
      standard components using well-defined, simple interfaces.  Sample
      components include a SIP mixer, recording service, announcement
      server, and voice dialog server.  (This is not an exhaustive
      list.)

   o  Include authentication, authorization, policy, logging, and
      accounting mechanisms to allow these primitives to be used safely
      among mutually untrusted participants.  Some of these mechanisms
      may be used to assist in billing, but no specific billing system
      will be endorsed.

   o  Permit graceful fallback to baseline SIP.  Definitions for new SIP
      call control extensions/primitives must describe a graceful way to
      fall back to baseline SIP behavior.  Support for one primitive
      must not imply support for another primitive.

   o  There is no desire or goal to reinvent traditional models, such as
      the model used by the [H.450] family of protocols, [JTAPI], or the
      [CSTA] call model, as these other models do not share the design
      goals presented in this document.

2.  Key Concepts

   This section introduces a number of key concepts which will be used
   to describe and explain various call control operations and services
   in the remainder of this document.  These include the conversation
   space model, signaling and mixing models, common components, and the
   use of URIs.

2.1.  "Conversation Space" Model

   This document introduces the concept of an abstract "conversation
   space" as a set of participants who believe they are all
   communicating among one another.  Each conversation space contains
   one or more participants.

   Participants are SIP User Agents that send original media to or
   terminate and receive media from other members of the conversation
   space.  Logically, every participant in the conversation space has
   access to all the media generated in that space (this is strictly
   true if all participants share a common media type).  A SIP User
   Agent that does not contribute or consume any media is NOT a
   participant; nor is a user agent that merely forwards, transcodes,
   mixes, or selects media originating elsewhere in the conversation
   space.  [Note that a conversation space consists of zero or more SIP
   calls or SIP conferences.  A conversation space is similar to the
   definition of a "call" in some other call models.]

   Participants may represent human users or non-human users (referred
   to as robots or automatons in this document).  Some participants may
   be hidden within a conversation space.
   Some examples of hidden participants include: robots that generate
   tones, images, or announcements during a conference to announce
   users arriving and departing, a human call center supervisor
   monitoring a conversation between a trainee and a customer, and
   robots that record media for training or archival purposes.

   Participants may also be active or passive.  Active participants are
   expected to be intelligent enough to leave a conversation space when
   they no longer desire to participate.  (An attentive human
   participant is obviously active.)  Some robotic participants (such as
   a voice messaging system, an instant messaging agent, or a voice
   dialog system) may be active participants if they can leave the
   conversation space when there is no human interaction.  Other robots
   (for example, the tone-generating robot from the previous example)
   are passive participants.  A human participant "on-hold" is passive.

   A conversation space can be drawn as a "bubble" or oval, or written
   as a "set" in curly or square brace notation.  Each set, oval, or
   "bubble" represents a conversation space.  Hidden participants are
   shown in lowercase letters.

   Note that while the term "conversation" usually applies to oral
   exchange of information, we apply the conversation space model to any
   media exchange between participants.

     { A , B }       [ A , b, C, D ]

       .-.              .---.
      /   \            /     \
     /  A  \          /  A b  \
    (       )        (         )
     \  B  /          \  C D  /
      \   /            \     /
       '-'              '---'

2.2.  Relationship Between Conversation Space, SIP Dialogs, and SIP
      Sessions

   In SIP, a call is "an informal term that refers to some communication
   between peers, generally set up for the purposes of a multimedia
   conversation."  Obviously we cannot discuss normative behavior based
   on such an intentionally vague definition.
   The concept of a conversation space is needed because the SIP
   definition of call is not sufficiently precise for the purpose of
   describing the user experience of multiparty features.

   Do any other definitions convey the correct meaning?  SIP and SDP
   [5] both define a conference as "a multimedia session identified by a
   common session description."  A session is defined as "a set of
   multimedia senders and receivers and the data streams flowing from
   senders to receivers."  Both of these definitions are heavily
   oriented toward multicast sessions with little differentiation among
   participants.  As such, neither is particularly useful for our
   purposes.  In fact, the definition of "call" in some call models is
   more similar to our definition of a conversation space.

   Some examples of the relationship between conversation spaces, SIP
   dialogs, and SIP sessions are listed below.  In each example, a human
   user will perceive that there is a single call.

   o  A simple two-party call is a single conversation space, a single
      session, and a single dialog.

   o  A locally mixed three-way call is two sessions and two dialogs.
      It is also a single conversation space.

   o  A simple dial-in audio conference is a single conversation space,
      but is represented by as many dialogs and sessions as there are
      human participants.

   o  A multicast conference is a single conversation space, a single
      session, and as many dialogs as participants.

2.3.  Signaling Models

   Obviously, to make changes to a conversation space, you must be able
   to use SIP signaling to cause these changes.  Specifically, there
   must be a way to manipulate SIP dialogs (call legs) to move
   participants into and out of conversation spaces.
   Although this is not as obvious, there also must be a way to
   manipulate SIP dialogs to include non-participant user agents that
   are otherwise involved in a conversation space (e.g., B2BUAs, 3pcc
   controllers, mixers, transcoders, translators, or relays).

   Implementations may set up the media relationships described in the
   conversation space model using a centralized control model.  One
   common way to implement this using SIP is known as 3rd Party Call
   Control (3pcc) and is described in 3pcc [7].  The 3pcc approach
   relies on only the following 3 primitive operations:

   o  Create a new dialog (INVITE)

   o  Modify a dialog (reINVITE)

   o  Destroy a dialog (BYE)

   The main advantage of the 3pcc approach is that it only requires very
   basic SIP support from end systems to support call control features.
   As such, third-party call control is a natural way to handle protocol
   conversion and mid-call features.  It also has the advantage and
   disadvantage that new features can/must be implemented in one place
   only (the controller), and neither requires enhanced client
   functionality, nor takes advantage of it.

   In addition, a peer-to-peer approach is discussed at length in this
   draft.  The primary drawback of the peer-to-peer model is additional
   complexity in the end system and in the authentication and management
   models.  The benefits of the peer-to-peer model include:

   o  state remains at the edges

   o  call signaling need only go through participants involved (there
      are no additional points of failure)

   o  peers can take advantage of end-to-end message integrity or
      encryption

   o  setup time is shorter (fewer messages are required to be sent by
      the initiator of the action)

   The peer-to-peer approach relies on additional "primitive"
   operations, some of which are identified here.

   o  Replace an existing dialog

   o  Join a new dialog with an existing dialog

   o  Locally perform media forking (multi-unicast)

   o  Ask another UA to send a request on your behalf

   The peer-to-peer approach also results in only a single SIP dialog,
   directly between the two UAs.  The 3pcc approach results in two SIP
   dialogs, one between each UA and the controller.  As a result, with
   3pcc the SIP features and extensions that can be used during the
   dialog are limited to those understood by the controller.  Where both
   UAs support an advanced SIP feature but the controller does not, the
   feature cannot be used.

   Many of the features, primitives, and actions described in this
   document also require some type of media mixing, combining, or
   selection as described in the next section.

2.4.  Mixing Models

   SIP permits a variety of mixing models, which are discussed here
   briefly.  This topic is discussed more thoroughly in the SIP
   conferencing framework [15] and cc-conferencing [19].  SIP supports
   both tightly-coupled and loosely-coupled conferencing, although more
   sophisticated behavior is available in tightly-coupled conferences.
   In a tightly-coupled conference, a single SIP user agent (called the
   focus) has a direct dialog relationship with each participant (and
   may control non-participant user agents as well).  In a
   loosely-coupled conference there are no coordinated signaling
   relationships among the participants.

   For brevity, only the two most popular conferencing models are
   significantly discussed in this document (local and centralized
   mixing).  Applications of the conversation spaces model to
   loosely-coupled multicast and distributed full unicast mesh
   conferences are left as an exercise for the reader.
   Note that a distributed full mesh conference can be used for basic
   conferences, but does not easily allow for more complex conferencing
   actions like splitting, merging, and sidebars.

   Call control features should be designed to allow a mixer (local or
   centralized) to decide when to reduce a conference back to a 2-party
   call, or drop all the participants (for example if only two
   automatons are communicating).  The actual heuristics used to release
   calls are beyond the scope of this document, but may depend on
   properties in the conversation space, such as the number of active,
   passive, or hidden participants; and the send-only, receive-only, or
   send-and-receive orientation of various participants.

2.4.1.  Tightly Coupled

   Tightly coupled conferences utilize a central point for signaling and
   authentication known as a focus [15].  The actual media can be
   centrally mixed or distributed.

2.4.1.1.  (Single) End System Mixing

   The first model we call "end system mixing".  In this model, user A
   calls user B, and they have a conversation.  At some point later, A
   decides to conference in user C.  To do this, A calls C, using a
   completely separate SIP call.  This call uses a different Call-ID,
   different tags, etc.  There is no call set up directly between B and
   C.  No SIP extension or external signaling is needed.  A merely
   decides to locally join two dialogs.

         B     C
          \   /
           \ /
            A

   A receives media streams from both B and C, and mixes them.  A sends
   a stream containing A's and C's streams to B, and a stream containing
   A's and B's streams to C.  Basically, user A handles both signaling
   and media mixing.

2.4.1.2.  Centralized Mixing

   In a centralized mixing model, all participants have a pairwise SIP
   and media relationship with the mixer.  Common applications of
   centralized mixing include ad-hoc conferences and scheduled dial-in
   or dial-out conferences.
   In the figure below, the mixer M receives media from and sends media
   to participants A, B, C, D, and E.

         B     C
          \   /
           \ /
            M --- A
           / \
          /   \
         D     E

2.4.1.3.  Centralized Signaling, Distributed Media

   In this conferencing model, there is a centralized controller, as in
   the dial-in and dial-out cases.  However, the centralized server
   handles signaling only.  The media is still sent directly between
   participants, using either multicast or multi-unicast.  Participants
   perform their own mixing.  Multi-unicast is when a user sends
   multiple packets (one for each recipient, addressed to that
   recipient).  This is referred to as a "Decentralized Multipoint
   Conference" in [H.323].  Full mesh media with centralized mixing is
   another approach.

2.4.2.  Loosely Coupled

   In these models, there is no point of central control of SIP
   signaling.  As in the "Centralized Signaling, Distributed Media" case
   above, all endpoints send media to all other endpoints.
   Consequently, every endpoint mixes the media it receives from all the
   other sources, and sends its own media to every other participant.

2.4.2.1.  Large-Scale Multicast Conferences

   Large-scale multicast conferences were the original motivation for
   both the Session Description Protocol (SDP) [5] and SIP.  In a
   large-scale multicast conference, one or more multicast addresses are
   allocated to the conference.  Each participant joins those multicast
   groups, and sends their media to those groups.  Signaling is not sent
   to the multicast groups.  The sole purpose of the signaling is to
   inform participants of which multicast groups to join.  Large-scale
   multicast conferences are usually pre-arranged, with specific start
   and stop times.  However, multicast conferences do not need to be
   pre-arranged, so long as a mechanism exists to dynamically obtain a
   multicast address.

2.4.2.2.  Full Distributed Unicast Conferencing

   In this conferencing model, each participant has both a pairwise
   media relationship and a pairwise signaling relationship with every
   other participant (a full mesh).  This model requires a mechanism to
   maintain a consistent view of distributed state across the group.
   This is a classic hard problem in computer science.  Also, this model
   does not scale well for large numbers of participants, because for n
   participants the number of media and signaling relationships is
   approximately n-squared.  As a result, this model is not generally
   available in commercial implementations; to the contrary it is
   primarily the topic of research or experimental implementations.
   Note that this model assumes peer-to-peer signaling.

2.5.  Conveying Information and Events

   Participants should have access to information about the other
   participants in a conversation space, so that this information can be
   rendered to a human user or processed by an automaton.  Although some
   of this information may be available from the Request-URI or To,
   From, Contact, or other SIP headers, another mechanism of reporting
   this information is necessary.

   Many applications are driven by knowledge about the progress of calls
   and conferences.  In general these types of events allow for the
   construction of distributed applications, where the application
   requires information on dialog and conference state, but is not
   necessarily co-resident with an endpoint user agent or conference
   server.  For example, a focus involved in a conversation space may
   wish to provide URIs for conference status, and/or conference/floor
   control.

   The SIP Events [4] architecture defines general mechanisms for
   subscription to and notification of events within SIP networks.
It 533 introduces the notion of a package that is a specific "instantiation" 534 of the events mechanism for a well-defined set of events. 536 Event packages are needed to provide the status of a user's dialogs, 537 provide the status of conferences and its participants, provide user 538 presence information, provide the status of registrations, and 539 provide the status of user's messages. While this is not an 540 exhaustive list, these are sufficient to enable the sample features 541 described in this document. 543 The conference event package [12] allows users to subscribe to 544 information about an entire tightly-coupled SIP conference. 545 Notifications convey information about the participants such as: the 546 SIP URI identifying each user, their status in the space (active, 547 declined, departed), URIs to invoke other features (such as sidebar 548 conversations), links to other relevant information (such as floor 549 control policies), and if floor control policies are in place, the 550 user's floor control status. For conversation spaces created from 551 cascaded conferences, conversation state can be gathered from 552 relevant foci and merged into a cohesive set of state. 554 The dialog package [11] provides information about all the dialogs 555 the target user is maintaining, what conversations the user in 556 participating in, and how these are correlated. Likewise the 557 registration package [13] provides notifications when contacts have 558 changed for a specific address-of-record. The combination of these 559 allows a user agent to learn about all conversations occurring for 560 the entire registered contact set for an address-of-record. 562 Note that user presence in SIP [14] has a close relationship with 563 these later two event packages. It is fundamental to the presence 564 model that the information used to obtain user presence is 565 constructed from any number of different input sources. 
Examples of other such sources include calendaring information and
uploads of presence documents. These two packages can be considered
another mechanism that allows a presence agent to determine the
presence state of the user. Specifically, a user presence server can
act as a subscriber for the dialog and registration packages to
obtain additional information that can be used to construct a
presence document.

The multi-party architecture may also need to provide a mechanism to
get information about the status/handling of a dialog (for example,
information about the history of other contacts attempted prior to
the current contact). Finally, the architecture should provide ample
opportunities to present informational URIs that relate to calls,
conversations, or dialogs in some way. For example, consider the SIP
Call-Info header, or Contact headers returned in a 300-class
response. Frequently, additional information about a call or dialog
can be fetched via non-SIP URIs. For example, consider a web page
for package tracking when calling a delivery company, or a web page
with related documentation when joining a dial-in conference. The
use of URIs in the multiparty framework is discussed in more detail
in Section 2.7.

The interaction of SIP with stimulus-signaling-based applications,
which allow a user agent to interact with an application without
knowledge of the semantics of that application, is discussed in the
SIP application interaction framework [16]. Stimulus signaling can
occur to a user interface running locally with the client, or to a
remote user interface, through media streams. Stimulus signaling
encompasses a wide range of mechanisms, ranging from clicking on
hyperlinks, to pressing buttons, to traditional Dual Tone Multi
Frequency (DTMF) input.
In all cases, stimulus signaling is supported through the use of
markup languages, which play a key role in that framework.

2.6. Componentization and Decomposition

This framework proposes a decomposed component architecture with a
very loose coupling of services and components. This means that a
service (such as a conferencing server or an auto-attendant) need not
be implemented as an actual server. Rather, these services can be
built by combining a few basic components in straightforward or
arbitrarily complex ways.

Since the components are easily deployed on separate boxes, by
separate vendors, or even with separate providers, we achieve a
separation of function that allows each piece to be developed in
complete isolation. We can also reuse existing components for new
applications. This allows rapid service creation, and the ability
for services to be distributed across organizational domains anywhere
in the Internet.

For many of these components it is also desirable to discover their
capabilities, for example querying the ability of a mixer to host a
10-dialog conference, or to reserve resources for a specific time.
These actions could be provided in the form of URIs, provided there
is an a priori means of understanding their semantics. For example,
if there is a published dictionary of operations and a way to query
the service for the available operations and the associated URIs,
the URI can be the interface for providing these service operations.
This concept is described in more detail in the context of dialog
operations in Section 3.

2.6.1. Media Intermediaries

Media Intermediaries are not participants in any conversation space,
although an entity that is also a media translator may also have a
co-located participant component (for example a mixer that also
announces the arrival of a new participant; the announcement portion
is a participant, but the mixer itself is not). Media intermediaries
should be as transparent as possible to the end users, offering a
useful, fundamental service without getting in the way of new
features implemented by participants. Some common media
intermediaries are described below.

2.6.2. Mixer

A SIP mixer is a component that combines media from all dialogs in
the same conversation in a media-specific way. For example, the
default combining for an audio conference might be an N-1
configuration, while a text mixer might interleave text messages on a
per-line basis. More details about how to manipulate the media
policy used by mixers are being discussed in the XCON Working Group.

2.6.3. Transcoder

A transcoder translates media from one encoding or format to another
(for example, GSM voice to G.711, MPEG2 to H.261, or text/html to
text/plain), or from one media type to another (for example text to
speech). A more thorough discussion of transcoding is provided in
SIP transcoding services invocation [17].

2.6.4. Media Relay

A media relay terminates media and simply forwards it to a new
destination without changing the content in any way. Sometimes media
relays are used to provide source IP address anonymity, to facilitate
middlebox traversal, or to provide a trusted entity where media can
be forcefully disconnected.

2.6.5. Queue Server

A queue server is a location where calls can be entered into one of
several FIFO (first-in, first-out) queues. A queue server would
subscribe to the presence of groups or individuals who are interested
in its queues.
When detecting that a user is available to service a queue, the
server redirects or transfers the last call in the relevant queue to
the available user. On a queue-by-queue basis, authorized users
could also subscribe to the call state (dialog information) of calls
within a queue. Authorized users could use this information to
effectively pluck (take) a call out of the queue (for example by
sending an INVITE with a Replaces header to one of the user agents in
the queue).

2.6.6. Parking Place

A parking place is a location where calls can be terminated
temporarily and then retrieved later. While a call is "parked", it
can receive media "on-hold" such as music, announcements, or
advertisements. Such a service could be further decomposed such that
announcements or music are handled by a separate component.

2.6.7. Announcements and Voice Dialogs

An announcement server is a server that can play digitized media
(frequently audio), such as music or recorded speech. These servers
are typically accessible via SIP, HTTP, or RTSP. An analogous
service is a recording service that stores digitized media. A
convention for specifying announcements in SIP URIs is described in
[24]. Likewise the same server could easily provide a service that
records digitized media.

A "voice dialog" is a model of spoken interactive behavior between a
human and an automaton that can include synthesized speech, digitized
audio, recognition of spoken and DTMF key input, recording of spoken
input, and interaction with call control. Voice dialogs frequently
consist of forms or menus. Forms present information and gather
input; menus offer choices of what to do next.

Spoken dialogs are a basic building block of applications that use
voice.
Consider, for example, that a voice mail system, the conference-id
and passcode collection system for a conferencing system, and
complicated voice portal applications all require a voice dialog
component.

2.6.7.1. Text-to-Speech and Automatic Speech Recognition

Text-to-Speech (TTS) is a service that converts text into digitized
audio. TTS is frequently integrated into other applications, but
when separated as a component, it provides greater opportunity for
broad reuse. Automatic Speech Recognition (ASR) is a service that
attempts to decipher digitized speech based on a proposed grammar.
Like TTS, ASR services can be embedded, or exposed so that many
applications can take advantage of such services. A standardized
(decomposed) interface to access standalone TTS and ASR services is
currently being developed in the SPEECHSC Working Group.

2.6.7.2. VoiceXML

[VoiceXML] is a W3C recommendation that was designed to give authors
control over the spoken dialog between users and applications. The
application and user take turns speaking: the application prompts the
user, and the user in turn responds. Its major goal is to bring the
advantages of web-based development and content delivery to
interactive voice response applications. We believe that VoiceXML
represents the ideal partner for SIP in the development of
distributed IVR servers. VoiceXML is an XML-based scripting language
for describing IVR services at an abstract level. VoiceXML supports
DTMF recognition, speech recognition, text-to-speech, and playing out
of recorded media files. The results of the data collected from the
user are passed to a controlling entity through an HTTP POST
operation. The controller can then return another script, or
terminate the interaction with the IVR server.

A VoiceXML server also need not be implemented as a monolithic
server.
Below is a diagram of a VoiceXML browser that is split into media and
non-media handling parts. The VoiceXML interpreter handles SIP
dialog state and state within a VoiceXML document, and sends requests
to the media component over another protocol.

              +-------------+
              |             |
              |  VoiceXML   |
              | Interpreter |
              | (signaling) |
              +-------------+
                ^          ^
                |          |
            SIP |          | RTSP
                |          |
                |          |
                v          v
   +-------------+        +-------------+
   |             |        |             |
   |   SIP UA    |  RTP   | RTSP Server |
   |             |<------>|   (media)   |
   |             |        |             |
   +-------------+        +-------------+

             Figure : Decomposed VoiceXML Server

2.7. Use of URIs

All naming in SIP uses URIs. URIs in SIP are used in a plethora of
contexts: the Request-URI; Contact, To, From, and *-Info headers;
application/uri bodies; and embedded in email, web pages, instant
messages, and ENUM records. The Request-URI identifies the user or
service that the call is destined for.

SIP URIs embedded in informational SIP headers, SIP bodies, and
non-SIP content can also specify methods, special parameters,
headers, and even bodies. For example:

   sip:bob@b.example.com;method=REFER?Refer-To=http://example.com/~alice

Throughout this draft we discuss call control primitive operations.
One of the biggest problems is defining how these operations may be
invoked. There are a number of ways to do this. One way is to
define the primitives in the protocol itself such that SIP methods
(for example REFER) or SIP headers (for example Replaces) indicate a
specific call control action. Another way to invoke call control
primitives is to define a specific Request-URI naming convention.
These conventions must either be shared between the client (the
invoker) and the server, or be published by or on behalf of the
server. The former involves defining URI construction techniques
(e.g. URI parameters and/or token conventions) as proposed in [24].
The latter technique usually involves discovering the URI via a SIP
event package, a web page, a business card, or an Instant Message.
Yet another means to acquire the URIs is to define a dictionary of
primitives with well-defined semantics and provide a means to query
the named primitives and corresponding URIs that may be invoked on
the service or dialogs.

2.7.1. Naming Users in SIP

An address-of-record, or public SIP address, is a SIP (or SIPS) URI
that points to a domain with a location server that can map the URI
to a set of Contact URIs where the user might be available.
Typically the Contact URIs are populated via registration.

   Address of Record              Contacts

   sip:bob@biloxi.example.com ->  sip:bob@babylon.biloxi.example.com:5060
                                  sip:bbrown@mailbox.provider.example.net
                                  sip:+1.408.555.6789@mobile.example.net

Callee Capabilities [20] defines a set of additional parameters to
the Contact header that define the characteristics of the user agent
at the specified URI. For example, there is a mobility parameter
that indicates whether the UA is fixed or mobile. When a user agent
registers, it places these parameters in the Contact headers to
characterize the URIs it is registering. This allows a proxy for
that domain to have information about the contact addresses for that
user.

When a caller sends a request, it can optionally request Caller
Preferences [21], by including the Accept-Contact,
Request-Disposition, and Reject-Contact headers that request certain
handling by the proxy in the target domain. These headers contain
preferences that describe the set of desired URIs to which the caller
would like their request routed. The proxy in the target domain
matches these preferences with the Contact characteristics originally
registered by the target user.
The target user can also choose to run arbitrarily complex "Find-me"
feature logic on a proxy in the target domain.

There is a strong asymmetry in how preferences for callers and
callees can be presented to the network. While a caller takes an
active role by initiating the request, the callee takes a passive
role in waiting for requests. This motivates the use of
callee-supplied scripts and caller preferences included in the call
request. This asymmetry is also reflected in the appropriate
relationship between caller and callee preferences. A server for a
callee should respect the wishes of the caller to avoid certain
locations, while the preference among locations has to be the
callee's choice, as it determines where, for example, the phone rings
and whether the callee incurs mobile telephone charges for incoming
calls.

SIP User Agent implementations are encouraged to make intelligent
decisions based on the type of participants (active/passive, hidden,
human/robot) in a conversation space. This information is conveyed
via the dialog package or in a SIP header parameter communicated
using an appropriate SIP header. For example, a music on hold
service may take the sensible approach that if there are two or more
unhidden participants, it should not provide hold music; or that it
will not send hold music to robots.

Multiple participants in the same conversation space may represent
the same human user. For example, the user may use one participant
for video, chat, and whiteboard media on a PC and another for audio
media on a SIP phone. In this case, the address-of-record is the
same for both user agents, but the Contacts are different. In
addition, human users may add robot participants that act on their
behalf (for example a call recording service, or a calendar
announcement reminder).
Call Control features in SIP should continue to function as expected
in such an environment.

2.7.2. Naming Services with SIP URIs

A critical piece of defining a session level service that can be
accessed by SIP is defining the naming of the resources within that
service. This point cannot be overstated.

In the context of SIP control of application components, we take
advantage of the fact that the left-hand-side of a standard SIP URI
is a user part. Most services may be thought of as user automatons
that participate in SIP sessions. It naturally follows that the user
part should be utilized as a service indicator.

For example, media servers commonly offer multiple services at a
single host address. Use of the user part as a service indicator
enables service consumers to direct their requests without ambiguity.
It has the added benefit of enabling media services to register their
availability with SIP Registrars just as any "real" SIP user would.
This maintains consistency and provides enhanced flexibility in the
deployment of media services in the network.

There has been much discussion about the potential for confusion if
media services URIs are not readily distinguishable from other types
of SIP UAs. The use of a service namespace provides a mechanism to
unambiguously identify standard interfaces while not constraining the
development of private or experimental services.

In SIP, the Request-URI identifies the user or service that the call
is destined for. The great advantage of using URIs (specifically,
the SIP Request-URI) as a service identifier comes because of the
combination of two facts. First, unlike in the PSTN, where the
namespace (dialable telephone numbers) is limited, URIs come from an
infinite space. They are plentiful, and they are free.
Secondly, the primary function of SIP is call routing through
manipulations of the Request-URI. In the traditional SIP
application, this URI represents a person. However, the URI can also
represent a service, as we propose here. This means we can apply the
routing services SIP provides to routing of calls to services. The
result is that the problem of service invocation and service location
becomes a routing problem, for which SIP provides a scalable and
flexible solution. Since there is such a vast namespace of services,
we can explicitly name each service in a finely granular way. This
allows the distribution of services across the network. For further
discussion about services and SIP URIs, see RFC 3087 [22].

Consider a conferencing service. If we have separated the names of
ad-hoc conferences from scheduled conferences, we can program proxies
to route calls for ad-hoc conferences to one set of servers, and
calls for scheduled ones to another, possibly even in a different
provider. In fact, since each conference itself is given a URI, we
can distribute conferences across servers, and easily guarantee that
calls for the same conference always get routed to the same server.
This is in stark contrast to conferences in the telephone network,
where the equivalent of the URI, the phone number, is scarce. An
entire conferencing provider generally has one or two numbers.
Conference IDs must be obtained through IVR interactions with the
caller, or through a human attendant. This makes it difficult to
distribute conferences across servers all over the network, since the
PSTN routing only knows about the dialed number.

For more examples, consider the URI conventions of RFC 4240 [24] for
media servers and RFC 4458 [25] for voicemail and IVR systems.
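
The routing idea described for the conferencing service can be
sketched in a few lines. This is only an illustration under stated
assumptions: the "adhoc-" and "sched-" user-part prefixes, the server
names, and the hash-based server selection are all hypothetical
conventions provisioned by the operator of the proxy, not something
defined by this framework or by SIP.

```python
import hashlib

# Hypothetical server pools provisioned by the proxy operator.
# The prefixes and host names are illustrative assumptions only.
POOLS = {
    "adhoc": ["adhoc1.example.com", "adhoc2.example.com"],
    "sched": ["sched1.example.net", "sched2.example.net",
              "sched3.example.net"],
}


def user_part(sip_uri):
    """Return the user part of a simple sip: or sips: URI, or None."""
    scheme, _, rest = sip_uri.partition(":")
    if scheme not in ("sip", "sips") or "@" not in rest:
        return None
    return rest.split("@", 1)[0]


def route(request_uri):
    """Pick a conference server from the user part of the Request-URI.

    Returns None when the URI does not follow the (assumed) naming
    convention, so that the proxy can fall back to ordinary routing.
    """
    user = user_part(request_uri)
    if user is None:
        return None
    kind, _, conf_id = user.partition("-")
    servers = POOLS.get(kind)
    if not servers or not conf_id:
        return None
    # A stable hash of the conference identifier means that every call
    # for the same conference lands on the same server.
    digest = hashlib.sha256(conf_id.encode("utf-8")).digest()
    return servers[digest[0] % len(servers)]
```

With this sketch, route("sip:adhoc-1234@conf.example.com") always
selects the same member of the ad-hoc pool, while a URI such as
sip:bob@biloxi.example.com yields None and is left to normal routing.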

In practical applications, it is important that an invoker does not
necessarily apply semantic rules to various URIs it did not create.
Instead, it should allow any arbitrary string to be provisioned, and
map the string to the desired behavior. The administrator of a
service may choose to provision specific conventions or mnemonic
strings, but the application should not require it. In any large
installation, the system owner is likely to have pre-existing rules
for mnemonic URIs, and any attempt by an application to define its
own rules may create a conflict. Implementations should allow an
arbitrary mix of URIs from these schemes, or any other scheme that
renders valid SIP URIs to be provisioned, rather than enforce only
one particular scheme.

As we have shown, SIP URIs represent an ideal, flexible mechanism for
describing and naming service resources, regardless of whether the
resources are queues, conferences, voice dialogs, announcements,
voicemail treatments, or phone features.

2.8. Invoker Independence

With functional signaling, only the invoker of features in SIP needs
to know exactly which feature they are invoking. One of the primary
benefits of this approach is that combinations of functional features
work in SIP call control without requiring complex feature
interaction matrices. For example, let us examine the combination of
a "transfer" of a call that is "conferenced".

Alice calls Bob. Alice silently "conferences in" her robotic
assistant Albert as a hidden party. Bob transfers Alice to Carol.
If Bob asks Alice to Replace her leg with a new one to Carol then
both Alice and Albert should be communicating with Carol
(transparently).

Using the peer-to-peer model, this combination of features works fine
if A is doing local mixing (Alice replaces Bob's dialog with
Carol's), or if A is using a central mixer (the mixer replaces Bob's
dialog with Carol's). A clever implementation using the 3pcc model
can generate similar results.

New extensions to the SIP Call Control Framework should attempt to
preserve this property.

2.9. Billing issues

Billing in the PSTN is typically based on who initiated a call. At
the moment billing in a SIP network is neither consistent with
itself, nor with the PSTN. (A billing model for SIP should allow for
both PSTN-style billing, and non-PSTN billing.) The example below
demonstrates one such inconsistency.

Alice places a call to Bob. Alice then blind transfers Bob to Carol
through a PSTN gateway. In current usage of REFER, Bob may be billed
for a call he did not initiate (although his UA originated the
outgoing dialog). This is not necessarily a terrible thing, but it
demonstrates a security concern (Bob must have appropriate local
policy to prevent fraud). Also, Alice may wish to pay for Bob's
session with Carol. There should be a way to signal this in SIP.

Likewise a Replacement call may maintain the same billing
relationship as a Replaced call, so if Alice first calls Carol, then
asks Bob to Replace this call, Alice may continue to receive a bill.

Further work in SIP billing should define a way to set or discover
the direction of billing.

3. Catalog of call control actions and sample features

Call control actions can be categorized by the dialogs upon which
they operate. The actions may involve a single or multiple dialogs.
These dialogs can be early or established. Multiple dialogs may be
related in a conversation space to form a conference or other
interesting media topologies.

It should be noted that it is desirable to provide a means by which a
party can discover the actions that may be performed on a dialog.
The interested party may be independent or related to the dialogs.
One means of accomplishing this is through the ability to define and
obtain URIs for these actions as described in section .

Below are listed several call control "actions" that establish or
modify dialogs and relate the participants in a conversation space.
The names of the actions listed are for descriptive purposes only
(they are not normative). This list of actions is not meant to be
exhaustive.

In the examples, all actions are initiated by the user "Alice"
represented by UA "A".

3.1. Remote Call Control Actions on Early Dialogs

The following are a set of actions that may be performed on a single
early dialog. These actions can be thought of as a set of remote
control operations. For example, an automaton might perform the
operation on behalf of a user. Alternatively, a user might use the
remote control in the form of an application to perform the action on
the early dialog of a UA that may be out of reach. All of these
actions correspond to telling the UA how to respond to a request to
establish an early dialog. These actions provide useful
functionality for PDA-, PC-, and server-based applications that
desire the ability to control a UA. A proposed mechanism for this
type of functionality is described in Remote Call Control [23].

3.1.1. Remote Answer

A dialog is in some early dialog state such as 180 Ringing. It may
be desirable to tell the UA to answer the dialog. That is, tell it
to send a 200 OK response to establish the dialog.

3.1.2. Remote Forward or Put

It may be desirable to tell the UA to respond with a 3xx class
response to forward an early dialog to another UA.

3.1.3. Remote Busy or Error Out

It may be desirable to instruct the UA to send an error response such
as 486 Busy Here.

3.2. Remote Call Control Actions on Single Dialogs

There is another useful set of actions that operate on a single
established dialog. These operations are useful in building
productivity applications for aiding users to control their phone.
For example, a Customer Relationship Management (CRM) application
can set up calls for a user, eliminating the need for the user to
actually enter an address. These operations can also be thought of
as remote control actions. A proposed mechanism for this type of
functionality is described in Remote Call Control [23].

3.2.1. Remote Dial

This action instructs the UA to initiate a dialog. This action can
be performed using the REFER method.

3.2.2. Remote On and Off Hold

This action instructs the UA to put an established dialog on hold.
Though this operation can conceptually be performed with the REFER
method, there are no semantics defined as to what the referred party
should do with the SDP. There is no way to distinguish between the
desire to go on or off hold on a per media stream basis.

3.2.3. Remote Hangup

This action instructs the UA to terminate an early or established
dialog. A REFER request with the following Refer-To URI and
Target-Dialog header field [26] performs this action. Note: this
example does not show the full set of header fields.

   REFER sip:carol@client.chicago.net SIP/2.0
   Refer-To: sip:bob@babylon.biloxi.example.com;method=BYE
   Target-Dialog: 13413098;local-tag=879738;remote-tag=023214

3.3. Call Control Actions on Multiple Dialogs

These actions apply to a set of related dialogs.

3.3.1. Transfer

This section describes how call transfer can be achieved using
centralized (3pcc) and peer-to-peer (REFER) approaches.

The conversation space changes as follows:

     before          after
   { A , B } --> { C , B }

A replaces itself with C.

To make this happen using the peer-to-peer approach, "A" would send
two SIP requests. A shorthand for those requests is shown below:

   REFER B Refer-To:C
   BYE B

To make this happen instead using the 3pcc approach, the controller
sends requests represented by the shorthand below:

   INVITE C (w/SDP of B)
   reINVITE B (w/SDP of C)
   BYE A

Features enabled by this action:

- blind transfer
- transfer to a central mixer (some type of conference or forking)
- transfer to park server (park)
- transfer to music on hold or announcement server
- transfer to a "queue"
- transfer to a service (such as Voice Dialogs service)
- transition from local mixer to central mixer

This action is frequently referred to as "completing an attended
transfer". It is described in more detail in cc-transfer [18].

Note that if a transfer requires URI hiding or privacy, then the 3pcc
approach can more easily implement this. For example, if the URI of
C needs to be hidden from B, then the use of 3pcc helps accomplish
this.

3.3.2. Take

The conversation space changes as follows:

   { B , C } --> { B , A }

A forcibly replaces C with itself. In most uses of this primitive, A
is just "un-replacing" itself.
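
The transfer and take transitions shown so far can be modelled as
simple set operations on a conversation space. The sketch below is
only a way to check the before/after notation; the function names and
the use of frozenset are our own illustrative choices, and no SIP
signaling is performed.

```python
def transfer(space, a, c):
    """A replaces itself with C: { A , B } --> { C , B }."""
    if a not in space:
        raise ValueError("only a current participant can transfer")
    return (space - {a}) | {c}


def take(space, a, c):
    """A forcibly replaces C with itself: { B , C } --> { B , A }."""
    if c not in space:
        raise ValueError("the replaced party is not in the space")
    return (space - {c}) | {a}


# Example: A transfers its place to C, then "un-replaces" itself.
before = frozenset({"A", "B"})
after_transfer = transfer(before, "A", "C")
after_take = take(after_transfer, "A", "C")
```

In this model a take that follows a transfer restores the original
space, which matches the observation that A is usually just
"un-replacing" itself.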

Using the peer-to-peer approach, "A" sends:

   INVITE B Replaces:

Using the 3pcc approach (all requests sent from controller):

   INVITE A (w/SDP of B)
   reINVITE B (w/SDP of A)
   BYE C

Features enabled by this action:

- transferee completes an attended transfer
- retrieve from central mixer (not recommended)
- retrieve from music on hold or park
- retrieve from queue
- call center take
- voice portal resuming ownership of a call it originated
- answering-machine style screening (pickup)
- pickup of a ringing call (i.e. early dialog)

Note that pickup of a ringing call has some interesting additional
requirements. First, it is an early dialog as opposed to an
established dialog. Second, the party that is to pick up the call
may wish to do so only while it is an early dialog. That is, in the
race condition where the ringing UA accepts just before it receives
signaling from the party wishing to take the call, the taking party
wishes to yield or cancel the take. The goal is to avoid yanking an
answered call from the called party.

This action is described in Replaces [9] and in cc-transfer [18].

3.3.3. Add

Note that the following four actions are described in cc-conferencing
[19].

This is merely adding a participant to a SIP conference. The
conversation space changes as follows:

   { A , B } --> { A , B , C }

A adds C to the conversation.

Using the peer-to-peer approach, adding a party using local mixing
requires no signaling.
To transition from a 2-party call or a locally mixed conference to
central mixing, A could send the following requests:

   REFER B Refer-To: conference-URI
   INVITE conference-URI
   BYE B

To add a party to a conference:

   REFER C Refer-To: conference-URI
   or
   REFER conference-URI Refer-To: C

Using the 3pcc approach to transition to centrally mixed, the
controller would send:

   INVITE mixer leg 1 (w/SDP of A)
   INVITE mixer leg 2 (w/SDP of B)
   INVITE C (late SDP)
   reINVITE A (w/SDP of mixer leg 1)
   reINVITE B (w/SDP of mixer leg 2)
   INVITE mixer leg 3 (w/SDP of C)

To add a party to a SIP conference:

   INVITE C (late SDP)
   INVITE conference-URI (w/SDP of C)

Features enabled:

- standard conference feature
- call recording
- answering-machine style screening (screening)

3.3.4. Local Join

The conversation space changes like this:

   { A , B } , { A , C } --> { A , B , C }

or like this

   { A , B } , { C , D } --> { A , B , C , D }

A takes two conversation spaces and joins them together into a single
space.

Using the peer-to-peer approach, A can mix locally, or REFER the
participants of both conversation spaces to the same central mixer
(as in 3.3.5).

For the 3pcc approach, the call flows for inserting participants, and
joining and splitting conversation spaces are tedious yet
straightforward, so these are left as an exercise for the reader.

Features enabled:

- standard conference feature
- leaving a sidebar to rejoin a larger conference

3.3.5. Insert

The conversation space changes like this:

   { B , C } --> { A , B , C }

A inserts itself into a conversation space.

A proposed mechanism for signaling this using the peer-to-peer
approach is to send a new header in an INVITE with "joining" [10]
semantics.
For example:

   INVITE B Join:

If B accepted the INVITE, B would accept responsibility to set up the
dialogs and mixing necessary (for example: to mix locally or to
transfer the participants to a central mixer).

Features enabled:

- barge-in
- call center monitoring
- call recording

3.3.6. Split

   { A , B , C , D } --> { A , B } , { C , D }

If using a central conference with peer-to-peer:

   REFER C Refer-To: conference-URI (new URI)
   REFER D Refer-To: conference-URI (new URI)
   BYE C
   BYE D

Features enabled:

- sidebar conversations during a larger conference

3.3.7. Near-fork

A participates in two conversation spaces simultaneously:

   { A, B } --> { B , A } & { A , C }

A is a participant in two conversation spaces such that A sends the
same media to both spaces, and renders media from both spaces,
presumably by mixing or rendering the media from both. We can define
that A is the "anchor" point for both forks, each of which is a
separate conversation space.

This action is purely a local implementation matter (it requires no
special signaling). Local features such as switching calls between
the background and foreground are possible using this media
relationship.

3.3.8. Far fork

The conversation space diagram...

   { A, B } --> { A , B } & { B , C }

A requests B to be the "anchor" of two conversation spaces.

This is easily set up by creating a conference with two
sub-conferences and setting the media policy appropriately such that
B is a participant in both. Media forking can also be set up using
3pcc as described in Section 5.1 of RFC3264 [3] (an offer/answer
model for SDP). The session descriptions for forking are quite
complex. Controllers should verify that endpoints can handle
forked-media, for example using prior configuration.
   Features enabled:

   - barge-in
   - voice portal services
   - whisper
   - hotword detection
   - sending DTMF somewhere else

4.  Security Considerations

   Call Control primitives provide a powerful set of features that can
   be dangerous in the hands of an attacker. To complicate matters,
   call control primitives are likely to be automatically authorized
   without direct human oversight.

   The class of attacks that are possible using these tools includes
   the ability to eavesdrop on calls, disconnect calls, redirect calls,
   render irritating content (including ringing) at a user agent, cause
   an action that has billing consequences, subvert billing
   (theft-of-service), and obtain private information. Call control
   extensions must take extra care to describe how these attacks will
   be prevented.

   We can also make some general observations about authorization and
   trust with respect to call control. The security model depends
   heavily on the signaling model chosen (see Section 3.2).

   Let us first examine the security model used in the 3pcc approach.
   All signaling goes through the controller, which is a trusted
   entity. Traditional SIP authentication and hop-by-hop encryption and
   message integrity work fine in this environment, but end-to-end
   encryption and message integrity may not be possible.

   When using the peer-to-peer approach, call control actions and
   primitives can be legitimately initiated by a) an existing
   participant in the conversation space, b) a former participant in
   the conversation space, or c) an entity trusted by one of the
   participants. For example, a participant always initiates a
   transfer; a retrieve from Park (a take) is initiated on behalf of a
   former participant; and a barge-in (insert or far-fork) is initiated
   by a trusted entity (an operator for example).
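
   As a non-normative illustration of the three classes of legitimate
   initiator listed above, the following Python sketch classifies a
   requester against a conversation space. It assumes the requester
   has already been authenticated by some other means. The names
   Initiator and classify_initiator, and the trusted_by mapping, are
   hypothetical and introduced only for this example.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Initiator(Enum):
    """The three classes of legitimate initiator in the peer-to-peer model."""
    EXISTING_PARTICIPANT = "a"   # e.g. the party initiating a transfer
    FORMER_PARTICIPANT = "b"     # e.g. a retrieve from Park (a take)
    TRUSTED_ENTITY = "c"         # e.g. an operator performing a barge-in


def classify_initiator(
    requester: str,
    current: Set[str],
    former: Set[str],
    trusted_by: Dict[str, Set[str]],
) -> Optional[Initiator]:
    """Return the class the requester falls into, or None if the request
    has no legitimate initiator and should be rejected.

    'trusted_by' maps each participant to the identities it trusts.
    """
    if requester in current:
        return Initiator.EXISTING_PARTICIPANT
    if requester in former:
        return Initiator.FORMER_PARTICIPANT
    # A trusted entity only needs to be trusted by one current participant.
    if any(requester in trusted_by.get(p, set()) for p in current):
        return Initiator.TRUSTED_ENTITY
    return None
```

   A real user agent would combine a check of this kind with SIP
   authentication of the requester; the sketch covers only the
   authorization decision.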
   Authenticating requests by an existing participant or a trusted
   entity can be done with baseline SIP mechanisms. In the case of
   features initiated by a former participant, these should be
   protected against replay attacks by using a unique name or
   identifier per invocation. The Replaces header exhibits this
   behavior as a by-product of its operation (once a Replaces operation
   is successful, the dialog being Replaced no longer exists). For
   other requests, a "one-time" Request-URI may be provided to the
   feature invoker.

   To authorize call control primitives that trigger special behavior
   (such as an INVITE with Replaces or Join semantics), the receiving
   user agent may have trouble finding appropriate credentials with
   which to challenge or authorize the request, as the sender may be
   completely unknown to the receiver, except through the introduction
   of a third party. These credentials need to be passed transitively
   in some way or fetched in an event body, for example.

5.  IANA Considerations

   This document requires no action by IANA.

6.  Appendix A: Example Features

   Primitives are defined in terms of their ability to provide
   features. These example features should require an amply robust set
   of services to demonstrate a useful set of primitives. They are
   described here briefly. Note that the descriptions of these
   features are non-normative. Some of these features are used as
   examples in section 6 to demonstrate how some features may require
   certain media relationships. Note also that this document describes
   a mixture of both features originating in the world of telephones,
   and features that are clearly Internet oriented.

   Example Feature Definitions:

   Attended Transfer - The transferring party establishes a session
   with the transfer target before completing the transfer.
   Auto Answer - Calls to a certain address or location answer
   immediately via a speakerphone.

   Automatic Callback - Alice calls Bob, but Bob is busy. Alice would
   like Bob to call her automatically when he is available. When Bob
   hangs up, Alice's phone rings. When Alice answers, Bob's phone
   rings. Bob answers and they talk.

   Barge-in - Carol interrupts Alice, who has an in-progress call with
   Bob. In some variations, Alice forcibly joins a new conversation
   with Carol; in other variations, all three parties are placed in the
   same conversation (basically a 3-way conference).

   Blind Transfer - Alice is in a conversation with Bob. Alice asks
   Bob to contact Carol, but makes no attempt to contact Carol
   independently. In many implementations, Alice does not verify Bob's
   success or failure in contacting Carol.

   Call Forwarding - Before a dialog is accepted, it is redirected to
   another location, for example, because the originally intended
   recipient is busy, does not answer, is disconnected from the
   network, or has configured all requests to go somewhere else.

   Call Monitoring - A call center supervisor joins an in-progress call
   for monitoring purposes.

   Call Park - A call participant parks a call (essentially puts the
   call on hold), and then retrieves it at a later time (typically from
   another location).

   Call Pickup - A party picks up a call that was ringing at another
   location. One variation allows the caller to choose which location;
   another variation just picks up any call in that user's "pickup
   group".

   Call Return - Alice calls Bob. Bob misses the call or is
   disconnected before he is finished talking to Alice. Bob invokes
   Call Return, which calls Alice, even if Alice did not provide her
   real identity or location to Bob.

   Call Waiting - Alice is in a call, then receives another call.
   Alice can place the first call on hold, and talk with the other
   caller. She can typically switch back and forth between the
   callers.

   Click-to-dial - Alice looks in her company directory for Bob. When
   she finds Bob, she clicks on a URI to call him. Her phone rings (or
   possibly answers automatically), and when she answers, Bob's phone
   rings.

   Conference Call - Three or more active, visible participants in the
   same conversation space.

   Consultative transfer - The transferring party establishes a session
   with the target and mixes both sessions together so that all three
   parties can participate, then disconnects, leaving the transferee
   and transfer target with an active session.

   Distinctive ring - Incoming calls have different ring cadences or
   sample sounds depending on the From party, the To party, or other
   factors.

   Do Not Disturb - Alice selects the Do Not Disturb option. Calls to
   her either ring briefly or not at all and are forwarded elsewhere.
   Some variations allow specially authorized callers to override this
   feature and ring Alice anyway.

   Find-Me - Alice sets up complicated rules for how she can be reached
   (possibly using CPL [27], presence [14], or other factors). When
   Bob calls Alice, his call is eventually routed to a temporary
   Contact where Alice happens to be available.

   Hotline - Alice picks up a phone and is immediately connected to the
   technical support hotline, for example.

   IM Conference Alerts - A user receives a notification as an Instant
   Message whenever someone joins a conference they are also in.
   Inbound Call Screening - Alice doesn't want to receive calls from
   Matt. Inbound Screening prevents Matt from disturbing Alice. In
   some variations, this works even if Matt hides his identity.

   Intercom - Alice typically presses a button on a phone that
   immediately connects to another user or phone and causes that phone
   to play her voice over its speaker. Some variations immediately set
   up two-way communications; other variations require another button
   to be pressed to enable a two-way conversation.

   Message Waiting - Bob calls Alice while she is away from her phone.
   When she returns, a visible or audible indicator conveys that
   someone has left her a voicemail message. The message waiting
   indication may also convey how many messages are waiting, from whom,
   what time, and other useful pieces of information.

   Music on Hold - When Alice places a call with Bob on hold, her user
   agent replaces its audio with streaming content such as music,
   announcements, or advertisements.

   Outbound Call Screening - Alice is paged and unknowingly calls a
   PSTN pay-service telephone number in the Caribbean, but local policy
   blocks her call, and possibly informs her why.

   Pre-paid calling - Alice pays for a certain currency or unit amount
   of calling value. When she places a call, she provides her account
   number somehow. If her account runs out of calling value during a
   call, her call is disconnected or redirected to a service where she
   can purchase more calling value.

   Presence-Enabled Conferencing - Alice wants to set up a conference
   call with Bob and Cathy when they all happen to be available (rather
   than scheduling a predefined time). The server providing the
   application monitors their status, and calls all three when they are
   all "online", not idle, and not in another call.
   Single Line Extension/Multiple Line Appearance - A group of phones
   is treated as "extensions" of a single line. A call for one rings
   them all. As soon as one answers, the others stop ringing. If any
   extension is actively in a conversation, another extension can "pick
   up" and immediately join the conversation. This emulates the
   behavior of a home telephone line with multiple phones.

   Speakerphone paging - Alice calls the paging address and speaks.
   Her voice is played on the speaker of every idle phone in a
   preconfigured group of phones.

   Speed dial - Alice dials an abbreviated number, or enters an alias,
   or presses a special speed dial button representing Bob. Her action
   is interpreted as if she specified the full address of Bob.

   Voice message screening - Bob calls Alice. Alice is screening her
   calls, so Bob hears Alice's voicemail greeting. Alice can hear Bob
   leave his message. If she decides to talk to Bob, she can take the
   call back from the voicemail system; otherwise she can let Bob leave
   a message. This emulates the behavior of a home telephone answering
   machine.

   Voice Portal - A service that allows users to access a portal site
   using spoken dialog interaction. For example, Alice needs to
   schedule a working dinner with her co-worker Carol. Alice uses a
   voice portal to check Carol's flight schedule, find a restaurant
   near her hotel, make a reservation, get directions there, and page
   Carol with this information.

   Whispered call waiting - Alice is in a conversation with Bob. Carol
   calls Alice. Either Carol can "whisper" to Alice directly ("Can you
   get lunch in 15 minutes?"), or an automaton whispers to Alice
   informing her that Carol is trying to reach her.

6.1.  Implementation of these features

   Example Features:

   Attended Transfer        [18]
   Auto Answer              [28]
   Automatic Callback       Two person presence-based conference
   Barge-in                 Section 6.1.1
   Blind Transfer           [18]
   Call Forwarding          Proxy or Local implementation
   Call Hold                [6]
   Call Monitoring          Section 6.1.2
   Call Park                Section 6.1.3, [6]
   Call Pickup              Section 6.1.4, [6]
   Call Return              Proxy feature
   Call Waiting             Local Implementation
   Click-to-dial            Section 6.1.5, [6]
   Conference Call          [19]
   Presence-based
     Conferencing           [19], [14]
   Consultative transfer    [18]
   Distinctive ring         Section 6.1.6, Proxy or Local implementation
   Do Not Disturb           [14]
   Find-Me                  Proxy service based on presence
   Hotline                  Local Implementation
   IM Conference Alerts     Subscribe to conference status
   Inbound Call Screening   Proxy or Local implementation
   Intercom                 Section 6.1.7, [28]
   Message Waiting          [29]
   Multiple Appearances     Section 6.1.10
   Music on Hold            Section 6.1.8, [6]
   Outbound Call Screening  Proxy feature
   Pre-Paid Calling         Section 6.1.9
   Single Line Extension    Section 6.1.10
   Speakerphone paging      Section 6.1.11, Speed dial + Auto Answer
   Speed dial               Local Implementation
   Voice Message Screening  Section 6.1.12
   Voice Portal             Section 6.1.13
   Whispered call waiting   Local implementation

6.1.1.  Barge-in

   Barge-in works the same as call monitoring, except that the barging
   UA must indicate that its send media stream is to be mixed so that
   all of the other parties can hear the stream from the UA that is
   barging in.

6.1.2.  Call Monitoring

   Call monitoring is a Join operation. The monitoring UA sends an
   INVITE with a Join header field identifying the dialog it wants to
   listen to. It is able to discover the dialog via the dialog state
   on the monitored UA. The monitoring UA sends SDP in the INVITE that
   indicates receive-only media.
   As the UA is monitoring only, it does not matter whether the UA
   indicates that it wishes the send stream to be mixed or sent point
   to point.

6.1.3.  Call Park

   Call park requires the ability to put a dialog some place, to
   advertise it to users in a pickup group, and to uniquely identify it
   by a means that can be communicated (including by human voice). The
   dialog can be held locally on the UA parking the dialog or,
   alternatively, transferred to the park service for the pickup group.
   The parked dialog then needs to be labeled (e.g., orbit 12) in a way
   that can be communicated to the party that is to pick up the call.
   The UAs in the pickup group discover the parked dialog(s) via the
   dialog package from the park service. If the dialog is parked
   locally, the park service merely aggregates the parked call states
   from the set of UAs in the pickup group.

6.1.4.  Call Pickup

   There are two different features that are called call pickup. The
   first is the pickup of a parked dialog. The UA from which the
   dialog is to be picked up subscribes to the dialog state of the park
   service or the UA that has locally parked the dialog. Dialogs that
   are parked should be labeled with an identifier. The labels are
   used by the UA to allow the user to indicate which dialog is to be
   picked up. The UA picking up the call invokes the URI in the call
   state that is labeled as replace-remote.

   The other call pickup feature involves picking up an early dialog
   (typically ringing). This feature uses some of the same primitives
   as the pickup of a parked call. The call state of the ringing UA is
   advertised using the dialog package. The UA that is to pick up the
   early dialog subscribes either directly to the ringing UA or to a
   service aggregating the states for UAs in the pickup group. The
   call state identifies early dialogs.
   The UA uses the call state(s) to help the user choose which early
   dialog is to be picked up. The UA then invokes the URI in the call
   state labeled as replace-remote.

6.1.5.  Click-to-dial

   The application or server that hosts the click-to-dial application
   captures the URI to be dialed and can set up the call using 3pcc or
   can send a REFER request to the UA that is to dial the address. As
   users sometimes change their mind or wish to give up listening to a
   ringing or voicemail-answered phone, this application illustrates
   the need to also have the ability to remotely hang up a call.

6.1.6.  Distinctive ring

   The target UA either makes a local decision based on information in
   an incoming INVITE (To, From, Contact, Request-URI) or trusts an
   Alert-Info header provided by the caller or inserted by a trusted
   proxy. In the latter case, the UA fetches the content described in
   the URI (typically via HTTP) and renders it to the user.

6.1.7.  Intercom

   The UA initiates a dialog using INVITE and the Answer-Mode: Auto
   header field as described in [28]. The called UA accepts the INVITE
   with a 200 OK and automatically enables the speakerphone.

   Alternatively, this can be a local decision for the UA to answer
   based upon called party identification.

6.1.8.  Music on Hold

   Music on hold can be implemented in a number of ways. One way is to
   transfer the held call to a holding service. When the UA wishes to
   take the call off hold, it basically performs a take on the call
   from the holding service. This involves subscribing to call state
   on the holding service and then invoking the URI in the call state
   labeled as replace-remote.

   Alternatively, music on hold can be performed as a local mixing
   operation. The UA holding the call can mix in the music from the
   music service via RTP (i.e.
   an additional dialog) or RTSP or other streaming media source. This
   approach is simpler from a protocol perspective (i.e., the held
   dialog does not move, so there is less chance of losing it);
   however, it does use more LAN bandwidth and resources on the UA.

6.1.9.  Pre-paid calling

   For prepaid calling, the user's media always passes through a device
   that is trusted by the pre-paid provider. This may be the other
   endpoint (for example, a PSTN gateway) or an intermediary such as a
   media relay. In either case, an intermediary proxy or B2BUA can
   periodically verify the amount of time available on the pre-paid
   account, and use the session-timer extension to cause the trusted
   endpoint (gateway) or intermediary (media relay) to send a reINVITE
   before that time runs out. During the reINVITE, the SIP
   intermediary can re-verify the account and insert another
   session-timer header.

   Note that while most pre-paid systems on the PSTN use an IVR to
   collect the account number and destination, this isn't strictly
   necessary for a SIP-originated prepaid call. SIP requests and SIP
   URIs are sufficiently expressive to convey the final destination,
   the provider of the prepaid service, the location from which the
   user is calling, and the prepaid account they want to use. If a
   pre-paid IVR is used, the mechanism described below (Voice Portals)
   can be combined as well.

6.1.10.  Single Line Extension/Multiple Line Appearance

   Incoming calls ring all the extensions through basic parallel
   forking. Each extension subscribes to dialog events from each other
   extension. While one user has an active call, any other UA
   extension can insert itself into that conversation (it already knows
   the dialog information) in the same way as barge-in.

   Standardization work to allow line appearance numbers to be
   coordinated across a group of UAs is currently underway.

6.1.11.  Speakerphone paging

   Speakerphone paging can be implemented using either multicast or a
   simple multipoint mixer. In the multicast solution, the paging UA
   sends a multicast INVITE with send-only media in the SDP (see also
   RFC3264). The automatic answer and enabling of the speakerphone is
   a locally configured decision on the paged UAs. The paging UA sends
   RTP via the multicast address indicated in the SDP.

   The multipoint solution is accomplished by sending an INVITE to the
   multipoint mixer. The mixer is configured to automatically answer
   the dialog. The paging UA then sends REFER requests for each of the
   UAs that are to become paging speakers (the UA is likely to send out
   a single REFER that is parallel forked by the proxy server). The
   UAs performing as paging speakers are configured to automatically
   answer based upon caller identification (e.g. To field, URI or
   Referred-To headers).

   Finally, as a third option, the user agent can send a
   mass-invitation request to a conference server, which would create a
   conference and send INVITEs containing the Answer-Mode: Auto header
   field to all user agents in the paging group.

6.1.12.  Voice message screening

   At first, this is the same as call monitoring. In this case, the
   voicemail service is one of the UAs. The UA screening the message
   monitors the call on the voicemail service, and also subscribes to
   dialog information. If the user screening their messages decides to
   answer, they perform a Take from the voicemail system (for example,
   send an INVITE with Replaces to the UA leaving the message).

6.1.13.  Voice Portal

   A voice portal is essentially a complex collection of voice dialogs
   used to access interesting content.
   One of the most desirable call control features of a Voice Portal is
   the ability to start a new outgoing call from within the context of
   the Portal (to make a restaurant reservation, or return a voicemail
   message, for example). Once the new call is over, the user should
   be able to return to the Portal by pressing a special key, using
   some DTMF sequence (e.g., a very long pound or hash tone), or by
   speaking a hotword (e.g., "Main Menu").

   In order to accomplish this, the Voice Portal starts with the
   following media relationship:

      { User , Voice Portal }

   The user then asks to make an outgoing call. The Voice Portal asks
   the User to perform a Far-Fork. In other words, the Voice Portal
   wants the following media relationship:

      { Target , User } & { User , Voice Portal }

   The Voice Portal is now just listening for a hotword or the
   appropriate DTMF. As soon as the user indicates they are done, the
   Voice Portal takes the call from the old Target, and we are back to
   the original media relationship.

   This feature can also be used by the account number and phone number
   collection menu in a pre-paid calling service. A user can press a
   DTMF sequence that presents them with the appropriate menu again.

7.  Acknowledgements

   The authors would like to acknowledge Ben Campbell for his
   contributions to the document and thank AC Mahendran, John Elwell,
   and Xavier Marjou for their detailed Working Group review of the
   document.

8.  Informative References

   [1]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
         Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
         Session Initiation Protocol", RFC 3261, June 2002.

   [2]   Bradner, S., "Key words for use in RFCs to Indicate
         Requirement Levels", BCP 14, RFC 2119, March 1997.

   [3]   Rosenberg, J. and H.
         Schulzrinne, "An Offer/Answer Model with Session Description
         Protocol (SDP)", RFC 3264, June 2002.

   [4]   Roach, A., "Session Initiation Protocol (SIP)-Specific Event
         Notification", RFC 3265, June 2002.

   [5]   Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
         Description Protocol", RFC 4566, July 2006.

   [6]   Johnston, A., "Session Initiation Protocol Service Examples",
         draft-ietf-sipping-service-examples-13 (work in progress),
         July 2007.

   [7]   Rosenberg, J., Peterson, J., Schulzrinne, H., and G.
         Camarillo, "Best Current Practices for Third Party Call
         Control (3pcc) in the Session Initiation Protocol (SIP)",
         BCP 85, RFC 3725, April 2004.

   [8]   Sparks, R., "The Session Initiation Protocol (SIP) Refer
         Method", RFC 3515, April 2003.

   [9]   Mahy, R., Biggs, B., and R. Dean, "The Session Initiation
         Protocol (SIP) "Replaces" Header", RFC 3891, September 2004.

   [10]  Mahy, R. and D. Petrie, "The Session Initiation Protocol (SIP)
         "Join" Header", RFC 3911, October 2004.

   [11]  Rosenberg, J., Schulzrinne, H., and R. Mahy, "An
         INVITE-Initiated Dialog Event Package for the Session
         Initiation Protocol (SIP)", RFC 4235, November 2005.

   [12]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
         Initiation Protocol (SIP) Event Package for Conference State",
         RFC 4575, August 2006.

   [13]  Rosenberg, J., "A Session Initiation Protocol (SIP) Event
         Package for Registrations", RFC 3680, March 2004.

   [14]  Rosenberg, J., "A Presence Event Package for the Session
         Initiation Protocol (SIP)", RFC 3856, August 2004.

   [15]  Rosenberg, J., "A Framework for Conferencing with the Session
         Initiation Protocol (SIP)", RFC 4353, February 2006.

   [16]  Rosenberg, J., "A Framework for Application Interaction in the
         Session Initiation Protocol (SIP)",
         draft-ietf-sipping-app-interaction-framework-05 (work in
         progress), July 2005.

   [17]  Camarillo, G., "Framework for Transcoding with the Session
         Initiation Protocol (SIP)",
         draft-ietf-sipping-transc-framework-05 (work in progress),
         December 2006.

   [18]  Sparks, R., "Session Initiation Protocol Call Control -
         Transfer", draft-ietf-sipping-cc-transfer-08 (work in
         progress), July 2007.

   [19]  Johnston, A. and O. Levin, "Session Initiation Protocol (SIP)
         Call Control - Conferencing for User Agents", BCP 119,
         RFC 4579, August 2006.

   [20]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Indicating
         User Agent Capabilities in the Session Initiation Protocol
         (SIP)", RFC 3840, August 2004.

   [21]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
         Preferences for the Session Initiation Protocol (SIP)",
         RFC 3841, August 2004.

   [22]  Campbell, B. and R. Sparks, "Control of Service Context using
         SIP Request-URI", RFC 3087, April 2001.

   [23]  Jennings, C. and R. Mahy, "Remote Call Control in the Session
         Initiation Protocol (SIP) using the REFER method and the
         session-oriented dialog package", draft-mahy-sip-remote-cc-05
         (work in progress), March 2007.

   [24]  Burger, E., Van Dyke, J., and A. Spitzer, "Basic Network Media
         Services with SIP", RFC 4240, December 2005.

   [25]  Jennings, C., Audet, F., and J. Elwell, "Session Initiation
         Protocol (SIP) URIs for Applications such as Voicemail and
         Interactive Voice Response (IVR)", RFC 4458, April 2006.

   [26]  Rosenberg, J., "Request Authorization through Dialog
         Identification in the Session Initiation Protocol (SIP)",
         RFC 4538, June 2006.

   [27]  Lennox, J., Wu, X., and H. Schulzrinne, "Call Processing
         Language (CPL): A Language for User Control of Internet
         Telephony Services", RFC 3880, October 2004.

   [28]  Willis, D. and A.
         Allen, "Requesting Answering Modes for the Session Initiation
         Protocol (SIP)", draft-ietf-sip-answermode-06 (work in
         progress), September 2007.

   [29]  Mahy, R., "A Message Summary and Message Waiting Indication
         Event Package for the Session Initiation Protocol (SIP)",
         RFC 3842, August 2004.

Authors' Addresses

   Rohan Mahy
   Plantronics
   345 Encinal Street
   Santa Cruz, CA
   USA

   Email: rohan@ekabal.com


   Robert Sparks
   Estacado Systems

   Email: rjsparks@nostrum.com


   Jonathan Rosenberg
   Cisco Systems

   Email: jdrosen@cisco.com


   Dan Petrie
   SIP EZ

   Email: dpetrie@sipez.com


   Alan Johnston (editor)
   Avaya

   Email: alan@sipstation.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).