SIPPING WG                                                       R. Mahy
Internet-Draft                                              Unaffiliated
Intended status: Informational                                 R. Sparks
Expires: June 23, 2010                                           Tekelec
                                                            J. Rosenberg
                                                             jdrosen.net
                                                               D. Petrie
                                                                  SIP EZ
                                                        A. Johnston, Ed.
                                                                   Avaya
                                                       December 20, 2009


     A Call Control and Multi-party usage framework for the Session
                       Initiation Protocol (SIP)
                   draft-ietf-sipping-cc-framework-12

Abstract

   This document defines a framework and requirements for call control
   and multi-party usage of the Session Initiation Protocol (SIP).  To
   enable discussion of multi-party features and applications we define
   an abstract call model for describing the media relationships
   required by many of these.  The model and actions described here are
   specifically chosen to be independent of the SIP signaling and/or
   mixing approach chosen to actually set up the media relationships.
   In addition to its dialog manipulation aspect, this framework
   includes requirements for communicating related information and
   events such as conference and session state, and session history.
   This framework also describes other goals that embody the spirit of
   SIP applications as used on the Internet such as: definition of
   primitives, not services; invoker and participant oriented; signaling
   and mixing model independence, and others.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 23, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.
   The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Motivation and Background
   2.  Key Concepts
     2.1.  "Conversation Space" Model
     2.2.  Relationship Between Conversation Space, SIP Dialogs,
           and SIP Sessions
     2.3.  Signaling Models
     2.4.  Mixing Models
       2.4.1.  Tightly Coupled
       2.4.2.  Loosely Coupled
     2.5.  Conveying Information and Events
     2.6.  Componentization and Decomposition
       2.6.1.  Media Intermediaries
       2.6.2.  Text-to-Speech and Automatic Speech Recognition
       2.6.3.  VoiceXML
     2.7.  Use of URIs
       2.7.1.  Naming Users in SIP
       2.7.2.  Naming Services with SIP URIs
     2.8.  Invoker Independence
     2.9.  Billing issues
   3.  Catalog of call control actions and sample features
     3.1.  Remote Call Control Actions on Early Dialogs
       3.1.1.  Remote Answer
       3.1.2.  Remote Forward or Put
       3.1.3.  Remote Busy or Error Out
     3.2.  Remote Call Control Actions on Single Dialogs
       3.2.1.  Remote Dial
       3.2.2.  Remote On and Off Hold
       3.2.3.  Remote Hangup
     3.3.  Call Control Actions on Multiple Dialogs
       3.3.1.  Transfer
       3.3.2.  Take
       3.3.3.  Add
       3.3.4.  Local Join
       3.3.5.  Insert
       3.3.6.  Split
       3.3.7.  Near fork
       3.3.8.  Far fork
   4.  Security Considerations
   5.  IANA Considerations
   6.  Appendix A: Example Features
     6.1.  Attended Transfer
     6.2.  Auto Answer
     6.3.  Automatic Callback
     6.4.  Barge-in
     6.5.  Blind Transfer
     6.6.  Call Forwarding
     6.7.  Call Monitoring
     6.8.  Call Park
     6.9.  Call Pickup
     6.10. Call Return
     6.11. Call Waiting
     6.12. Click-to-Dial
     6.13. Conference Call
     6.14. Consultative Transfer
     6.15. Distinctive Ring
     6.16. Do Not Disturb
     6.17. Find-Me
     6.18. Hotline
     6.19. IM Conference Alerts
     6.20. Inbound Call Screening
     6.21. Intercom
     6.22. Message Waiting
     6.23. Music on Hold
     6.24. Outbound Call Screening
     6.25. Pre-paid Calling
     6.26. Presence-Enabled Conferencing
     6.27. Single Line Extension/Multiple Line Appearance
     6.28. Speakerphone Paging
     6.29. Speed Dial
     6.30. Voice Message Screening
     6.31. Voice Portal
     6.32. Voicemail
     6.33. Whispered Call Waiting
   7.  Acknowledgments
   8.  Informative References
   Authors' Addresses

1.  Motivation and Background

   The Session Initiation Protocol [RFC3261] (SIP) was defined for the
   initiation, maintenance, and termination of sessions or calls between
   one or more users.  However, despite its origins as a large-scale
   multiparty conferencing protocol, SIP is used today primarily for
   point-to-point calls.  This two-party configuration is the focus of
   the SIP specification and most of its extensions.

   This document defines a framework and requirements for call control
   and multi-party usage of SIP.  Most multi-party operations manipulate
   SIP dialogs (also known as call legs) or SIP conference media policy
   to cause participants in a conversation to perceive specific media
   relationships.  In other protocols that deal with the concept of
   calls, this manipulation is known as call control.  In addition to
   its dialog or policy manipulation aspect, "call control" also
   includes communicating information and events related to manipulating
   calls, including information and events dealing with session state
   and history, conference state, user state, and even message state.

   Based on input from the SIP community, the authors compiled the
   following set of goals for SIP call control and multiparty
   applications:

   o  Define Primitives, Not Services.  Allow for a handful of robust
      yet simple mechanisms that can be combined to deliver features and
      services.  Throughout this document we refer to these simple
      mechanisms as "primitives".  Primitives should be sufficiently
      robust so that when they are combined with each other, they can be
      used to build many services.  However, the goal is not to define a
      provably complete set of primitives.  Note that while the IETF
      will NOT standardize behavior or services, it may define example
      services for informational purposes, as in service examples
      [RFC5359].
   o  Participant oriented.
      The primitives should be designed to provide services that are
      oriented around the experience of the participants.  The authors
      observe that end users of features and services usually don't care
      how a media relationship is set up.  Their ultimate experience is
      based only on the resulting media and other externally visible
      characteristics.
   o  Signaling Model independent: Support both a central control and a
      peer-to-peer feature invocation model (and combinations of the
      two).  Baseline SIP already supports a centralized control model
      described in 3pcc (third party call control) [RFC3725], and the
      SIP community has expressed a great deal of interest in
      peer-to-peer or distributed call control using primitives such as
      those defined in REFER [RFC3515], Replaces [RFC3891], and Join
      [RFC3911].
   o  Mixing Model independent: The bulk of interesting multiparty
      applications involve mixing or combining media from multiple
      participants.  This mixing can be performed by one or more of the
      participants, or by a centralized mixing resource.  The experience
      of the participants should not depend on the mixing model used.
      While most examples in this document refer to audio mixing, the
      framework applies to any media type.  In this context a "mixer"
      refers to combining media of the same type in an appropriate,
      media-specific way.  This is consistent with the model described
      in the SIP conferencing framework.
   o  Invoker oriented.  Only the user who invokes a feature or a
      service needs to know exactly which service is invoked or why.
      This is good because it allows new services to be created without
      requiring new primitives from all the participants; and it allows
      for much simpler feature authorization policies, for example, when
      participation spans organizational boundaries.  As discussed in
      Section 2.8, this also avoids exponential state explosion when
      combining features.
      The invoker only has to manage a user interface or API to prevent
      local feature interactions.  All the other participants simply
      need to manage the feature interactions of a much smaller number
      of primitives.
   o  Primitives make full use of URIs (uniform resource identifiers).
      URIs are a very powerful mechanism for describing users and
      services.  They represent a plentiful resource that can be
      extremely expressive and easily routed, translated, and
      manipulated--even across organizational boundaries.  URIs can
      contain special parameters and informational headers that need
      only be relevant to the owner of the namespace (domain) of the
      URI.  Just as a user who selects an http: URL need not understand
      the significance and organization of the web site it references, a
      user may encounter a SIP URI that translates into an email-style
      group alias, that plays a pre-recorded message, or runs some
      complex call-handling logic.  Note that while this may seem to
      contradict the previous goal, both goals can be satisfied by the
      same model.
   o  Make use of SIP headers and SIP event packages to provide SIP
      entities with information about their environment.  These should
      include information about the status/handling of dialogs on other
      user agents, information about the history of other contacts
      attempted prior to the current contact, the status of
      participants, the status of conferences, user presence
      information, and the status of messages.
   o  Encourage service decomposition, and design to make use of
      standard components using well-defined, simple interfaces.  Sample
      components include a SIP mixer, recording service, announcement
      server, and voice dialog server.  (This is not an exhaustive
      list.)
   o  Include authentication, authorization, policy, logging, and
      accounting mechanisms to allow these primitives to be used safely
      among mutually untrusted participants.
      Some of these mechanisms may be used to assist in billing, but no
      specific billing system will be endorsed.
   o  Permit graceful fallback to baseline SIP.  Definitions for new SIP
      call control extensions/primitives must describe a graceful way to
      fall back to baseline SIP behavior.  Support for one primitive
      must not imply support for another primitive.
   o  There is no desire or goal to reinvent traditional models, such as
      the model used by the H.450 family of protocols, JTAPI (Java
      Telephony Application Programming Interface), or the CSTA
      (Computer-supported telecommunications applications) call model,
      as these other models do not share the design goals presented in
      this document.

   Note that the flexibility in this model does have some disadvantages
   in terms of interoperability.  It is possible to build a call control
   feature in SIP using different combinations of primitives.  For a
   discussion of the issues associated with this, see
   [I-D.ietf-bliss-problem-statement].

2.  Key Concepts

   This section introduces a number of key concepts which will be used
   to describe and explain various call control operations and services
   in the remainder of this document.  These include the conversation
   space model, signaling and mixing models, common components, and the
   use of URIs.

2.1.  "Conversation Space" Model

   This document introduces the concept of an abstract "conversation
   space" as a set of participants who believe they are all
   communicating among one another.  Each conversation space contains
   one or more participants.

   Participants are SIP User Agents that send original media to or
   terminate and receive media from other members of the conversation
   space.  Logically, every participant in the conversation space has
   access to all the media generated in that space (this is strictly
   true if all participants share a common media type).
   A SIP User Agent that does not contribute or consume any media is NOT
   a participant; nor is a user agent that merely forwards, transcodes,
   mixes, or selects media originating elsewhere in the conversation
   space.

      Note that a conversation space consists of zero or more SIP calls
      or SIP conferences.  A conversation space is similar to the
      definition of a "call" in some other call models.

   Participants may represent human users or non-human users (referred
   to as robots or automatons in this document).  Some participants may
   be hidden within a conversation space.  Some examples of hidden
   participants include: robots that generate tones, images, or
   announcements during a conference to announce users arriving and
   departing, a human call center supervisor monitoring a conversation
   between a trainee and a customer, and robots that record media for
   training or archival purposes.

   Participants may also be active or passive.  Active participants are
   expected to be intelligent enough to leave a conversation space when
   they no longer desire to participate.  (An attentive human
   participant is obviously active.)  Some robotic participants (such as
   a voice messaging system, an instant messaging agent, or a voice
   dialog system) may be active participants if they can leave the
   conversation space when there is no human interaction.  Other robots
   (for example our tone generating robot from the previous example) are
   passive participants.  A human participant "on-hold" is passive.

   A conversation space can be drawn as a "bubble" or oval, or written
   as a "set" in curly or square brace notation.  Each set, oval, or
   "bubble" represents a conversation space.  Hidden participants are
   shown in lowercase letters.  Examples are given in Figure 1.
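
   As an informal illustration only, the participant attributes and the
   set notation described above can be modeled in a few lines of Python.
   The class names, field names, and method names below are
   hypothetical and are not defined by this framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Participant:
    # Hypothetical model of a conversation space participant.
    name: str
    hidden: bool = False  # e.g. a recording robot or a monitoring supervisor
    active: bool = True   # passive examples: a tone generator, a human on hold


@dataclass
class ConversationSpace:
    participants: List[Participant] = field(default_factory=list)

    def notation(self) -> str:
        """Render curly-brace set notation; hidden names are lowercased."""
        labels = [p.name.lower() if p.hidden else p.name.upper()
                  for p in self.participants]
        return "{ " + " , ".join(labels) + " }"

    def active_visible(self) -> List[str]:
        """Names of participants that are both active and not hidden."""
        return [p.name for p in self.participants
                if p.active and not p.hidden]


space = ConversationSpace([
    Participant("A"),
    Participant("B", hidden=True),   # hidden participant, shown as "b"
    Participant("C"),
    Participant("D", active=False),  # passive, for example on hold
])
print(space.notation())
print(space.active_visible())
```

   A mixer deciding when to release a call (see Section 2.4) could
   consult properties like these, although the actual heuristics are
   outside the scope of this document.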

   Note that while the term "conversation" usually applies to oral
   exchange of information, we apply the conversation space model to any
   media exchange between participants.

      { A , B }             [ A , b, C, D ]

         .-.                     .---.
        /   \                   /     \
       /  A  \                 /  A  b \
      (       )               (         )
       \  B  /                 \  C  D /
        \   /                   \     /
         '-'                     '---'

   Figure 1.  Conversation Spaces.

2.2.  Relationship Between Conversation Space, SIP Dialogs, and SIP
      Sessions

   In SIP, a call is "an informal term that refers to some communication
   between peers, generally set up for the purposes of a multimedia
   conversation."  The concept of a conversation space is needed because
   the SIP definition of call is not sufficiently precise for the
   purpose of describing the user experience of multiparty features.

   Do any other definitions convey the correct meaning?  SIP and SDP
   (Session Description Protocol) [RFC4566] both define a conference as
   "a multimedia session identified by a common session description."  A
   session is defined as "a set of multimedia senders and receivers and
   the data streams flowing from senders to receivers."  The definition
   of "call" in some call models is more similar to our definition of a
   conversation space.

   Some examples of the relationship between conversation spaces, SIP
   dialogs, and SIP sessions are listed below.  In each example, a human
   user will perceive that there is a single call.

   o  A simple two-party call is a single conversation space, a single
      session, and a single dialog.
   o  A locally mixed three-way call is two sessions and two dialogs.
      It is also a single conversation space.
   o  A simple dial-in audio conference is a single conversation space,
      but is represented by as many dialogs and sessions as there are
      human participants.
   o  A multicast conference is a single conversation space, a single
      session, and as many dialogs as participants.

2.3.  Signaling Models

   To make changes to a conversation space, you must be able to use SIP
   signaling to cause these changes.  Specifically there must be a way
   to manipulate SIP dialogs (call legs) to move participants into and
   out of conversation spaces.  Less obviously, there must also be a way
   to manipulate SIP dialogs to include non-participant user agents that
   are otherwise involved in a conversation space (e.g., back-to-back
   user agents or B2BUAs, third party call control (3pcc) controllers,
   mixers, transcoders, translators, or relays).

   Implementations may set up the media relationships described in the
   conversation space model using a centralized control model.  One
   common way to implement this using SIP is known as 3rd Party Call
   Control (3pcc) and is described in 3pcc [RFC3725].  The 3pcc approach
   relies on only the following 3 primitive operations:

   o  Create a new dialog (INVITE)
   o  Modify a dialog (reINVITE)
   o  Destroy a dialog (BYE)

   The main advantage of the 3pcc approach is that it only requires very
   basic SIP support from end systems to support call control features.
   As such, third-party call control is a natural way to handle protocol
   conversion and mid-call features.  It also has the advantage and
   disadvantage that new features can/must be implemented in one place
   only (the controller), and neither requires enhanced client
   functionality, nor takes advantage of it.

   In addition, a peer-to-peer approach is discussed at length in this
   draft.  The primary drawback of the peer-to-peer model is additional
   complexity in the end system and authentication and management
   models.
   The benefits of the peer-to-peer model include:

   o  state remains at the edges
   o  call signaling need only go through participants involved (there
      are no additional points of failure)
   o  peers may take advantage of end-to-end message integrity or
      encryption

   The peer-to-peer approach relies on additional "primitive"
   operations, some of which are identified here.

   o  Replace an existing dialog
   o  Join a new dialog with an existing dialog
   o  Locally perform media forking (multi-unicast)
   o  Ask another User Agent (UA) to send a request on your behalf

   The peer-to-peer approach also results in only a single SIP dialog,
   directly between the two UAs.  The 3pcc approach results in two SIP
   dialogs, one between each UA and the controller.  With 3pcc, the SIP
   features and extensions that can be used during the dialog are
   therefore limited to those understood by the controller.  In a
   situation where both UAs support an advanced SIP feature but the
   controller does not, the feature cannot be used.

   Many of the features, primitives, and actions described in this
   document also require some type of media mixing, combining, or
   selection as described in the next section.

2.4.  Mixing Models

   SIP permits a variety of mixing models, which are discussed here
   briefly.  This topic is discussed more thoroughly in the SIP
   conferencing framework [RFC4353] and [RFC4579].  SIP supports both
   tightly-coupled and loosely-coupled conferencing, although more
   sophisticated behavior is available in tightly-coupled conferences.

   In a tightly-coupled conference, a single SIP user agent (called the
   focus) has a direct dialog relationship with each participant (and
   may control non-participant user agents as well).  The focus can
   authoritatively publish information about the character and
   participants in a conference.
   In a loosely-coupled conference there are no coordinated signaling
   relationships among the participants.

   For brevity, only the two most popular conferencing models (local and
   centralized mixing) are discussed in detail in this document.
   Applications of the conversation spaces model to loosely-coupled
   multicast and distributed full unicast mesh conferences are left as
   an exercise for the reader.  Note that a distributed full mesh
   conference can be used for basic conferences, but does not easily
   allow for more complex conferencing actions like splitting, merging,
   and sidebars.

   Call control features should be designed to allow a mixer (local or
   centralized) to decide when to reduce a conference back to a 2-party
   call, or drop all the participants (for example if only two
   automatons are communicating).  The actual heuristics used to release
   calls are beyond the scope of this document, but may depend on
   properties in the conversation space, such as the number of active,
   passive, or hidden participants; and the send-only, receive-only, or
   send-and-receive orientation of various participants.

2.4.1.  Tightly Coupled

   Tightly coupled conferences utilize a central point for signaling and
   authentication known as a focus [RFC4353].  The actual media can be
   centrally mixed or distributed.

2.4.1.1.  (Single) End System Mixing

   The first model we call "end system mixing".  In this model, user A
   calls user B, and they have a conversation.  At some point later, A
   decides to conference in user C.  To do this, A calls C, using a
   completely separate SIP call.  This call uses a different Call-ID,
   different tags, etc.  There is no call set up directly between B and
   C.  No SIP extension or external signaling is needed.  A merely
   decides to locally join two dialogs.

          B     C
           \   /
            \ /
             A

   Figure 2.  End System mixing Example.
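
   As a rough sketch of end system mixing as drawn in Figure 2, the
   mixing endpoint can send each remote party the sum of every source
   except that party's own media.  The function name and the frame
   representation (equal-length lists of integer samples) are
   assumptions made for illustration; a real mixer would also handle
   clipping, codecs, and timing.

```python
from typing import Dict, List


def end_system_mix(local_name: str,
                   frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, for each remote peer, the mix of all sources except itself.

    `frames` maps a source name to one frame of samples and must include
    the local mixer's own frame.  All frames are assumed equal in length.
    """
    length = len(next(iter(frames.values())))
    # Sum every source once, then subtract each peer's own contribution.
    total = [sum(frame[i] for frame in frames.values())
             for i in range(length)]
    return {
        peer: [t - s for t, s in zip(total, frame)]
        for peer, frame in frames.items()
        if peer != local_name
    }


# A mixes locally: B hears A + C, and C hears A + B.
example = end_system_mix("A", {
    "A": [1, 2],
    "B": [10, 20],
    "C": [100, 200],
})
print(example)
```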

   In Figure 2, A receives media streams from both B and C, and mixes
   them.  A sends a stream containing A's and C's streams to B, and a
   stream containing A's and B's streams to C.  Basically, user A
   handles both signaling and media mixing.

2.4.1.2.  Centralized Mixing

   In a centralized mixing model, all participants have a pairwise SIP
   and media relationship with the mixer.  Common applications of
   centralized mixing include ad-hoc conferences and scheduled dial-in
   or dial-out conferences.  In Figure 3 below, the mixer M receives and
   sends media to participants A, B, C, D, and E.

       B     C
        \   /
         \ /
          M --- A
         / \
        /   \
       D     E

   Figure 3.  Centralized Mixing Example.

2.4.1.3.  Centralized Signaling, Distributed Media

   In this conferencing model, there is a centralized controller, as in
   the dial-in and dial-out cases.  However, the centralized server
   handles signaling only.  The media is still sent directly between
   participants, using either multicast or multi-unicast.  Participants
   perform their own mixing.  Multi-unicast is when a user sends
   multiple packets (one for each recipient, addressed to that
   recipient).  This is referred to as a "Decentralized Multipoint
   Conference" in [H.323].  Full mesh media with centralized mixing is
   another approach.

2.4.2.  Loosely Coupled

   In these models, there is no point of central control of SIP
   signaling.  As in the "Centralized Signaling, Distributed Media" case
   above, all endpoints send media to all other endpoints.  Consequently
   every endpoint mixes their own media from all the other sources, and
   sends their own media to every other participant.

2.4.2.1.  Large-Scale Multicast Conferences

   Large-scale multicast conferences were the original motivation for
   both the Session Description Protocol (SDP) [RFC4566] and SIP.
   In a large-scale multicast conference, one or more multicast
   addresses are allocated to the conference.  Each participant joins
   those multicast groups, and sends their media to those groups.
   Signaling is not sent to the multicast groups.  The sole purpose of
   the signaling is to inform participants of which multicast groups to
   join.  Large-scale multicast conferences are usually pre-arranged,
   with specific start and stop times.  However, multicast conferences
   do not need to be pre-arranged, so long as a mechanism exists to
   dynamically obtain a multicast address.

2.4.2.2.  Full Distributed Unicast Conferencing

   In this conferencing model, each participant has both a pairwise
   media relationship and a pairwise signaling relationship with every
   other participant (a full mesh).  This model requires a mechanism to
   maintain a consistent view of distributed state across the group.
   This is a classic hard problem in computer science.  Also, this model
   does not scale well for large numbers of participants, because for n
   participants the number of media and signaling relationships is
   approximately n-squared.  As a result, this model is not generally
   available in commercial implementations; rather, it is primarily the
   topic of research or experimental implementations.  Note that this
   model assumes peer-to-peer signaling.

2.5.  Conveying Information and Events

   Participants should have access to information about the other
   participants in a conversation space, so that this information can be
   rendered to a human user or processed by an automaton.  Although some
   of this information may be available from the Request-URI or To,
   From, Contact, or other SIP headers, another mechanism for reporting
   this information is necessary.

   Many applications are driven by knowledge about the progress of calls
   and conferences.
   In general, these types of events allow for the construction of
   distributed applications, where the application requires information
   on dialog and conference state, but is not necessarily co-resident
   with an endpoint user agent or conference server. For example, a
   focus involved in a conversation space may wish to provide URIs for
   conference status, and/or conference/floor control.

   The SIP Events [RFC3265] architecture defines general mechanisms for
   subscription to and notification of events within SIP networks. It
   introduces the notion of a package that is a specific "instantiation"
   of the events mechanism for a well-defined set of events.

   Event packages are needed to provide the status of a user's dialogs,
   provide the status of conferences and their participants, provide
   user presence information, provide the status of registrations, and
   provide the status of a user's messages. While this is not an
   exhaustive list, these are sufficient to enable the sample features
   described in this document.

   The conference event package [RFC4575] allows users to subscribe to
   information about an entire tightly-coupled SIP conference.
   Notifications convey information about the participants such as: the
   SIP URI identifying each user, their status in the space (active,
   declined, departed), URIs to invoke other features (such as sidebar
   conversations), links to other relevant information (such as floor
   control policies), and if floor control policies are in place, the
   user's floor control status. For conversation spaces created from
   cascaded conferences, conversation state can be gathered from
   relevant foci and merged into a cohesive set of state.

   The dialog package [RFC4235] provides information about all the
   dialogs the target user is maintaining, what conversations the user
   is participating in, and how these are correlated.
   Likewise, the registration package [RFC3680] provides notifications
   when contacts have changed for a specific address-of-record. The
   combination of these allows a user agent to learn about all
   conversations occurring for the entire registered contact set for an
   address-of-record.

   Note that user presence in SIP [RFC3856] has a close relationship
   with these latter two event packages. It is fundamental to the
   presence model that the information used to obtain user presence is
   constructed from any number of different input sources. Examples of
   other such sources include calendaring information and uploads of
   presence documents. These two packages can be considered another
   mechanism that allows a presence agent to determine the presence
   state of the user. Specifically, a user presence server can act as a
   subscriber for the dialog and registration packages to obtain
   additional information that can be used to construct a presence
   document.

   The multi-party architecture may also need to provide a mechanism to
   get information about the status/handling of a dialog (for example,
   information about the history of other contacts attempted prior to
   the current contact). Finally, the architecture should provide ample
   opportunities to present informational URIs that relate to calls,
   conversations, or dialogs in some way. For example, consider the SIP
   Call-Info header, or Contact headers returned in a 300-class
   response. Frequently additional information about a call or dialog
   can be fetched via non-SIP URIs. For example, consider a web page
   for package tracking when calling a delivery company, or a web page
   with related documentation when joining a dial-in conference. The
   use of URIs in the multiparty framework is discussed in more detail
   in Section 2.7.

   Finally, the interaction of SIP with stimulus-signaling-based
   applications, which allow a user agent to interact with an
   application without knowledge of the semantics of that application,
   is discussed in the SIP application interaction framework [RFC5629].
   Stimulus signaling can occur to a user interface running locally
   with the client, or to a remote user interface, through media
   streams. Stimulus signaling encompasses a wide range of mechanisms,
   ranging from clicking on hyperlinks, to pressing buttons, to
   traditional Dual Tone Multi Frequency (DTMF) input. In all cases,
   stimulus signaling is supported through the use of markup languages,
   which play a key role in that framework.

2.6. Componentization and Decomposition

   This framework proposes a decomposed component architecture with a
   very loose coupling of services and components. This means that a
   service (such as a conferencing server or an auto-attendant) need not
   be implemented as an actual server. Rather, these services can be
   built by combining a few basic components in straightforward or
   arbitrarily complex ways.

   Since the components are easily deployed on separate boxes, by
   separate vendors, or even with separate providers, we achieve a
   separation of function that allows each piece to be developed in
   complete isolation. We can also reuse existing components for new
   applications. This allows rapid service creation, and the ability
   for services to be distributed across organizational domains anywhere
   in the Internet.

   For many of these components it is also desirable to discover their
   capabilities, for example querying the ability of a mixer to host a
   10-dialog conference, or to reserve resources for a specific time.
   These actions could be provided in the form of URIs, provided there
   is an a priori means of understanding their semantics.
   For example, if there is a published dictionary of operations and a
   way to query the service for the available operations and the
   associated URIs, the URI can be the interface for providing these
   service operations. This concept is described in more detail in the
   context of dialog operations in Section 3.

2.6.1. Media Intermediaries

   Media Intermediaries are not participants in any conversation space,
   although an entity that is also a media translator may also have a
   co-located participant component (for example a mixer that also
   announces the arrival of a new participant; the announcement portion
   is a participant, but the mixer itself is not). Media intermediaries
   should be as transparent as possible to the end users, offering a
   useful, fundamental service without getting in the way of new
   features implemented by participants. Some common media
   intermediaries are described below.

2.6.1.1. Mixer

   A SIP mixer is a component that combines media from all dialogs in
   the same conversation in a media-specific way. For example, the
   default combining for an audio conference might be an N-1
   configuration, while a text mixer might interleave text messages on a
   per-line basis. More details about how to manipulate the media
   policy used by mixers are being discussed in [I-D.ietf-xcon-ccmp].

2.6.1.2. Transcoder

   A transcoder translates media from one encoding or format to another
   (for example, GSM (Global System for Mobile communications) voice to
   G.711, MPEG2 to H.261, or text/html to text/plain), or from one media
   type to another (for example text to speech). A more thorough
   discussion of transcoding is provided in SIP transcoding services
   invocation [RFC5369].

2.6.1.3. Media Relay

   A media relay terminates media and simply forwards it to a new
   destination without changing the content in any way.
   Sometimes media relays are used to provide source IP address
   anonymity, to facilitate middlebox traversal, or to provide a trusted
   entity where media can be forcibly disconnected.

2.6.1.4. Queue Server

   A queue server is a location where calls can be entered into one of
   several FIFO (first-in, first-out) queues. A queue server would
   subscribe to the presence of groups or individuals who are interested
   in its queues. When detecting that a user is available to service a
   queue, the server redirects or transfers the call at the head of the
   relevant queue to the available user. On a queue-by-queue basis,
   authorized users could also subscribe to the call state (dialog
   information) of calls within a queue. Authorized users could use
   this information to effectively pluck (take) a call out of the queue
   (for example by sending an INVITE with a Replaces header to one of
   the user agents in the queue).

2.6.1.5. Parking Place

   A parking place is a location where calls can be terminated
   temporarily and then retrieved later. While a call is "parked", it
   can receive media "on-hold" such as music, announcements, or
   advertisements. Such a service could be further decomposed such that
   announcements or music are handled by a separate component.

2.6.1.6. Announcements and Voice Dialogs

   An announcement server is a server that can play digitized media
   (frequently audio), such as music or recorded speech. These servers
   are typically accessible via SIP, HTTP (Hypertext Transfer
   Protocol), or RTSP (Real-Time Streaming Protocol). An analogous
   service is a recording service that stores digitized media. A
   convention for specifying announcements in SIP URIs is described in
   [RFC4240]. The same server could easily provide the recording
   service as well.
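
   As an illustration of the announcement convention cited above, the
   following Python sketch assembles an announcement URI. It assumes
   the "annc" user part and the "play" and "repeat" URI parameters of
   [RFC4240]; the function name, host names, and prompt URLs are
   hypothetical and are not taken from this document.

```python
from urllib.parse import quote


def build_annc_uri(media_server, prompt_url, repeat=None):
    """Return an announcement URI in the style of the RFC 4240 convention."""
    # Percent-escape characters such as '?', ';' and '=' so that the prompt
    # URL stays a single URI parameter value; ':' and '/' are left as-is.
    uri = "sip:annc@{0};play={1}".format(media_server, quote(prompt_url, safe=":/"))
    if repeat is not None:
        uri += ";repeat={0}".format(repeat)
    return uri


# Hypothetical media server and prompt locations.
print(build_annc_uri("ms.example.net", "http://audio.example.net/welcome.g711"))
print(build_annc_uri("ms.example.net", "http://audio.example.net/p?lang=en", repeat=2))
```

   Because the whole request is carried in the URI, a controller could
   invoke the announcement with an ordinary INVITE and would need no
   announcement-specific protocol.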

   A "voice dialog" is a model of spoken interactive behavior between a
   human and an automaton that can include synthesized speech, digitized
   audio, recognition of spoken and DTMF key input, recording of spoken
   input, and interaction with call control. Voice dialogs frequently
   consist of forms or menus. Forms present information and gather
   input; menus offer choices of what to do next.

   Spoken dialogs are a basic building block of applications that use
   voice. Consider for example that a voice mail system, the
   conference-id and passcode collection system for a conferencing
   system, and complicated voice portal applications all require a voice
   dialog component.

2.6.2. Text-to-Speech and Automatic Speech Recognition

   Text-to-Speech (TTS) is a service that converts text into digitized
   audio. TTS is frequently integrated into other applications, but
   when separated as a component, it provides greater opportunity for
   broad reuse. Automatic Speech Recognition (ASR) is a service that
   attempts to decipher digitized speech based on a proposed grammar.
   Like TTS, ASR services can be embedded, or exposed so that many
   applications can take advantage of such services. A standardized
   (decomposed) interface to access standalone TTS and ASR services is
   currently being developed in [RFC4313].

2.6.3. VoiceXML

   VoiceXML is a W3C (World Wide Web Consortium) recommendation that was
   designed to give authors control over the spoken dialog between users
   and applications. The application and user take turns speaking: the
   application prompts the user, and the user in turn responds. Its
   major goal is to bring the advantages of web-based development and
   content delivery to interactive voice response applications. We
   believe that VoiceXML represents the ideal partner for SIP in the
   development of distributed IVR (interactive voice response) servers.
   VoiceXML is an XML-based scripting language for describing IVR
   services at an abstract level. VoiceXML supports DTMF recognition,
   speech recognition, text-to-speech, and playing out of recorded media
   files. The results of the data collected from the user are passed to
   a controlling entity through an HTTP POST operation. The controller
   can then return another script, or terminate the interaction with the
   IVR server.

   A VoiceXML server also need not be implemented as a monolithic
   server. Figure 4 shows a diagram of a VoiceXML browser that is split
   into media and non-media handling parts. The VoiceXML interpreter
   handles SIP dialog state and state within a VoiceXML document, and
   sends requests to the media component over another protocol.

                 +-------------+
                 |             |
                 |  VoiceXML   |
                 | Interpreter |
                 | (signaling) |
                 +-------------+
                   ^          ^
                   |          |
               SIP |          | RTSP
                   |          |
                   |          |
                   v          v
      +-------------+        +-------------+
      |             |        |             |
      |   SIP UA    |  RTP   | RTSP Server |
      |             |<------>|   (media)   |
      |             |        |             |
      +-------------+        +-------------+

   Figure 4. Decomposed VoiceXML Server.

2.7. Use of URIs

   All naming in SIP uses URIs. URIs in SIP are used in a plethora of
   contexts: the Request-URI; Contact, To, From, and *-Info headers;
   application/uri bodies; and embedded in email, web pages, instant
   messages, and ENUM records. The Request-URI identifies the user or
   service that the call is destined for.

   SIP URIs embedded in informational SIP headers, SIP bodies, and
   non-SIP content can also specify methods, special parameters,
   headers, and even bodies. For example:

   sip:bob@b.example.com;method=REFER?Refer-To=http://example.com/~alice

   Throughout this draft we discuss call control primitive operations.
   One of the biggest problems is defining how these operations may be
   invoked. There are a number of ways to do this.
   One way is to define the primitives in the protocol itself such that
   SIP methods (for example REFER) or SIP headers (for example Replaces)
   indicate a specific call control action. Another way to invoke call
   control primitives is to define a specific Request-URI naming
   convention. These conventions must either be shared between the
   client (the invoker) and the server, or be published by or on behalf
   of the server. The former involves defining URI construction
   techniques (e.g., URI parameters and/or token conventions) as
   proposed in [RFC4240]. The latter technique usually involves
   discovering the URI via a SIP event package, a web page, a business
   card, or an Instant Message. Yet another means to acquire the URIs
   is to define a dictionary of primitives with well-defined semantics
   and provide a means to query the named primitives and corresponding
   URIs that may be invoked on the service or dialogs.

2.7.1. Naming Users in SIP

   An address-of-record, or public SIP address, is a SIP (or Secure SIP,
   SIPS) URI that points to a domain with a location service that can
   map the URI to a set of Contact URIs where the user might be
   available. Typically the Contact URIs are populated via registration.

   Address of Record             Contacts

   sip:bob@biloxi.example.com -> sip:bob@babylon.biloxi.example.com:5060
                                 sip:bbrown@mailbox.provider.example.net
                                 sip:+1.408.555.6789@mobile.example.net

   Callee Capabilities [RFC3840] defines a set of additional parameters
   to the Contact header that define the characteristics of the user
   agent at the specified URI. For example, there is a mobility
   parameter that indicates whether the UA is fixed or mobile. When a
   user agent registers, it places these parameters in the Contact
   headers to characterize the URIs it is registering. This allows a
   proxy for that domain to have information about the contact addresses
   for that user.
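
   The mapping from an address-of-record to its registered Contacts,
   together with the Contact parameters just described, can be sketched
   as a small in-memory location service. This is only an illustration
   under stated assumptions: the class and method names are
   hypothetical, and the "mobility" values attached to each Contact are
   invented for the example rather than taken from the table above.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    uri: str
    params: dict = field(default_factory=dict)  # callee capability parameters


class LocationService:
    """Hypothetical store mapping an address-of-record to its Contacts."""

    def __init__(self):
        self._bindings = {}

    def register(self, aor, contact):
        # Registration adds one Contact binding for the address-of-record.
        self._bindings.setdefault(aor, []).append(contact)

    def lookup(self, aor, **required):
        # Return the Contacts whose parameters match every required value.
        return [c for c in self._bindings.get(aor, [])
                if all(c.params.get(k) == v for k, v in required.items())]


AOR = "sip:bob@biloxi.example.com"
service = LocationService()
# The mobility values below are assumptions made for illustration.
service.register(AOR, Contact("sip:bob@babylon.biloxi.example.com:5060",
                              {"mobility": "fixed"}))
service.register(AOR, Contact("sip:bbrown@mailbox.provider.example.net",
                              {"mobility": "fixed"}))
service.register(AOR, Contact("sip:+1.408.555.6789@mobile.example.net",
                              {"mobility": "mobile"}))

print([c.uri for c in service.lookup(AOR, mobility="mobile")])
```

   A proxy holding such a table has what it needs to choose among the
   Contacts for a request addressed to the address-of-record.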

   When a caller sends a request, it can optionally request Caller
   Preferences [RFC3841], by including the Accept-Contact,
   Request-Disposition, and Reject-Contact headers that request certain
   handling by the proxy in the target domain. These headers contain
   preferences that describe the set of desired URIs to which the caller
   would like their request routed. The proxy in the target domain
   matches these preferences with the Contact characteristics originally
   registered by the target user. The target user can also choose to
   run arbitrarily complex "Find-me" feature logic on a proxy in the
   target domain.

   There is a strong asymmetry in how preferences for callers and
   callees can be presented to the network. While a caller takes an
   active role by initiating the request, the callee takes a passive
   role in waiting for requests. This motivates the use of
   callee-supplied scripts and caller preferences included in the call
   request. This asymmetry is also reflected in the appropriate
   relationship between caller and callee preferences. A server for a
   callee should respect the wishes of the caller to avoid certain
   locations, while the preference among locations has to be the
   callee's choice, as it determines where, for example, the phone rings
   and whether the callee incurs mobile telephone charges for incoming
   calls.

   SIP User Agent implementations are encouraged to make intelligent
   decisions based on the type of participants (active/passive, hidden,
   human/robot) in a conversation space. This information is conveyed
   via the dialog package or in a parameter of an appropriate SIP
   header. For example, a music on hold service may take the sensible
   approach that if there are two or more unhidden participants, it
   should not provide hold music; or that it will not send hold music to
   robots.
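
   The music on hold example above can be read as a small decision
   rule. The following Python sketch is one possible reading, not a
   specified behavior: the Participant type, its "hidden" and "robot"
   flags, and the function name are hypothetical, and the conversation
   space passed in is assumed to exclude the music service itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    name: str
    hidden: bool = False
    robot: bool = False


def hold_music_recipients(space):
    """Return the participants that should receive hold music."""
    unhidden = [p for p in space if not p.hidden]
    # Two or more unhidden participants can talk to each other: no music.
    if len(unhidden) >= 2:
        return []
    # Otherwise play music, but never to robots.
    return [p for p in space if not p.robot]


alice = Participant("alice")
bob = Participant("bob")
recorder = Participant("recorder", hidden=True, robot=True)

print([p.name for p in hold_music_recipients([alice, recorder])])
print([p.name for p in hold_music_recipients([alice, bob, recorder])])
```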

   Multiple participants in the same conversation space may represent
   the same human user. For example, the user may use one participant
   device for video, chat, and whiteboard media on a PC and another for
   audio media on a SIP phone. In this case, the address-of-record is
   the same for both user agents, but the Contacts are different, and
   there is really only one human participant. In addition, human users
   may add robot participants that act on their behalf (for example a
   call recording service, or a calendar announcement reminder). Call
   control features in SIP should continue to function as expected in
   such an environment.

2.7.2. Naming Services with SIP URIs

   A critical piece of defining a session level service that can be
   accessed by SIP is defining the naming of the resources within that
   service. This point cannot be overstated.

   In the context of SIP control of application components, we take
   advantage of the fact that the left-hand-side of a standard SIP URI
   is a user part. Most services may be thought of as user automatons
   that participate in SIP sessions. It naturally follows that the user
   part should be utilized as a service indicator.

   For example, media servers commonly offer multiple services at a
   single host address. Use of the user part as a service indicator
   enables service consumers to direct their requests without ambiguity.
   It has the added benefit of enabling media services to register their
   availability with SIP Registrars just as any "real" SIP user would.
   This maintains consistency and provides enhanced flexibility in the
   deployment of media services in the network.

   There has been much discussion about the potential for confusion if
   media services URIs are not readily distinguishable from other types
   of SIP UAs.
   The use of a service namespace provides a mechanism to unambiguously
   identify standard interfaces while not constraining the development
   of private or experimental services.

   In SIP, the Request-URI identifies the user or service that the call
   is destined for. The great advantage of using URIs (specifically,
   the SIP Request-URI) as a service identifier comes from the
   combination of two facts. First, unlike in the PSTN (Public Switched
   Telephone Network), where the namespace (dialable telephone numbers)
   is limited, URIs come from an infinite space. They are plentiful,
   and they are free. Second, the primary function of SIP is call
   routing through manipulations of the Request-URI. In the traditional
   SIP application, this URI represents a person. However, the URI can
   also represent a service, as we propose here. This means we can
   apply the routing services SIP provides to routing of calls to
   services. As a result, the problem of service invocation and service
   location becomes a routing problem, for which SIP provides a scalable
   and flexible solution. Since there is such a vast namespace of
   services, we can explicitly name each service in a finely granular
   way. This allows the distribution of services across the network.
   For further discussion about services and SIP URIs, see RFC 3087
   [RFC3087].

   Consider a conferencing service. If we have separated the names of
   ad-hoc conferences from scheduled conferences, we can program proxies
   to route calls for ad-hoc conferences to one set of servers, and
   calls for scheduled ones to another, possibly even in a different
   provider. In fact, since each conference itself is given a URI, we
   can distribute conferences across servers, and easily guarantee that
   calls for the same conference always get routed to the same server.
   This is in stark contrast to conferences in the telephone network,
   where the equivalent of the URI (the phone number) is scarce. An
   entire conferencing provider generally has one or two numbers.
   Conference IDs must be obtained through IVR interactions with the
   caller, or through a human attendant. This makes it difficult to
   distribute conferences across servers all over the network, since the
   PSTN routing only knows about the dialed number.

   For more examples, consider the URI conventions of RFC 4240 [RFC4240]
   for media servers and RFC 4458 [RFC4458] for voicemail and IVR
   systems.

   In practical applications, it is important that an invoker does not
   necessarily apply semantic rules to various URIs it did not create.
   Instead, it should allow any arbitrary string to be provisioned, and
   map the string to the desired behavior. The administrator of a
   service may choose to provision specific conventions or mnemonic
   strings, but the application should not require it. In any large
   installation, the system owner is likely to have pre-existing rules
   for mnemonic URIs, and any attempt by an application to define its
   own rules may create a conflict. Implementations should allow an
   arbitrary mix of URIs from these schemes, or any other scheme that
   renders valid SIP URIs, to be provisioned, rather than enforce only
   one particular scheme.

   As we have shown, SIP URIs represent an ideal, flexible mechanism for
   describing and naming service resources, regardless of whether the
   resources are queues, conferences, voice dialogs, announcements,
   voicemail treatments, or phone features.

2.8. Invoker Independence

   With functional signaling, only the invoker of features in SIP needs
   to know exactly which feature they are invoking.
   One of the primary benefits of this approach is that combinations of
   functional features work in SIP call control without requiring
   complex feature interaction matrices. For example, let us examine
   the combination of a "transfer" of a call that is "conferenced".

   Alice calls Bob. Alice silently "conferences in" her robotic
   assistant Albert as a hidden party. Bob transfers Alice to Carol.
   If Bob asks Alice to Replace her leg with a new one to Carol then
   both Alice and Albert should be communicating with Carol
   (transparently).

   Using the peer-to-peer model, this combination of features works fine
   if Alice is doing local mixing (Alice replaces Bob's dialog with
   Carol's), or if Alice is using a central mixer (the mixer replaces
   Bob's dialog with Carol's). A clever implementation using the 3pcc
   model can generate similar results.

   New extensions to the SIP Call Control Framework should attempt to
   preserve this property.

2.9. Billing issues

   Billing in the PSTN is typically based on who initiated a call. At
   the moment billing in a SIP network is consistent neither with itself
   nor with the PSTN. (A billing model for SIP should allow for both
   PSTN-style billing, and non-PSTN billing.) The example below
   demonstrates one such inconsistency.

   Alice places a call to Bob. Alice then blind transfers Bob to Carol
   through a PSTN gateway. In current usage of REFER, Bob may be billed
   for a call he did not initiate (although his UA originated the
   outgoing dialog). This is not necessarily a terrible thing, but it
   demonstrates a security concern (Bob must have appropriate local
   policy to prevent fraud). Also, Alice may wish to pay for Bob's
   session with Carol. There should be a way to signal this in SIP.

   Likewise, a Replacement call may maintain the same billing
   relationship as a Replaced call, so if Alice first calls Carol, then
   asks Bob to Replace this call, Alice may continue to receive a bill.

   Further work in SIP billing should define a way to set or discover
   the direction of billing.

3. Catalog of call control actions and sample features

   Call control actions can be categorized by the dialogs upon which
   they operate. The actions may involve a single or multiple dialogs.
   These dialogs can be early or established. Multiple dialogs may be
   related in a conversation space to form a conference or other
   interesting media topologies.

   It should be noted that it is desirable to provide a means by which a
   party can discover the actions that may be performed on a dialog.
   The interested party may be independent of, or related to, the
   dialogs. One means of accomplishing this is through the ability to
   define and obtain URIs for these actions as described in
   Section 2.7.2.

   Below are listed several call control "actions" that establish or
   modify dialogs and relate the participants in a conversation space.
   The names of the actions listed are for descriptive purposes only
   (they are not normative). This list of actions is not meant to be
   exhaustive.

   In the examples, all actions are initiated by the user "Alice"
   represented by UA "A".

3.1. Remote Call Control Actions on Early Dialogs

   The following are a set of actions that may be performed on a single
   early dialog. These actions can be thought of as a set of remote
   control operations. For example, an automaton might perform the
   operation on behalf of a user. Alternatively, a user might use the
   remote control in the form of an application to perform the action on
   the early dialog of a UA that may be out of reach.
   All of these actions correspond to telling the UA how to respond to a
   request to establish an early dialog. These actions provide useful
   functionality for PDA, PC, and server-based applications that desire
   the ability to control a UA. A proposed mechanism for this type of
   functionality is described in Remote Call Control
   [I-D.audet-sipping-feature-ref].

3.1.1. Remote Answer

   A dialog is in some early dialog state such as 180 Ringing. It may
   be desirable to tell the UA to answer the dialog. That is, tell it
   to send a 200 OK response to establish the dialog.

3.1.2. Remote Forward or Put

   It may be desirable to tell the UA to respond with a 3xx class
   response to forward an early dialog to another UA.

3.1.3. Remote Busy or Error Out

   It may be desirable to instruct the UA to send an error response such
   as 486 Busy Here.

3.2. Remote Call Control Actions on Single Dialogs

   There is another useful set of actions that operate on a single
   established dialog. These operations are useful in building
   productivity applications for aiding users to control their phone.
   For example, a Customer Relationship Management (CRM) application can
   set up calls for a user, eliminating the need for the user to
   actually enter an address. These operations can also be thought of
   as remote control actions. A proposed mechanism for this type of
   functionality is described in Remote Call Control
   [I-D.audet-sipping-feature-ref].

3.2.1. Remote Dial

   This action instructs the UA to initiate a dialog. This action can
   be performed using the REFER method.

3.2.2. Remote On and Off Hold

   This action instructs the UA to put an established dialog on hold.
   Though this operation can conceptually be performed with the REFER
   method, there are no semantics defined as to what the referred party
   should do with the SDP.
   There is no way to distinguish between the desire to go on or off
   hold on a per-media-stream basis.

3.2.3. Remote Hangup

   This action instructs the UA to terminate an early or established
   dialog. A REFER request with the following Refer-To URI and
   Target-Dialog header field [RFC4538] performs this action. Note:
   this example does not show the full set of header fields.

      REFER sip:carol@client.chicago.net SIP/2.0
      Refer-To: sip:bob@babylon.biloxi.example.com;method=BYE
      Target-Dialog: 13413098;local-tag=879738;remote-tag=023214

3.3. Call Control Actions on Multiple Dialogs

   These actions apply to a set of related dialogs.

3.3.1. Transfer

   This section describes how call transfer can be achieved using
   centralized (3pcc) and peer-to-peer (REFER) approaches.

   The conversation space changes as follows:

      before            after
      { A , B }   -->   { C , B }

   A replaces itself with C.

   To make this happen using the peer-to-peer approach, "A" would send
   two SIP requests. A shorthand for those requests is shown below:

      REFER B Refer-To:C
      BYE B

   To make this happen instead using the 3pcc approach, the controller
   sends requests represented by the shorthand below:

      INVITE C (w/SDP of B)
      reINVITE B (w/SDP of C)
      BYE A

   Features enabled by this action:

   - blind transfer
   - transfer to a central mixer (some type of conference or forking)
   - transfer to park server (park)
   - transfer to music on hold or announcement server
   - transfer to a "queue"
   - transfer to a service (such as Voice Dialogs service)
   - transition from local mixer to central mixer

   This action is frequently referred to as "completing an attended
   transfer". It is described in more detail in [RFC5589].

   Note that if a transfer requires URI hiding or privacy, then the 3pcc
   approach can more easily implement this.
   For example, if the URI of C needs to be hidden from B, then the use
   of 3pcc helps accomplish this.

3.3.2. Take

   The conversation space changes as follows:

      { B , C }   -->   { B , A }

   A forcibly replaces C with itself. In most uses of this primitive, A
   is just "un-replacing" itself.

   Using the peer-to-peer approach, "A" sends:

      INVITE B Replaces: (dialog between B and C)

   Using the 3pcc approach (all requests sent from controller):

      INVITE A (w/SDP of B)
      reINVITE B (w/SDP of A)
      BYE C

   Features enabled by this action:

   - transferee completes an attended transfer
   - retrieve from central mixer (not recommended)
   - retrieve from music on hold or park
   - retrieve from queue
   - call center take
   - voice portal resuming ownership of a call it originated
   - answering-machine style screening (pickup)
   - pickup of a ringing call (i.e., early dialog)

   Note that pickup of a ringing call has some interesting additional
   requirements. First, it involves an early dialog as opposed to an
   established dialog. Second, the party that is to pick up the call
   may wish to do so only while it is an early dialog. That is, in the
   race condition where the ringing UA accepts just before it receives
   signaling from the party wishing to take the call, the taking party
   wishes to yield or cancel the take. The goal is to avoid yanking an
   answered call from the called party.

   This action is described in Replaces [RFC3891] and in [RFC5589].

3.3.3. Add

   Note that the following 4 actions are described in [RFC4579].

   This is merely adding a participant to a SIP conference. The
   conversation space changes as follows:

      { A , B }   -->   { A , B , C }

   A adds C to the conversation.

   Using the peer-to-peer approach, adding a party using local mixing
   requires no signaling.
To transition from a 2-party call or a 1239 locally mixed conference to central mixing, A could send the 1240 following requests: 1242 REFER B Refer-To: conference-URI 1243 INVITE conference-URI 1244 BYE B 1246 To add a party to a conference: 1248 REFER C Refer-To: conference-URI 1249 or 1250 REFER conference-URI Refer-To: C 1252 Using the 3pcc approach to transition to central mixing, the 1253 controller would send: 1255 INVITE mixer leg 1 (w/SDP of A) 1256 INVITE mixer leg 2 (w/SDP of B) 1257 INVITE C (late SDP) 1258 reINVITE A (w/SDP of mixer leg 1) 1259 reINVITE B (w/SDP of mixer leg 2) 1260 INVITE mixer leg 3 (w/SDP of C) 1262 To add a party to a SIP conference: 1264 INVITE C (late SDP) 1265 INVITE conference-URI (w/SDP of C) 1267 Features enabled: 1269 - standard conference feature 1270 - call recording 1271 - answering-machine style screening (screening) 1273 3.3.4. Local Join 1275 The conversation space changes like this: 1277 { A , B } , { A , C } --> { A , B , C } 1279 or like this: 1281 { A , B } , { C , D } --> { A , B , C , D } 1283 A takes two conversation spaces and joins them together into a single 1284 space. 1286 Using the peer-to-peer approach, A can mix locally, or REFER the 1287 participants of both conversation spaces to the same central mixer 1288 (as in 3.3.5). 1290 For the 3pcc approach, the call flows for inserting participants, and 1291 joining and splitting conversation spaces are tedious yet 1292 straightforward, so these are left as an exercise for the reader. 1294 Features enabled: 1296 - standard conference feature 1297 - leaving a sidebar to rejoin a larger conference 1299 3.3.5. Insert 1301 The conversation space changes like this: 1303 { B , C } --> { A , B , C } 1305 A inserts itself into a conversation space. 1307 A proposed mechanism for signaling this using the peer-to-peer 1308 approach is to send a new header in an INVITE with "joining" 1309 [RFC3911] semantics.
For example: 1311 INVITE B Join: 1313 If B accepts the INVITE, B takes responsibility for setting up the 1314 dialogs and mixing necessary (for example, to mix locally or to 1315 transfer the participants to a central mixer). 1317 Features enabled: 1319 - barge-in 1320 - call center monitoring 1321 - call recording 1323 3.3.6. Split 1325 { A , B , C , D } --> { A , B } , { C , D } 1327 If using a central conference with peer-to-peer: 1329 REFER C Refer-To: conference-URI (new URI) 1330 REFER D Refer-To: conference-URI (new URI) 1331 BYE C 1332 BYE D 1334 Features enabled: 1336 - sidebar conversations during a larger conference 1338 3.3.7. Near fork 1340 A participates in two conversation spaces simultaneously: 1342 { A, B } --> { B , A } & { A , C } 1344 A is a participant in two conversation spaces such that A sends the 1345 same media to both spaces, and renders media from both spaces, 1346 presumably by mixing or rendering the media from both. We can define 1347 that A is the "anchor" point for both forks, each of which is a 1348 separate conversation space. 1350 This action is purely a matter of local implementation (it requires no special 1351 signaling). Local features such as switching calls between the 1352 background and foreground are possible using this media relationship. 1354 3.3.8. Far fork 1356 The conversation space diagram... 1358 { A, B } --> { A , B } & { B , C } 1360 A requests B to be the "anchor" of two conversation spaces. 1362 This is easily set up by creating a conference with two sub- 1363 conferences and setting the media policy appropriately such that B is 1364 a participant in both. Media forking can also be set up using 3pcc as 1365 described in Section 5.1 of RFC3264 [RFC3264] (an offer/answer model 1366 for SDP). The session descriptions for forking are quite complex. 1367 Controllers should verify that endpoints can handle forked media, for 1368 example using prior configuration.
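The conversation-space changes listed for the actions in Section 3.3 can be sketched as plain set operations. The following is only an illustrative model, not part of the framework: the function names and the use of Python frozensets are assumptions made for this sketch, and no SIP signaling is performed.

```python
from typing import FrozenSet, Set

# A conversation space is modeled as an immutable set of participant names.
Space = FrozenSet[str]


def replace_party(space: Space, old: str, new: str) -> Space:
    """Transfer (3.3.1) and Take (3.3.2): one participant replaces another."""
    if old not in space:
        raise ValueError(f"{old} is not in the conversation space")
    return (space - {old}) | {new}


def add_party(space: Space, party: str) -> Space:
    """Add (3.3.3) and Insert (3.3.5): the space gains one participant."""
    return space | {party}


def join(first: Space, second: Space) -> Space:
    """Local Join (3.3.4): two spaces merge into a single space."""
    return first | second


def split(space: Space, sidebar: Space) -> Set[Space]:
    """Split (3.3.6): the space divides into a sidebar and the remainder."""
    if not sidebar < space:
        raise ValueError("sidebar must be a proper subset of the space")
    return {space - sidebar, sidebar}


def fork(space: Space, anchor: str, party: str) -> Set[Space]:
    """Near fork (3.3.7) and far fork (3.3.8): the anchor stays in the
    original space and also joins a second space with the new party."""
    if anchor not in space:
        raise ValueError(f"{anchor} is not in the conversation space")
    return {space, frozenset({anchor, party})}
```

For instance, fork(frozenset({"A", "B"}), "B", "C") yields the far-fork result { A , B } & { B , C }, with B as the anchor.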
1370 Features enabled: 1372 - barge-in 1373 - voice portal services 1374 - whisper 1375 - key word detection 1376 - sending DTMF somewhere else 1378 4. Security Considerations 1380 Call Control primitives provide a powerful set of features that can 1381 be dangerous in the hands of an attacker. To complicate matters, 1382 call control primitives are likely to be automatically authorized 1383 without direct human oversight. 1385 The class of attacks that are possible using these tools includes the 1386 ability to eavesdrop on calls, disconnect calls, redirect calls, 1387 render irritating content (including ringing) at a user agent, cause 1388 an action that has billing consequences, subvert billing (theft-of- 1389 service), and obtain private information. Call control extensions 1390 must take extra care to describe how these attacks will be prevented. 1392 We can also make some general observations about authorization and 1393 trust with respect to call control. The security model depends 1394 heavily on the signaling model chosen (see Section 1395 2.3). 1397 Let us first examine the security model used in the 3pcc approach. 1398 All signaling goes through the controller, which is a trusted entity. 1399 Traditional SIP authentication and hop-by-hop encryption and message 1400 integrity work fine in this environment, but end-to-end encryption 1401 and message integrity may not be possible. 1403 When using the peer-to-peer approach, call control actions and 1404 primitives can be legitimately initiated by a) an existing 1405 participant in the conversation space, b) a former participant in the 1406 conversation space, or c) an entity trusted by one of the 1407 participants. For example, a participant always initiates a 1408 transfer; a retrieve from Park (a take) is initiated on behalf of a 1409 former participant; and a barge-in (insert or far-fork) is initiated 1410 by a trusted entity (an operator for example).
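The three classes of legitimate initiator named above can be illustrated with a small classification sketch. This is a hypothetical helper, not a mechanism defined by this framework: the names Initiator and classify_initiator are invented here, and a real user agent would first authenticate the requester before consulting any such sets.

```python
from enum import Enum
from typing import AbstractSet, Optional


class Initiator(Enum):
    EXISTING_PARTICIPANT = "existing participant"
    FORMER_PARTICIPANT = "former participant"
    TRUSTED_ENTITY = "trusted entity"


def classify_initiator(
    requester: str,
    current: AbstractSet[str],
    former: AbstractSet[str],
    trusted: AbstractSet[str],
) -> Optional[Initiator]:
    """Return the class of a (previously authenticated) requester, or None
    if the requester has no legitimate basis for a call control action."""
    if requester in current:
        return Initiator.EXISTING_PARTICIPANT  # e.g., initiating a transfer
    if requester in former:
        return Initiator.FORMER_PARTICIPANT    # e.g., a retrieve from park
    if requester in trusted:
        return Initiator.TRUSTED_ENTITY        # e.g., an operator barge-in
    return None
```

A request whose sender falls into none of the three classes returns None and would be rejected or challenged according to local policy.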
1412 Authenticating requests by an existing participant or a trusted 1413 entity can be done with baseline SIP mechanisms. In the case of 1414 features initiated by a former participant, these should be protected 1415 against replay attacks, e.g. by using a unique name or identifier per 1416 invocation. The Replaces header exhibits this behavior as a by- 1417 product of its operation (once a Replaces operation is successful, 1418 the dialog being Replaced no longer exists). These credentials may 1419 for example need to be passed transitively or fetched in an event 1420 body. 1422 To authorize call control primitives that trigger special behavior 1423 (such as an INVITE with Replaces or Join semantics), the receiving 1424 user agent may have trouble finding appropriate credentials with 1425 which to challenge or authorize the request, as the sender may be 1426 completely unknown to the receiver, except through the introduction 1427 of a third party. These credentials need to be passed transitively 1428 in some way or fetched in an event body, for example. 1430 Standard SIP privacy and anonymity mechanisms such as [RFC3323] and 1431 [RFC3325] used during SIP session establishment apply equally well to 1432 SIP call control operations. SIP call control mechanisms should 1433 address privacy and anonymity issues associated with that operation. 1434 For example, privacy during a transfer operation using REFER is 1435 discussed in Section 7.2 of [RFC5589]. 1437 5. IANA Considerations 1439 This document requires no action by IANA. 1441 6. Appendix A: Example Features 1443 Primitives are defined in terms of their ability to provide features. 1444 These example features should require an amply robust set of services 1445 to demonstrate a useful set of primitives. They are described here 1446 briefly. Note that the descriptions of these features are non- 1447 normative.
Note also that this document describes a mixture of both 1448 features originating in the world of telephones, and features that 1449 are clearly Internet oriented. 1451 6.1. Attended Transfer 1453 In Attended Transfer [RFC5589] the transferring party establishes a 1454 session with the transfer target before completing the transfer. 1456 6.2. Auto Answer 1458 In Auto Answer, calls to a certain address or URI answer immediately 1459 via a speakerphone. The Answer-Mode [RFC5373] header field can be 1460 used for this feature. 1462 6.3. Automatic Callback 1464 In Automatic Callback [RFC5359], Alice calls Bob, but Bob is busy. 1465 Alice would like Bob to call her automatically when he is available. 1466 When Bob hangs up, Alice's phone rings. When Alice answers, Bob's 1467 phone rings. Bob answers and they talk. 1469 6.4. Barge-in 1471 In Barge-in, Carol interrupts Alice who has an in-progress call 1472 with Bob. In some variations, Alice forcibly joins a new conversation 1473 with Carol; in other variations, all three parties are placed in the 1474 same conversation (basically a 3-way conference). Barge-in works the 1475 same as call monitoring except that it must indicate that the send 1476 media stream is to be mixed so that all of the other parties can hear 1477 the stream from the UA which is barging in. 1479 6.5. Blind Transfer 1481 In Blind Transfer [RFC5589], Alice is in a conversation with Bob. 1482 Alice asks Bob to contact Carol, but makes no attempt to contact 1483 Carol independently. In many implementations, Alice does not verify 1484 Bob's success or failure in contacting Carol. 1486 6.6. Call Forwarding 1488 In call forwarding [RFC5359], before a dialog is accepted it is 1489 redirected to another location, for example, because the originally 1490 intended recipient is busy, does not answer, is disconnected from the 1491 network, or has configured all requests to go somewhere else. 1493 6.7.
Call Monitoring 1495 Call monitoring is a Join [RFC3911] operation. For example, a call 1496 center supervisor joins an in-progress call for monitoring purposes. 1497 The monitoring UA sends a Join to the dialog it wants to listen to. 1498 It is able to discover the dialog via the dialog state on the 1499 monitored UA. The monitoring UA sends SDP in the INVITE that 1500 indicates receive-only media. As the UA is monitoring only, it does 1501 not matter whether the UA indicates it wishes the send stream to be mixed 1502 or point to point. 1504 6.8. Call Park 1506 In Call Park [RFC5359], a participant parks a call (essentially puts 1507 the call on hold), and then retrieves it at a later time (typically 1508 from another location). Call park requires the ability to put a 1509 dialog some place, advertise it to users in a pickup group, and 1510 uniquely identify it in a way that can be communicated (including 1511 by human voice). The dialog can be held locally on the UA parking the 1512 dialog or alternatively transferred to the park service for the 1513 pickup group. The parked dialog then needs to be labeled (e.g. orbit 1514 12) in a way that can be communicated to the party that is to pick up 1515 the call. The UAs in the pickup group discover the parked 1516 dialog(s) via the dialog package from the park service. If the 1517 dialog is parked locally the park service merely aggregates the 1518 parked call states from the set of UAs in the pickup group. 1520 6.9. Call Pickup 1522 There are two different features that are called Call Pickup 1523 [RFC5359]. The first is the pickup of a parked dialog. The UA from 1524 which the dialog is to be picked up subscribes to the dialog state of 1525 the park service or the UA that has locally parked the dialog. 1526 Dialogs that are parked should be labeled with an identifier. The 1527 labels are used by the UA to allow the user to indicate which dialog 1528 is to be picked up.
The UA picking up the call invokes the URI in 1529 the call state that is labeled as replace-remote. 1531 The other call pickup feature involves picking up an early dialog 1532 (typically ringing). A party picks up a call that was ringing at 1533 another location. One variation allows the caller to choose which 1534 location; another variation just picks up any call in that user's 1535 "pickup group". This feature uses some of the same primitives as the 1536 pickup of a parked call. The call state of the ringing UA is 1537 advertised using the dialog package. The UA that is to pick up the 1538 early dialog subscribes either directly to the ringing UA or to a 1539 service aggregating the states for UAs in the pickup group. The call 1540 state identifies early dialogs. The UA uses the call state(s) to 1541 help the user choose which early dialog is to be picked up. The 1542 UA then invokes the URI in the call state labeled as replace-remote. 1544 6.10. Call Return 1546 In Call Return, Alice calls Bob. Bob misses the call or is 1547 disconnected before he is finished talking to Alice. Bob invokes 1548 Call Return, which calls Alice, even if Alice did not provide her real 1549 identity or location to Bob. 1551 6.11. Call Waiting 1553 In Call Waiting, Alice is in a call, then receives another call. 1554 Alice can place the first call on hold, and talk with the other 1555 caller. She can typically switch back and forth between the callers. 1557 6.12. Click-to-Dial 1559 In Click-to-Dial [RFC5359], Alice looks in her company directory for 1560 Bob. When she finds Bob, she clicks on a URI to call him. Her phone 1561 rings (or possibly answers automatically), and when she answers, 1562 Bob's phone rings. The application or server that hosts the Click- 1563 to-Dial application captures the URI to be dialed and can set up the 1564 call using 3pcc or can send a REFER request to the UA that is to dial 1565 the address.
As users sometimes change their mind or wish to give up 1566 listening to a ringing or voicemail-answered phone, this application 1567 illustrates the need to also have the ability to remotely hang up a 1568 call. 1570 6.13. Conference Call 1572 In a Conference Call [RFC4579], there are three or more active, 1573 visible participants in the same conversation space. 1575 6.14. Consultative Transfer 1577 In Consultative Transfer [RFC5589], the transferring party 1578 establishes a session with the target and mixes both sessions 1579 together so that all three parties can participate, then disconnects 1580 leaving the transferee and transfer target with an active session. 1582 6.15. Distinctive Ring 1584 In Distinctive Ring, incoming calls have different ring cadences or 1585 sample sounds depending on the From party, the To party, or other 1586 factors. The target UA either makes a local decision based on 1587 information in an incoming INVITE (To, From, Contact, Request-URI) or 1588 trusts an Alert-Info [RFC3261] header provided by the caller or 1589 inserted by a trusted proxy. In the latter case, the UA fetches the 1590 content described in the URI (typically via http) and renders it to 1591 the user. 1593 6.16. Do Not Disturb 1595 In Do Not Disturb, Alice selects the Do Not Disturb option. Calls to 1596 her either ring briefly or not at all and are forwarded elsewhere. 1597 Some variations allow specially authorized callers to override this 1598 feature and ring Alice anyway. Do Not Disturb is best implemented in 1599 SIP using presence [RFC3856]. 1601 6.17. Find-Me 1603 In Find-Me, Alice sets up complicated rules for how she can be 1604 reached (possibly using CPL (Call Processing Language) [RFC3880], 1605 presence [RFC3856], or other factors). When Bob calls Alice, his 1606 call is eventually routed to a temporary Contact where Alice happens 1607 to be available. 1609 6.18.
Hotline 1611 In Hotline, Alice picks up a phone and is immediately connected to 1612 the technical support hotline, for example. Hotline is also 1613 sometimes known as a Ringdown line. 1615 6.19. IM Conference Alerts 1617 In IM Conference Alerts, a user receives a notification as an 1618 Instant Message whenever someone joins a conference they are also in. 1620 6.20. Inbound Call Screening 1622 In Inbound Call Screening, Alice doesn't want to receive calls from 1623 Matt. Inbound Screening prevents Matt from disturbing Alice. In 1624 some variations this works even if Matt hides his identity. 1626 6.21. Intercom 1628 In Intercom, Alice typically presses a button on a phone that 1629 immediately connects to another user or phone and causes that phone 1630 to play her voice over its speaker. Some variations immediately 1631 set up two-way communications; other variations require another button 1632 to be pressed to enable a two-way conversation. The UA initiates a 1633 dialog using INVITE and the Answer-Mode: Auto header field as 1634 described in [RFC5373]. The called UA accepts the INVITE with a 200 1635 OK and automatically enables the speakerphone. 1637 Alternatively this can be a local decision for the UA to auto answer 1638 based upon called party identification. 1640 6.22. Message Waiting 1642 In Message Waiting [RFC3842], Bob calls Alice when she has stepped away 1643 from her phone. When she returns, a visible or audible indicator 1644 conveys that someone has left her a voicemail message. The message 1645 waiting indication may also convey how many messages are waiting, 1646 from whom, what time, and other useful pieces of information. 1648 6.23. Music on Hold 1650 In Music on Hold [RFC5359], when Alice places a call with Bob on 1651 hold, her UA replaces its audio with streaming content such as music, 1652 announcements, or advertisements. Music on hold can be implemented in a 1653 number of ways. One way is to transfer the held call to a holding 1654 service.
When the UA wishes to take the call off hold it basically 1655 performs a take on the call from the holding service. This involves 1656 subscribing to call state on the holding service and then invoking 1657 the URI in the call state labeled as replace-remote. 1659 Alternatively music on hold can be performed as a local mixing 1660 operation. The UA holding the call can mix in the music from the 1661 music service via RTP (i.e., an additional dialog) or RTSP or other 1662 streaming media source. This approach is simpler from a protocol 1663 perspective (i.e., the held dialog does not move, so there is less 1664 chance of losing it); however, it does use more LAN bandwidth and 1665 resources on the UA. 1667 6.24. Outbound Call Screening 1669 In Outbound Call Screening, Alice is paged and unknowingly calls a 1670 PSTN pay-service telephone number in the Caribbean, but local policy 1671 blocks her call, and possibly informs her why. 1673 6.25. Pre-paid Calling 1675 In Pre-paid Calling, Alice pays for a certain currency or unit amount 1676 of calling value. When she places a call, she provides her account 1677 number somehow. If her account runs out of calling value during a 1678 call her call is disconnected or redirected to a service where she 1679 can purchase more calling value. 1681 For prepaid calling, the user's media always passes through a device 1682 that is trusted by the pre-paid provider. This may be the other 1683 endpoint (for example a PSTN gateway) or an intermediary (for example a media relay). In either case, an 1684 intermediary proxy or B2BUA can periodically verify the amount of 1685 time available on the pre-paid account, and use the session-timer 1686 extension to cause the trusted endpoint (gateway) or intermediary 1687 (media relay) to send a reINVITE before that time runs out. During 1688 the reINVITE, the SIP intermediary can re-verify the account and 1689 insert another session-timer header.
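The account-verification step above amounts to choosing a session interval that does not outlast the remaining pre-paid balance. The sketch below shows one way an intermediary might pick that value. It is an illustration only: the function name, the 1800-second preferred interval, and the 90-second floor (taken here as an assumption about the session-timer extension's minimum interval) are not specified by this document.

```python
from typing import Optional

# Assumption: the session-timer extension does not allow session intervals
# below 90 seconds, so shorter balances must be handled some other way.
MIN_SESSION_INTERVAL = 90


def next_session_interval(
    remaining_seconds: int, preferred: int = 1800
) -> Optional[int]:
    """Pick a session interval (in seconds) that does not outlast the
    remaining pre-paid balance.

    Returns None when the balance is too small for another timed interval,
    in which case the intermediary would disconnect or redirect the call
    according to local policy.
    """
    if remaining_seconds < MIN_SESSION_INTERVAL:
        return None
    # Never exceed the balance, and never exceed the preferred interval.
    return min(preferred, remaining_seconds)
```

Because the refresh request is expected before the interval expires, the intermediary gets a chance to re-verify the account and compute a new interval before the balance runs out.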
1691 Note that while most pre-paid systems on the PSTN use an IVR to 1692 collect the account number and destination, this isn't strictly 1693 necessary for a SIP-originated prepaid call. SIP requests and SIP 1694 URIs are sufficiently expressive to convey the final destination, the 1695 provider of the prepaid service, the location from which the user is 1696 calling, and the prepaid account they want to use. If a pre-paid IVR 1697 is used, the mechanism described below (Voice Portals) can be 1698 combined as well. 1700 6.26. Presence-Enabled Conferencing 1702 In Presence-Enabled Conferencing, Alice wants to set up a conference 1703 call with Bob and Cathy when they all happen to be available (rather 1704 than scheduling a predefined time). The server providing the 1705 application monitors their status, and calls all three when they are 1706 all "online", not idle, and not in another call. This could be 1707 implemented using conferencing [RFC4579] and presence [RFC3856] 1708 primitives. 1710 6.27. Single Line Extension/Multiple Line Appearance 1712 In Single Line Extension/Multiple Line Appearances, a group of phones 1713 are all treated as "extensions" of a single line or AOR. A call for 1714 one rings them all. As soon as one answers, the others stop ringing. 1715 If any extension is actively in a conversation, another extension can 1716 "pick up" and immediately join the conversation. This emulates the 1717 behavior of a home telephone line with multiple phones. Incoming 1718 calls ring all the extensions through basic parallel forking. Each 1719 extension subscribes to dialog events from each other extension. 1720 While one user has an active call, any other UA extension can insert 1721 itself into that conversation (it already knows the dialog 1722 information) in the same way as barge-in. 1724 When implemented using SIP, this feature is known as Shared 1725 Appearances of an AOR [I-D.ietf-bliss-shared-appearances].
1727 Extensions to the dialog package are used to convey appearance 1728 numbers (line numbers). 1730 6.28. Speakerphone Paging 1732 In Speakerphone Paging, Alice calls the paging address and speaks. 1733 Her voice is played on the speaker of every idle phone in a 1734 preconfigured group of phones. Speakerphone paging can be 1735 implemented using either multicast or through a simple multipoint 1736 mixer. In the multicast solution the paging UA sends a multicast 1737 INVITE with send only media in the SDP (see also RFC3264). The 1738 automatic answer and enabling of the speakerphone is a locally 1739 configured decision on the paged UAs. The paging UA sends RTP via 1740 the multicast address indicated in the SDP. 1742 The multipoint solution is accomplished by sending an INVITE to the 1743 multipoint mixer. The mixer is configured to automatically answer 1744 the dialog. The paging UA then sends REFER requests for each of the 1745 UAs that are to become paging speakers (The UA is likely to send out 1746 a single REFER that is parallel forked by the proxy server). The UAs 1747 performing as paging speakers are configured to automatically answer 1748 based upon caller identification (e.g. To field, URI or Referred-To 1749 headers). 1751 Finally as a third option, the user agent can send a mass-invitation 1752 request to a conference server, which would create a conference and 1753 send INVITEs containing the Answer-Mode: Auto header field to all 1754 user agents in the paging group. 1756 6.29. Speed Dial 1758 In Speed Dial, Alice dials an abbreviated number, or enters an alias, 1759 or presses a special speed dial button representing Bob. Her action 1760 is interpreted as if she specified the full address of Bob. 1762 6.30. Voice Message Screening 1764 In Voice Message Screening, Bob calls Alice. Alice is screening her 1765 calls, so Bob hears Alice's voicemail greeting. Alice can hear Bob 1766 leave his message. 
If she decides to talk to Bob, she can take the 1767 call back from the voicemail system, otherwise she can let Bob leave 1768 a message. This emulates the behavior of a home telephone answering 1769 machine. 1771 At first, this is the same as Call Monitoring (Section 6.7). In this 1772 case the voicemail service is one of the UAs. The UA screening the 1773 message monitors the call on the voicemail service, and also 1774 subscribes to dialog information. If the user screening their 1775 messages decides to answer, they perform a Take from the voicemail 1776 system (for example, send an INVITE with Replaces to the UA leaving 1777 the message). 1779 6.31. Voice Portal 1781 Voice Portal is a service that allows users to access a portal site 1782 using spoken dialog interaction. For example, Alice needs to 1783 schedule a working dinner with her co-worker Carol. Alice uses a 1784 voice portal to check Carol's flight schedule, find a restaurant near 1785 her hotel, make a reservation, get directions there, and page Carol 1786 with this information. A voice portal is essentially a complex 1787 collection of voice dialogs used to access interesting content. One 1788 of the most desirable call control features of a Voice Portal is the 1789 ability to start a new outgoing call from within the context of the 1790 Portal (to make a restaurant reservation, or return a voicemail 1791 message for example). Once the new call is over, the user should be 1792 able to return to the Portal by pressing a special key, using some 1793 DTMF sequence (e.g., a very long pound or hash tone), or by speaking 1794 a key word (e.g., "Main Menu"). 1796 In order to accomplish this, the Voice Portal starts with the 1797 following media relationship: 1799 { User , Voice Portal } 1801 The user then asks to make an outgoing call. The Voice Portal asks 1802 the User to perform a Far-Fork.
In other words the Voice Portal 1803 wants the following media relationship: 1805 { Target , User } & { User , Voice Portal } 1807 The Voice Portal is now just listening for a key word or the 1808 appropriate DTMF. As soon as the user indicates they are done, the 1809 Voice Portal takes the call from the old Target, and we are back to 1810 the original media relationship. 1812 This feature can also be used by the account number and phone number 1813 collection menu in a pre-paid calling service. A user can press a 1814 DTMF sequence that presents them with the appropriate menu again. 1816 6.32. Voicemail 1818 In Voicemail, Alice calls Bob who does not answer or is not 1819 available. The call forwards to a voicemail server which plays Bob's 1820 greeting and records Alice's message for Bob. An indication is sent 1821 to Bob that a new message is waiting, and he retrieves the message at 1822 a later date. This feature is implemented using features such as 1823 Call Forwarding (Section 6.6) and the History-Info [RFC4244] header 1824 field or voicemail URI [RFC4458] convention and Message Waiting 1825 [RFC3842] features. 1827 6.33. Whispered Call Waiting 1829 In Whispered Call Waiting, Alice is in a conversation with Bob. Carol 1830 calls Alice. Either Carol can "whisper" to Alice directly ("Can you 1831 get lunch in 15 minutes?"), or an automaton whispers to Alice 1832 informing her that Carol is trying to reach her. 1834 7. Acknowledgments 1836 The authors would like to acknowledge Ben Campbell for his 1837 contributions to the document and thank AC Mahendran, John Elwell, 1838 and Xavier Marjou for their detailed Working Group review of the 1839 document. The authors would like to thank Magnus Nystrom for his 1840 review of the document. 1842 8. Informative References 1844 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1845 A., Peterson, J., Sparks, R., Handley, M., and E. 
1846 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1847 June 2002. 1849 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 1850 with Session Description Protocol (SDP)", RFC 3264, 1851 June 2002. 1853 [RFC3265] Roach, A., "Session Initiation Protocol (SIP)-Specific 1854 Event Notification", RFC 3265, June 2002. 1856 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 1857 Description Protocol", RFC 4566, July 2006. 1859 [RFC5359] Johnston, A., Sparks, R., Cunningham, C., Donovan, S., and 1860 K. Summers, "Session Initiation Protocol Service 1861 Examples", BCP 144, RFC 5359, October 2008. 1863 [RFC3725] Rosenberg, J., Peterson, J., Schulzrinne, H., and G. 1864 Camarillo, "Best Current Practices for Third Party Call 1865 Control (3pcc) in the Session Initiation Protocol (SIP)", 1866 BCP 85, RFC 3725, April 2004. 1868 [RFC3515] Sparks, R., "The Session Initiation Protocol (SIP) Refer 1869 Method", RFC 3515, April 2003. 1871 [RFC3891] Mahy, R., Biggs, B., and R. Dean, "The Session Initiation 1872 Protocol (SIP) "Replaces" Header", RFC 3891, 1873 September 2004. 1875 [RFC3911] Mahy, R. and D. Petrie, "The Session Initiation Protocol 1876 (SIP) "Join" Header", RFC 3911, October 2004. 1878 [I-D.ietf-bliss-problem-statement] 1879 Rosenberg, J., "Basic Level of Interoperability for 1880 Session Initiation Protocol (SIP) Services (BLISS) 1881 Problem Statement", draft-ietf-bliss-problem-statement-04 1882 (work in progress), March 2009. 1884 [RFC4235] Rosenberg, J., Schulzrinne, H., and R. Mahy, "An INVITE- 1885 Initiated Dialog Event Package for the Session Initiation 1886 Protocol (SIP)", RFC 4235, November 2005. 1888 [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session 1889 Initiation Protocol (SIP) Event Package for Conference 1890 State", RFC 4575, August 2006. 1892 [RFC3680] Rosenberg, J., "A Session Initiation Protocol (SIP) Event 1893 Package for Registrations", RFC 3680, March 2004. 
1895 [RFC3856] Rosenberg, J., "A Presence Event Package for the Session 1896 Initiation Protocol (SIP)", RFC 3856, August 2004. 1898 [RFC4353] Rosenberg, J., "A Framework for Conferencing with the 1899 Session Initiation Protocol (SIP)", RFC 4353, 1900 February 2006. 1902 [RFC5629] Rosenberg, J., "A Framework for Application Interaction in 1903 the Session Initiation Protocol (SIP)", RFC 5629, 1904 October 2009. 1906 [RFC5369] Camarillo, G., "Framework for Transcoding with the Session 1907 Initiation Protocol (SIP)", RFC 5369, October 2008. 1909 [I-D.ietf-xcon-ccmp] 1910 Barnes, M., Boulton, C., Romano, S., and H. Schulzrinne, 1911 "Centralized Conferencing Manipulation Protocol", 1912 draft-ietf-xcon-ccmp-04 (work in progress), November 2009. 1914 [RFC5589] Sparks, R., Johnston, A., and D. Petrie, "Session 1915 Initiation Protocol (SIP) Call Control - Transfer", 1916 BCP 149, RFC 5589, June 2009. 1918 [RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol 1919 (SIP) Call Control - Conferencing for User Agents", 1920 BCP 119, RFC 4579, August 2006. 1922 [RFC3840] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, 1923 "Indicating User Agent Capabilities in the Session 1924 Initiation Protocol (SIP)", RFC 3840, August 2004. 1926 [RFC3841] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller 1927 Preferences for the Session Initiation Protocol (SIP)", 1928 RFC 3841, August 2004. 1930 [RFC3087] Campbell, B. and R. Sparks, "Control of Service Context 1931 using SIP Request-URI", RFC 3087, April 2001. 1933 [I-D.audet-sipping-feature-ref] 1934 Audet, F., Johnston, A., Mahy, R., and C. Jennings, 1935 "Feature Referral in the Session Initiation Protocol 1936 (SIP)", draft-audet-sipping-feature-ref-00 (work in 1937 progress), February 2008. 1939 [RFC4240] Burger, E., Van Dyke, J., and A. Spitzer, "Basic Network 1940 Media Services with SIP", RFC 4240, December 2005. 1942 [RFC4458] Jennings, C., Audet, F., and J. 
Elwell, "Session 1943 Initiation Protocol (SIP) URIs for Applications such as 1944 Voicemail and Interactive Voice Response (IVR)", RFC 4458, 1945 April 2006. 1947 [RFC4538] Rosenberg, J., "Request Authorization through Dialog 1948 Identification in the Session Initiation Protocol (SIP)", 1949 RFC 4538, June 2006. 1951 [RFC3880] Lennox, J., Wu, X., and H. Schulzrinne, "Call Processing 1952 Language (CPL): A Language for User Control of Internet 1953 Telephony Services", RFC 3880, October 2004. 1955 [RFC5373] Willis, D. and A. Allen, "Requesting Answering Modes for 1956 the Session Initiation Protocol (SIP)", RFC 5373, 1957 November 2008. 1959 [RFC3842] Mahy, R., "A Message Summary and Message Waiting 1960 Indication Event Package for the Session Initiation 1961 Protocol (SIP)", RFC 3842, August 2004. 1963 [I-D.ietf-bliss-shared-appearances] 1964 Johnston, A., Soroushnejad, M., and V. Venkataramanan, 1965 "Shared Appearances of a Session Initiation Protocol (SIP) 1966 Address of Record (AOR)", 1967 draft-ietf-bliss-shared-appearances-04 (work in progress), 1968 October 2009. 1970 [RFC4244] Barnes, M., "An Extension to the Session Initiation 1971 Protocol (SIP) for Request History Information", RFC 4244, 1972 November 2005. 1974 [RFC4313] Oran, D., "Requirements for Distributed Control of 1975 Automatic Speech Recognition (ASR), Speaker 1976 Identification/Speaker Verification (SI/SV), and Text-to- 1977 Speech (TTS) Resources", RFC 4313, December 2005. 1979 [RFC3323] Peterson, J., "A Privacy Mechanism for the Session 1980 Initiation Protocol (SIP)", RFC 3323, November 2002. 1982 [RFC3325] Jennings, C., Peterson, J., and M. Watson, "Private 1983 Extensions to the Session Initiation Protocol (SIP) for 1984 Asserted Identity within Trusted Networks", RFC 3325, 1985 November 2002. 
1987 Authors' Addresses 1989 Rohan Mahy 1990 Unaffiliated 1992 Email: rohan@ekabal.com 1994 Robert Sparks 1995 Tekelek 1997 Email: rjsparks@nostrum.com 1999 Jonathan Rosenberg 2000 jdrosen.net 2002 Email: jdrosen@jdrosen.net 2004 Dan Petrie 2005 SIP EZ 2007 Email: dpetrie@sipez.com 2008 Alan Johnston (editor) 2009 Avaya 2011 Email: alan@sipstation.com