SIPPING WG                                                       R. Mahy
Internet-Draft                                              SIP Edge LLC
Expires: April 4, 2006                                       B. Campbell
                                                               R. Sparks
                                                        Estacado Systems
                                                            J. Rosenberg
                                                           Cisco Systems
                                                               D. Petrie
                                                                  SIP EZ
                                                             A. Johnston
                                                                     MCI
                                                                Oct 2005

     A Call Control and Multi-party usage framework for the Session
                       Initiation Protocol (SIP)
                 draft-ietf-sipping-cc-framework-05.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 4, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document defines a framework and requirements for multi-party
   usage of SIP.
   To enable discussion of multi-party features and applications we
   define an abstract call model for describing the media relationships
   required by many of these. The model and actions described here are
   specifically chosen to be independent of the SIP signaling and/or
   mixing approach chosen to actually set up the media relationships.
   In addition to its dialog manipulation aspect, this framework
   includes requirements for communicating related information and
   events such as conference and session state, and session history.
   This framework also describes other goals which embody the spirit of
   SIP applications as used on the Internet.

Table of Contents

   1.  Conventions
   2.  Motivation and Background
   3.  Key Concepts
     3.1   "Conversation Space" Model
     3.2   Comparison with Related Definitions
     3.3   Signaling Models
     3.4   Mixing Models
       3.4.1   Tightly Coupled
       3.4.2   Loosely Coupled
     3.5   Conveying Information and Events
     3.6   Componentization and Decomposition
       3.6.1   Media Intermediaries
       3.6.2   Mixer
       3.6.3   Transcoder
       3.6.4   Media Relay
       3.6.5   Queue Server
       3.6.6   Parking Place
       3.6.7   Announcements and Voice Dialogs
     3.7   Use of URIs
       3.7.1   Naming Users in SIP
       3.7.2   Naming Services with SIP URIs
     3.8   Invoker Independence
     3.9   Billing issues
   4.  Catalog of call control actions and sample features
     4.1   Early Dialog Actions
       4.1.1   Remote Answer
       4.1.2   Remote Forward or Put
       4.1.3   Remote Busy or Error Out
     4.2   Single Dialog Actions
       4.2.1   Remote Dial
       4.2.2   Remote On and Off Hold
       4.2.3   Remote Hangup
     4.3   Multi-dialog actions
       4.3.1   Transfer
       4.3.2   Take
       4.3.3   Add
       4.3.4   Local Join
       4.3.5   Insert
       4.3.6   Split
       4.3.7   Near-fork
       4.3.8   Far fork
   5.  Security Considerations
   6.  IANA Considerations
   7.  Appendix A: Example Features
     7.1   Implementation of these features
       7.1.1   Call Park
       7.1.2   Call Pickup
       7.1.3   Music on Hold
       7.1.4   Call Monitoring
       7.1.5   Barge-in
       7.1.6   Intercom
       7.1.7   Speakerphone paging
       7.1.8   Distinctive ring
       7.1.9   Voice message screening
       7.1.10  Single Line Extension
       7.1.11  Click-to-dial
       7.1.12  Pre-paid calling
       7.1.13  Voice Portal
   8.  References
     8.1   Normative References
     8.2   Informational References
       Authors' Addresses
       Intellectual Property and Copyright Statements

1. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [2].

2. Motivation and Background

   The Session Initiation Protocol [1] (SIP) was defined for the
   initiation, maintenance, and termination of sessions or calls between
   one or more users. However, despite its origins as a large-scale
   multiparty conferencing protocol, SIP is used today primarily for
   point-to-point calls. This two-party configuration is the focus of
   the SIP specification and most of its extensions.

   This document defines a framework and requirements for multi-party
   usage of SIP. Most multi-party operations manipulate SIP session
   dialogs (also known as call legs) or SIP conference media policy to
   cause participants in a conversation to perceive specific media
   relationships.
   In other protocols that deal with the concept of calls, this
   manipulation is known as call control. In addition to its dialog or
   policy manipulation aspect, "call control" also includes
   communicating information and events related to manipulating calls,
   including information and events dealing with session state and
   history, conference state, user state, and even message state.

   Based on input from the SIP community, the authors compiled the
   following set of goals for SIP call control and multiparty
   applications:
   o  Define Primitives, Not Services. Allow for a handful of robust
      yet simple mechanisms which can be combined to deliver features
      and services. Throughout this document we refer to these simple
      mechanisms as "primitives". Primitives should be sufficiently
      robust that when they are combined they can be used to build many
      services. However, the goal is not to define a provably complete
      set of primitives. Note that while the IETF will NOT standardize
      behavior or services, it may define example services for
      informational purposes, as in service examples [6].
   o  Participant oriented. The primitives should be designed to
      provide services which are oriented around the experience of the
      participants. The authors observe that end users of features and
      services usually don't care how a media relationship is set up.
      Their ultimate experience is based only on the resulting media and
      other externally visible characteristics.
   o  Signaling Model independent: Support both a central control and a
      peer-to-peer feature invocation model (and combinations of the
      two). Baseline SIP already supports a centralized control model
      described in 3pcc [7], and the SIP community has expressed a great
      deal of interest in peer-to-peer or distributed call control using
      primitives such as those defined in REFER [8], Replaces [9], and
      Join [10].
   o  Mixing Model independent: The bulk of interesting multiparty
      applications involve mixing or combining media from multiple
      participants. This mixing can be performed by one or more of the
      participants, or by a centralized mixing resource. The experience
      of the participants should not depend on the mixing model used.
      While most examples in this document refer to audio mixing, the
      framework applies to any media type. In this context a "mixer"
      is any entity that combines media in an appropriate,
      media-specific way. This is consistent with the model described
      in the SIP conferencing framework.
   o  Invoker oriented. Only the user who invokes a feature or a
      service needs to know exactly which service is invoked or why.
      This is good because it allows new services to be created without
      requiring new primitives from all the participants; and it allows
      for much simpler feature authorization policies, for example, when
      participation spans organizational boundaries. As discussed in
      section 3.8, this also avoids exponential state explosion when
      combining features. The invoker only has to manage a user
      interface or API to prevent local feature interactions. All the
      other participants simply need to manage the feature interactions
      of a much smaller number of primitives.
   o  Primitives make full use of URIs. URIs are a very powerful
      mechanism for describing users and services. They represent a
      plentiful resource which can be extremely expressive and easily
      routed, translated, and manipulated, even across organizational
      boundaries. URIs can contain special parameters and informational
      headers which need only be relevant to the owner of the namespace
      (domain) of the URI.
      Just as a user who selects an http: URL need not understand the
      significance and organization of the web site it references, a
      user may encounter a SIP URL which translates into an email-style
      group alias, plays a pre-recorded message, or runs some complex
      call-handling logic. Note that while this may seem to contradict
      the previous goal, both goals can be satisfied by the same model.
   o  Make use of SIP headers and SIP event packages to provide SIP
      entities with information about their environment. These should
      include information about the status/handling of dialogs on
      other user agents, information about the history of other contacts
      attempted prior to the current contact, the status of
      participants, the status of conferences, user presence
      information, and the status of messages.
   o  Encourage service decomposition, and design to make use of
      standard components using well-defined, simple interfaces. Sample
      components include a SIP mixer, recording service, announcement
      server, and voice dialog server. (This is not an exhaustive
      list.)
   o  Include authentication, authorization, policy, logging, and
      accounting mechanisms to allow these primitives to be used safely
      among mutually untrusted participants. Some of these mechanisms
      may be used to assist in billing, but no specific billing system
      will be endorsed.
   o  Permit graceful fallback to baseline SIP. Definitions for new SIP
      call control extensions/primitives MUST describe a graceful way to
      fall back to baseline SIP behavior. Support for one primitive MUST
      NOT imply support for another primitive.
   o  There is no desire or goal to reinvent traditional models, such as
      the model used by the [H.450] family of protocols, [JTAPI], or the
      [CSTA] call model, as these other models do not share the design
      goals presented in this document.

3. Key Concepts

3.1 "Conversation Space" Model

   This document introduces the concept of an abstract "conversation
   space" (essentially a set of participants who believe they are all
   communicating among one another). Each conversation space contains
   one or more participants.

   Participants are SIP User Agents which send original media to or
   terminate and receive media from other members of the conversation
   space. Logically, every participant in the conversation space has
   access to all the media generated in that space (this is strictly
   true if all participants share a common media type). A SIP User
   Agent which does not contribute or consume any media is NOT a
   participant; nor is a user agent which merely forwards, transcodes,
   mixes, or selects media originating elsewhere in the conversation
   space. [Note that a conversation space consists of zero or more SIP
   calls or SIP conferences. A conversation space is similar to the
   definition of a "call" in some other call models.]

   Participants may represent human users or non-human users (referred
   to as robots or automatons in this document). Some participants may
   be hidden within a conversation space. Some examples of hidden
   participants include: robots which generate tones, images, or
   announcements during a conference to announce users arriving and
   departing, a human call center supervisor monitoring a conversation
   between a trainee and a customer, and robots which record media for
   training or archival purposes.

   Participants may also be active or passive. Active participants are
   expected to be intelligent enough to leave a conversation space when
   they no longer desire to participate. (An attentive human
   participant is obviously active.)
   Some robotic participants (such as a voice messaging system, an
   instant messaging agent, or a voice dialog system) may be active
   participants if they can leave the conversation space when there is
   no human interaction. Other robots (for example our tone generating
   robot from the previous example) are passive participants. A human
   participant "on-hold" is passive.

   A conversation space can be drawn as a "bubble" or oval, or written
   as a "set" in curly or square brace notation. Each set, oval, or
   "bubble" represents a conversation space. Hidden participants are
   shown in lowercase letters.

      { A , B }            [ A , B ]

         .-.                  .---.
        /   \                /     \
       /  A  \              / A   b \
      (       )            (         )
       \  B  /              \ C   D /
        \   /                \     /
         '-'                  '---'

3.2 Comparison with Related Definitions

   In SIP, a call is "an informal term that refers to some communication
   between peers, generally set up for the purposes of a multimedia
   conversation." Obviously we cannot discuss normative behavior based
   on such an intentionally vague definition. The concept of a
   conversation space is needed because the SIP definition of call is
   not sufficiently precise for the purpose of describing the user
   experience of multiparty features.

   Do any other definitions convey the correct meaning? SIP and SDP
   [5] both define a conference as "a multimedia session identified by a
   common session description." A session is defined as "a set of
   multimedia senders and receivers and the data streams flowing from
   senders to receivers." Both of these definitions are heavily
   oriented toward multicast sessions with little differentiation among
   participants. As such, neither is particularly useful for our
   purposes. In fact, the definition of "call" in some call models is
   more similar to our definition of a conversation space.
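
   As a rough illustration (not part of the framework itself), the
   conversation space model of Section 3.1 can be sketched as a small
   data structure. The names used below (Participant,
   ConversationSpace, hidden, active, render) are hypothetical and
   chosen only for this sketch. It records the hidden and
   active/passive properties of each participant and renders a space
   in the set notation described in Section 3.1, with hidden
   participants in lowercase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    """One member of a conversation space (hypothetical model)."""
    name: str
    hidden: bool = False   # e.g. a recording robot or monitoring supervisor
    active: bool = True    # able to leave the space on its own initiative


@dataclass
class ConversationSpace:
    """A set of participants who believe they are all communicating."""
    participants: list = field(default_factory=list)

    def add(self, participant):
        self.participants.append(participant)

    def visible(self):
        """Participants that are not hidden."""
        return [p for p in self.participants if not p.hidden]

    def render(self):
        """Set notation: hidden participants are shown in lowercase."""
        names = [p.name.lower() if p.hidden else p.name.upper()
                 for p in self.participants]
        return "{ " + " , ".join(names) + " }"


space = ConversationSpace()
space.add(Participant("A"))
space.add(Participant("b", hidden=True, active=False))  # passive robot
space.add(Participant("C"))
space.add(Participant("D"))
print(space.render())  # prints { A , b , C , D }
```

   Keeping "hidden" and "active" as separate flags reflects the text
   above, which treats them as independent properties: a hidden
   supervisor can be active, while a visible human on hold is passive.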

   Some examples of the relationship between conversation spaces, SIP
   call legs, and SIP sessions are listed below. In each example, a
   human user will perceive that there is a single call.

   o  A simple two-party call is a single conversation space, a single
      session, and a single call-leg.
   o  A locally mixed three-way call is two sessions and two call-legs.
      It is also a single conversation space.
   o  A simple dial-in audio conference is a single conversation space,
      but is represented by as many call-legs and sessions as there are
      human participants.
   o  A multicast conference is a single conversation space, a single
      session, and as many call-legs as participants.

3.3 Signaling Models

   Obviously to make changes to a conversation space, you must be able
   to use SIP signaling to cause these changes. Specifically there must
   be a way to manipulate SIP dialogs (call legs) to move participants
   into and out of conversation spaces. Although this is not as
   obvious, there also must be a way to manipulate SIP dialogs to
   include non-participant user agents which are otherwise involved in a
   conversation space (e.g., B2BUAs, 3pcc controllers, mixers,
   transcoders, translators, or relays).

   Implementations may set up the media relationships described in the
   conversation space model using the approach described in 3pcc [7].
   The 3pcc approach relies on only the following 3 primitive
   operations:
   o  Create a new call-leg (INVITE)
   o  Modify a call-leg (reINVITE)
   o  Destroy a call-leg (BYE)

   The main advantage of the 3pcc approach is that it only requires very
   basic SIP support from end systems to support call control features.
   As such, third-party call control is a natural way to handle protocol
   conversion and mid-call features.
   It also has the combined advantage and disadvantage that new
   features can (and must) be implemented in one place only (the
   controller): the approach neither requires enhanced client
   functionality nor takes advantage of it.

   In addition, a peer-to-peer approach is discussed at length in this
   draft. The primary drawback of the peer-to-peer model is additional
   end system complexity. The benefits of the peer-to-peer model
   include:
   o  state remains at the edges
   o  call signaling need only go through participants involved (there
      are no additional points of failure)
   o  peers can take advantage of end-to-end message integrity or
      encryption
   o  setup time is shorter (fewer messages and round trips are
      required)

   The peer-to-peer approach relies on additional "primitive"
   operations, some of which are identified here.
   o  Replace an existing dialog
   o  Join a new dialog with an existing dialog
   o  Support SIP conference policy control
   o  Locally perform media forking (multi-unicast)
   o  Ask another UA to send a request on your behalf

   Many of the features, primitives, and actions described in this
   document also require some type of media mixing, combining, or
   selection as described in the next section.

3.4 Mixing Models

   SIP permits a variety of mixing models, which are discussed here
   briefly. This topic is discussed more thoroughly in the SIP
   conferencing framework [15] and cc-conferencing [19]. SIP supports
   both tightly-coupled and loosely-coupled conferencing, although more
   sophisticated behavior is available in tightly-coupled conferences.
   In a tightly-coupled conference, a single SIP user agent (called the
   focus) has a direct dialog relationship with each participant (and
   may control non-participant user agents as well). In a
   loosely-coupled conference there are no coordinated signaling
   relationships among the participants.

   For brevity, only the two most popular conferencing models (local
   and centralized mixing) are discussed in detail in this document.
   Applications of the conversation spaces model to loosely-coupled
   multicast and distributed full unicast mesh conferences are left as
   an exercise for the reader. Note that a distributed full mesh
   conference can be used for basic conferences, but does not easily
   allow for more complex conferencing actions like splitting, merging,
   and sidebars.

   Call control features should be designed to allow a mixer (local or
   centralized) to decide when to reduce a conference back to a 2-party
   call, or drop all the participants (for example if only two
   automatons are communicating). The actual heuristics used to release
   calls are beyond the scope of this document, but may depend on
   properties in the conversation space, such as the number of active,
   passive, or hidden participants; and the send-only, receive-only, or
   send-and-receive orientation of various participants.

3.4.1 Tightly Coupled

3.4.1.1 (Single) End System Mixing

   The first model we call "end system mixing". In this model, user A
   calls user B, and they have a conversation. At some point later, A
   decides to conference in user C. To do this, A calls C, using a
   completely separate SIP call. This call uses a different Call-ID,
   different tags, etc. There is no call set up directly between B and
   C. No SIP extension or external signaling is needed. A merely
   decides to locally join two call-legs.

      B     C
       \   /
        \ /
         A

   A receives media streams from both B and C, and mixes them. A sends
   a stream containing A's and C's streams to B, and a stream containing
   A's and B's streams to C. Basically, user A handles both signaling
   and media mixing.
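
   The stream arithmetic in the end system mixing description above can
   be sketched as follows. This is a minimal illustration under stated
   assumptions, not a prescribed implementation: audio is assumed to be
   represented as equal-length lists of 16-bit PCM samples, mixing is
   assumed to be simple clipped addition, and the function names (mix,
   outbound_streams) are hypothetical. For each remote party, the
   mixing endpoint sends the sum of every source except that party's
   own media.

```python
def mix(frames):
    """Sum equal-length lists of PCM samples, clipping to 16-bit range."""
    mixed = [sum(samples) for samples in zip(*frames)]
    return [max(-32768, min(32767, s)) for s in mixed]


def outbound_streams(local_name, frames):
    """Compute what the mixing endpoint sends to each remote party.

    frames maps a party name to its current frame of samples. Each
    remote party receives the mix of all sources except its own.
    """
    out = {}
    for remote in frames:
        if remote == local_name:
            continue  # the mixing endpoint does not send to itself
        others = [f for name, f in frames.items() if name != remote]
        out[remote] = mix(others)
    return out


# A is the mixing endpoint; B and C are the two remote call-legs.
frames = {"A": [100, 200, -300], "B": [10, 20, 30], "C": [1, 2, 3]}
streams = outbound_streams("A", frames)
print(streams["B"])  # A + C -> [101, 202, -297]
print(streams["C"])  # A + B -> [110, 220, -270]
```

   Excluding the recipient's own media from its outbound mix is what
   keeps B and C from hearing themselves echoed back.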

3.4.1.2 Centralized Mixing

   In a centralized mixing model, all participants have a pairwise SIP
   and media relationship with the mixer. Common applications of
   centralized mixing include ad-hoc conferences and scheduled dial-in
   or dial-out conferences. [need diagram]

3.4.1.3 Centralized Signaling, Distributed Media

   In this conferencing model, there is a centralized controller, as in
   the dial-in and dial-out cases. However, the centralized server
   handles signaling only. The media is still sent directly between
   participants, using either multicast or multi-unicast. Multi-unicast
   is when a user sends multiple packets (one for each recipient,
   addressed to that recipient). This is referred to as a
   "Decentralized Multipoint Conference" in [H.323].

3.4.2 Loosely Coupled

   In these models, there is no point of central control of SIP
   signaling. As in the "Centralized Signaling, Distributed Media" case
   above, all endpoints send media to all other endpoints. Consequently
   every endpoint mixes the media it receives from all the other
   sources, and sends its own media to every other participant.
   [add diagrams]

3.4.2.1 Large-Scale Multicast Conferences

   Large-scale multicast conferences were the original motivation for
   both the Session Description Protocol (SDP) [5] and SIP. In a
   large-scale multicast conference, one or more multicast addresses are
   allocated to the conference. Each participant joins those multicast
   groups, and sends their media to those groups. Signaling is not sent
   to the multicast groups. The sole purpose of the signaling is to
   inform participants of which multicast groups to join. Large-scale
   multicast conferences are usually pre-arranged, with specific start
   and stop times. However, multicast conferences do not need to be
   pre-arranged, so long as a mechanism exists to dynamically obtain a
   multicast address.

3.4.2.2 Full Distributed Unicast Conferencing

   In this conferencing model, each participant has both a pairwise
   media relationship and a pairwise SIP relationship with every other
   participant (a full mesh). This model requires a mechanism to
   maintain a consistent view of distributed state across the group.
   This is a classic hard problem in computer science. Also, this model
   does not scale well for large numbers of participants, because for
   n participants the number of media and SIP relationships grows as
   approximately n-squared. As a result, this model is not generally
   available in commercial implementations; to the contrary it is
   primarily the topic of research or experimental implementations.
   Note that this model assumes peer-to-peer signaling.

3.5 Conveying Information and Events

   Participants should have access to information about the other
   participants in a conversation space, so that this information can be
   rendered to a human user or processed by an automaton. Although some
   of this information may be available from the Request-URI or To,
   From, Contact, or other SIP headers, another mechanism of reporting
   this information is necessary.

   Many applications are driven by knowledge about the progress of calls
   and conferences. In general these types of events allow for the
   construction of distributed applications, where the application
   requires information on session dialog and conference state, but is
   not necessarily co-resident with an endpoint user agent or conference
   server. For example, a focus involved in a conversation space may
   wish to provide URLs for conference status, and/or conference/floor
   control.

   The SIP Events [4] architecture defines general mechanisms for
   subscription to and notification of events within SIP networks.
   It introduces the notion of a package, which is a specific
   "instantiation" of the events mechanism for a well-defined set of
   events.

   Event packages are needed to provide the status of a user's session
   dialogs, provide the status of conferences and their participants,
   provide user presence information, provide the status of
   registrations, and provide the status of a user's messages. While
   this is not an exhaustive list, these are sufficient to enable the
   sample features described in this document.

   The conference event package [12] allows users to subscribe to
   information about an entire tightly-coupled SIP conference.
   Notifications convey information about the participants such as: the
   SIP URL identifying each user, their status in the space (active,
   declined, departed), URLs to invoke other features (such as sidebar
   conversations), links to other relevant information (such as floor
   control policies), and if floor control policies are in place, the
   user's floor control status. For conversation spaces created from
   cascaded conferences, conversation state can be gathered from
   relevant foci and merged into a cohesive set of state.

   The session dialog package [11] provides information about all the
   dialogs the target user is maintaining, what conversations the user
   is participating in, and how these are correlated. Likewise the
   registration package [13] provides notifications when contacts have
   changed for a specific address-of-record. The combination of these
   allows a user agent to learn about all conversations occurring for
   the entire registered contact set for an address-of-record.

   Note that user presence in SIP [14] has a close relationship with
   these latter two event packages. It is fundamental to the presence
   model that the information used to obtain user presence is
   constructed from any number of different input sources.
Examples of 539 other such sources include calendaring information and uploads of 540 presence documents. These two packages can be considered another 541 mechanism that allows a presence agent to determine the presence 542 state of the user. Specifically, a user presence server can act as a 543 subscriber for the session dialog and registration packages to obtain 544 additional information that can be used to construct a presence 545 document. 547 The multi-party architecture may also need to provide a mechanism to 548 get information about the status /handling of a dialog (for example, 549 information about the history of other contacts attempted prior to 550 the current contact). Finally, the architecture should provide ample 551 opportunities to present informational URIs which relate to calls, 552 conversations, or dialogs in some way. For example, consider the SIP 553 Call-Info header, or Contact headers returned in a 300-class 554 response. Frequently additional information about a call or dialog 555 can be fetched via non-SIP URIs. For example, consider a web page 556 for package tracking when calling a delivery company, or a web page 557 with related documentation when joining a dial-in conference. The 558 use of URIs in the multiparty framework is discussed in more detail 559 in Section 3.7. 561 Finally the interaction of SIP with stimulus-signaling-based 562 applications, which allow a user agent to interact with an 563 application without knowledge of the semantics of that application, 564 is discussed in the SIP application interaction framework [16]. 565 Stimulus signaling can occur to a user interface running locally with 566 the client, or to a remote user interface, through media streams. 567 Stimulus signaling encompasses a wide range of mechanisms, ranging 568 from clicking on hyperlinks, to pressing buttons, to traditional Dual 569 Tone Multi Frequency (DTMF) input. 
In all cases, stimulus signaling 570 is supported through the use of markup languages, which play a key 571 role in that framework. 573 3.6 Componentization and Decomposition 575 This framework proposes a decomposed component architecture with a 576 very loose coupling of services and components. This means that a 577 service (such as a conferencing server or an auto-attendant) need not 578 be implemented as an actual server. Rather, these services can be 579 built by combining a few basic components in straightforward or 580 arbitrarily complex ways. 582 Since the components are easily deployed on separate boxes, by 583 separate vendors, or even with separate providers, we achieve a 584 separation of function that allows each piece to be developed in 585 complete isolation. We can also reuse existing components for new 586 applications. This allows rapid service creation, and the ability 587 for services to be distributed across organizational domains anywhere 588 in the Internet. 590 For many of these components it is also desirable to discover their 591 capabilities, for example querying the ability of a mixer to host a 592 10 dialog conference, or to reserve resources for a specific time. 593 These actions could be provided in the form of URLs, provided there 594 is an a priori means of understanding their semantics. For example 595 if there is a published dictionary of operations, a way to query the 596 service for the available operations and the associated URLs, the URL 597 can be the interface for providing these service operations. 
This 598 concept is described in more detail in the context of dialog 599 operations in section 601 3.6.1 Media Intermediaries 603 Media Intermediaries are not participants in any conversation space, 604 although an entity which is also a media translator may also have a 605 colocated participant component (for example a mixer which also 606 announces the arrival of a new participant; the announcement portion 607 is a participant, but the mixer itself is not). Media intermediaries 608 should be as transparent as possible to the end users, offering a 609 useful, fundamental service without getting in the way of new 610 features implemented by participants. Some common media 611 intermediaries are described below. 613 3.6.2 Mixer 615 A SIP mixer is a component that combines media from all dialogs in 616 the same conversation in a media specific way. For example, the 617 default combining for an audio conference might be an N-1 618 configuration, while a text mixer might interleave text messages on a 619 per-line basis. More details about how to manipulate the media 620 policy used by mixers are being discussed in the XCON Working Group. 622 3.6.3 Transcoder 624 A transcoder translates media from one encoding or format to another 625 (for example, GSM voice to G.711, MPEG2 to H.261, or text/html to 626 text/plain), or from one media type to another (for example text to 627 speech). A more thorough discussion of transcoding is provided in 628 SIP transcoding services invocation [17]. 630 3.6.4 Media Relay 632 A media relay terminates media and simply forwards it to a new 633 destination without changing the content in any way. Sometimes media 634 relays are used to provide source IP address anonymity, to facilitate 635 middlebox traversal, or to provide a trusted entity where media can 636 be forcefully disconnected. 638 3.6.5 Queue Server 640 A queue server is a location where calls can be entered into one of 641 several FIFO (first-in, first-out) queues.
A queue server would 642 subscribe to the presence of groups or individuals who are interested 643 in its queues. When detecting that a user is available to service a 644 queue, the server redirects or transfers the last call in the 645 relevant queue to the available user. On a queue-by-queue basis, 646 authorized users could also subscribe to the call state (dialog 647 information) of calls within a queue. Authorized users could use 648 this information to effectively pluck (take) a call out of the queue 649 (for example by sending an INVITE with a Replaces header to one of 650 the user agents in the queue). 652 3.6.6 Parking Place 654 A parking place is a location where calls can be terminated 655 temporarily and then retrieved later. While a call is "parked", it 656 can receive media "on-hold" such as music, announcements, or 657 advertisements. Such a service could be further decomposed such that 658 announcements or music are handled by a separate component. 660 3.6.7 Announcements and Voice Dialogs 662 An announcement server is a server which can play digitized media 663 (frequently audio), such as music or recorded speech. These servers 664 are typically accessible via SIP, HTTP, or RTSP. An analogous 665 service is a recording service which stores digitized media. A 666 convention for specifying announcements in SIP URIs is described in 667 [netann]. Likewise the same server could easily provide a service 668 which records digitized media. 670 A "voice dialog" is a model of spoken interactive behavior between a 671 human and an automaton which can include synthesized speech, 672 digitized audio, recognition of spoken and DTMF key input, recording 673 of spoken input, and interaction with call control. Voice dialogs 674 frequently consist of forms or menus. Forms present information and 675 gather input; menus offer choices of what to do next. 677 Spoken dialogs are a basic building block of applications which use 678 voice. 
Consider for example that a voice mail system, the 679 conference-id and passcode collection system for a conferencing 680 system, and complicated voice portal applications all require a voice 681 dialog component. 683 3.6.7.1 Text-to-Speech and Automatic Speech Recognition 685 Text-to-Speech (TTS) is a service which converts text into digitized 686 audio. TTS is frequently integrated into other applications, but 687 when separated as a component, it provides greater opportunity for 688 broad reuse. Automatic Speech Recognition (ASR) is a service which 689 attempts to decipher digitized speech based on a proposed grammar. 690 Like TTS, ASR services can be embedded, or exposed so that many 691 applications can take advantage of such services. A standardized 692 (decomposed) interface to access standalone TTS and ASR services is 693 currently being developed in the SPEECHSC Working Group. 695 3.6.7.2 VoiceXML 697 [VoiceXML] is a W3C recommendation that was designed to give authors 698 control over the spoken dialog between users and applications. The 699 application and user take turns speaking: the application prompts the 700 user, and the user in turn responds. Its major goal is to bring the 701 advantages of web-based development and content delivery to 702 interactive voice response applications. We believe that VoiceXML 703 represents the ideal partner for SIP in the development of 704 distributed IVR servers. VoiceXML is an XML based scripting language 705 for describing IVR services at an abstract level. VoiceXML supports 706 DTMF recognition, speech recognition, text-to-speech, and playing out 707 of recorded media files. The results of the data collected from the 708 user are passed to a controlling entity through an HTTP POST 709 operation. The controller can then return another script, or 710 terminate the interaction with the IVR server. 712 A VoiceXML server also need not be implemented as a monolithic 713 server. 
Below is a diagram of a VoiceXML browser which is split into 714 media and non-media handling parts. The VoiceXML interpreter handles 715 SIP dialog state and state within a VoiceXML document, and sends 716 requests to the media component over another protocol.
718              +-------------+
719              |             |
720              |  VoiceXML   |
721              | Interpreter |
722              | (signaling) |
723              +-------------+
724                ^          ^
725                |          |
726            SIP |          | RTSP
727                |          |
728                |          |
729                v          v
730   +-------------+        +-------------+
731   |             |        |             |
732   |   SIP UA    |  RTP   | RTSP Server |
733   |             |<------>|   (media)   |
734   |             |        |             |
735   +-------------+        +-------------+
737 Figure : Decomposed VoiceXML Server 739 3.7 Use of URIs 741 All naming in SIP uses URIs. URIs in SIP are used in a plethora of 742 contexts: the Request-URI; Contact, To, From, and *-Info headers; 743 application/uri bodies; and embedded in email, web pages, instant 744 messages, and ENUM records. The request-URI identifies the user or 745 service that the call is destined for. 747 SIP URIs embedded in informational SIP headers, SIP bodies, and non- 748 SIP content can also specify methods, special parameters, headers, 749 and even bodies. For example: 751 sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098 752 &To=;tag=879738 753 &From=;tag=023214 755 sip:bob@babylon.biloxi.com;method=REFER? 756 Refer-To= 758 Throughout this draft we discuss call control primitive operations. 759 One of the biggest problems is defining how these operations may be 760 invoked. There are a number of ways to do this. One way is to 761 define the primitives in the protocol itself such that SIP methods 762 (for example REFER) or SIP headers (for example Replaces) indicate a 763 specific call control action. Another way to invoke call control 764 primitives is to define a specific Request-URI naming convention. 765 Either these conventions must be shared between the client (the 766 invoker) and the server, or published by or on behalf of the server.
767 The former involves defining URL construction techniques (e.g. URL 768 parameters and/or token conventions) as proposed in [netannc]. The 769 latter technique usually involves discovering the URI via a SIP event 770 package, a web page, a business card, or an Instant Message. Yet 771 another means to acquire the URLs is to define a dictionary of 772 primitives with well-defined semantics and provide a means to query 773 the named primitives and corresponding URLs that may be invoked on 774 the service or dialogs. 776 3.7.1 Naming Users in SIP 778 An address-of-record, or public SIP address, is a SIP (or SIPS) URI 779 that points to a domain with a location server that can map the URI 780 to a set of Contact URIs where the user might be available. Typically 781 the Contact URIs are populated via registration. 783 Address of Record Contacts 785 sip:bob@biloxi.com -> sip:bob@babylon.biloxi.com:5060 786 sip:bbrown@mailbox.provider.net 787 sip:+1.408.555.6789@mobile.net 789 Callee Capabilities [20] defines a set of additional parameters to 790 the Contact header that define the characteristics of the user agent 791 at the specified URI. For example, there is a mobility parameter 792 which indicates whether the UA is fixed or mobile. When a user agent 793 registers, it places these parameters in the Contact headers to 794 characterize the URIs it is registering. This allows a proxy for 795 that domain to have information about the contact addresses for that 796 user. 798 When a caller sends a request, it can optionally request Caller 799 Preferences [21], by including the Accept-Contact and Reject-Contact 800 headers which request certain handling by the proxy in the target 801 domain. These headers contain preferences that describe the set of 802 desired URIs to which the caller would like their request routed. 803 The proxy in the target domain matches these preferences with the 804 Contact characteristics originally registered by the target user.
805 The target user can also choose to run arbitrarily complex "Find-me" 806 feature logic on a proxy in the target domain. 808 There is a strong asymmetry in how preferences for callers and 809 callees can be presented to the network. While a caller takes an 810 active role by initiating the request, the callee takes a passive 811 role in waiting for requests. This motivates the use of callee- 812 supplied scripts and caller preferences included in the call 813 request. This asymmetry is also reflected in the appropriate 814 relationship between caller and callee preferences. A server for a 815 callee should respect the wishes of the caller to avoid certain 816 locations, while the preferences among locations has to be the 817 callee's choice, as it determines where, for example, the phone rings 818 and whether the callee incurs mobile telephone charges for incoming 819 calls. 821 SIP User Agent implementations are encouraged to make intelligent 822 decisions based on the type of participants (active/passive, hidden, 823 human/robot) in a conversation space. This information is conveyed 824 via the session dialog package or in a SIP header parameter 825 communicated using an appropriate SIP header. For example, a music 826 on hold service may take the sensible approach that if there are two 827 or more unhidden participants, it should not provide hold music; or 828 that it will not send hold music to robots. 830 Multiple participants in the same conversation space may represent 831 the same human user. For example, the user may use one participant 832 for video, chat, and whiteboard media on a PC and another for audio 833 media on a SIP phone. In this case, the address-of-record is the 834 same for both user agents, but the Contacts are different. In 835 addition, human users may add robot participants which act on their 836 behalf (for example a call recording service, or a calendar 837 reminder). 
Call Control features in SIP should continue to function 838 as expected in such an environment. 840 3.7.2 Naming Services with SIP URIs 842 [Editor's Note: this section needs to be pared down considerably, and 843 the examples replaced with example.{com|org|net} domain names.] A 844 critical piece of defining a session level service that can be 845 accessed by SIP is defining the naming of the resources within that 846 service. This point cannot be overstated. 848 In the context of SIP control of application components, we take 849 advantage of the fact that the standard SIP URI has a user part. 850 Most services may be thought of as user automatons that participate 851 in SIP sessions. It naturally follows that the user address, or the 852 left-hand-side of the URI, should be utilized as a service indicator. 854 For example, media servers commonly offer multiple services at a 855 single host address. Use of the user part as a service indicator 856 enables service consumers to direct their requests without ambiguity. 857 It has the added benefit of enabling media services to register their 858 availability with SIP Registrars just as any "real" SIP user would. 859 This maintains consistency and provides enhanced flexibility in the 860 deployment of media services in the network. 862 There has been much discussion about the potential for confusion if 863 media services URIs are not readily distinguishable from other types 864 of SIP UA's. The use of a service namespace provides a mechanism to 865 unambiguously identify standard interfaces while not constraining 866 the development of private or experimental services. 868 In SIP, the request-URI identifies the user or service that the call 869 is destined for. The great advantage of using URIs (specifically, 870 the SIP request URI) as a service identifier comes because of the 871 combination of two facts. 
First, unlike in the PSTN, where the 872 namespace (dialable telephone numbers) are limited, URIs come from an 873 infinite space. They are plentiful, and they are free. Secondly, 874 the primary function of SIP is call routing through manipulations of 875 the request URI. In the traditional SIP application, this URI 876 represents people. However, the URI can also represent services, as 877 we propose here. This means we can apply the routing services SIP 878 provides to routing of calls to services. The result - the problem 879 of service invocation and service location becomes a routing problem, 880 for which SIP provides a scalable and flexible solution. Since there 881 is such a vast namespace of services, we can explicitly name each 882 service in a finely granular way. This allows the distribution of 883 services across the network. 885 Consider a conferencing service, where we have separated the names of 886 ad-hoc conferences from scheduled conferences, we can program proxies 887 to route calls for ad-hoc conferences to one set of servers, and 888 calls for scheduled ones to another, possibly even in a different 889 provider. In fact, since each conference itself is given a URI, we 890 can distribute conferences across servers, and easily guarantee that 891 calls for the same conference always get routed to the same server. 892 This is in stark contrast to conferences in the telephone network, 893 where the equivalent of the URI - the phone number - is scarce. An 894 entire conferencing provider generally has one or two numbers. 895 Conference IDs must be obtained through IVR interactions with the 896 caller, or through a human attendant. This makes it difficult to 897 distribute conferences across servers all over the network, since the 898 PSTN routing only knows about the dialed number. 900 In the case of a dialog server, the voice dialog itself is the target 901 for the call. 
As such, the request URI should contain the identifier 902 for this spoken dialog. This is consistent with the Request-URI 903 service invocation model of RFC 3087. This URL can be in one of two 904 formats. In the first, the VoiceXML script is identified directly by 905 an HTTP URL. In the second, the script is not specified. Rather, 906 the dialog server uses its configuration to map the incoming request 907 to a specific script. 909 Since the request URI could indicate a request for a variety of 910 different services, of which a dialog server is only one type, this 911 example request URI begins with a service identifier that 912 indicates the basic service required. For VoiceXML scripts, this 913 identification information is a URL-encoded version of the URL which 914 references the script to execute, or if not present, the dialog 915 server uses server-specific configuration to determine which script 916 to execute. 918 Examples of URLs that invoke VoiceXML dialogs are: (line folding for 919 clarity only) 921 sip:dialog.vxml.http%3a//dialogs.server.com/script32.vxml 922 @vxmlservers.com 924 sip:dialog.vxml@vxmlservers.com 926 The first of these indicates that the dialog server (located at 927 vxmlservers.com) should invoke a VoiceXML script fetched from 928 http://dialogs.server.com/script32.vxml. Since the user part of the 929 SIP URL cannot contain the : character, this must be escaped to %3a. 931 These types of conventions are not limited to application component 932 servers. An ordinary SIP User Agent can have special URIs as well, 933 for example, one which is automatically answered by a speakerphone. 934 Since URIs are so plentiful, using a separate URI for this service 935 does not exhaust a valuable resource. The requested service is clear 936 to the user agent receiving the request. This URI can also be 937 included as part of another feature (for example, the Intercom 938 feature described in Section 6.1.6).
This feature can be specified 939 with a SIP user parameter, since user parameters are part of the userpart of a SIP 940 URI. 942 Likewise a Request URI can fully describe an announcement service 943 through the use of the user part of the address and additional URI 944 parameters. In our example, the user portion of the address, "annc", 945 specifies the announcement service on the media server. The two URI 946 parameters "play=" and "early=" specify the audio resource to play 947 and whether early media is desired. 949 sip:annc@ms2.carrier.net; 950 play=http://audio.carrier.net/allcircuitsbusy.au;early=yes 952 sip:annc@ms2.carrier.net; 953 play=file://fileserver.carrier.net/geminii/yourHoroscope.wav 955 In practical applications, it is important that an invoker does not 956 necessarily apply semantic rules to various URIs it did not create. 957 Instead, it should allow any arbitrary string to be provisioned, and 958 map the string to the desired behavior. The administrator of a 959 service may choose to provision specific conventions or mnemonic 960 strings, but the application should not require it. In any large 961 installation, the system owner is likely to have pre-existing rules 962 for mnemonic URIs, and any attempt by an application to define its 963 own rules may create a conflict. Implementations should allow an 964 arbitrary mix of URLs from these schemes, or any other scheme that 965 renders valid SIP URIs, to be provisioned, rather than enforce only 966 one particular scheme.
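The provisioning guidance above can be illustrated with a small lookup table. The following is a minimal sketch in Python, not part of this framework: the class name ProvisionedRoutes and the behavior labels ("deposit", "retrieve", "retrieve-inband-pin") are hypothetical, and the URIs are borrowed from the voicemail examples in this section. The point it shows is that the application treats each provisioned URI as an opaque string and never parses it for meaning.

```python
class ProvisionedRoutes:
    """Hypothetical helper: maps opaque, provisioned SIP URIs to behaviors."""

    def __init__(self):
        self._routes = {}

    def provision(self, uri, behavior):
        # Store the URI exactly as the administrator supplied it.
        # No naming convention is assumed or enforced here.
        self._routes[uri] = behavior

    def lookup(self, request_uri):
        # Exact string match only; returns None when nothing was provisioned.
        return self._routes.get(request_uri)


routes = ProvisionedRoutes()
# URIs from three different naming schemes can coexist in one table.
routes.provision("sip:sub-rjs-deposit@vm.wcom.com", "deposit")
routes.provision("sip:677405@vm.wcom.com", "retrieve")
routes.provision("sip:rjs@vm.wcom.com;mode=inpin", "retrieve-inband-pin")
```

Exact string matching keeps the sketch short. A deployed service would more likely compare URIs using the SIP URI equivalence rules, so that differences such as host-name case do not cause a lookup to fail.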
968 For example, a voicemail application can be built using very 969 different sets of URI conventions, as illustrated below:
971  URI Identity        Example Scheme 1
972                      Example Scheme 2
973                      Example Scheme 3
975  Deposit with        sip:sub-rjs-deposit@vm.wcom.com
976  standard greeting   sip:677283@vm.wcom.com
977                      sip:rjs@vm.wcom.com;mode=deposit
979  Deposit with on     sip:sub-rjs-deposit-busy.vm.wcom.com
980  phone greeting      sip:677372@vm.wcom.com
981                      sip:rjs@vm.wcom.com;mode=3991243
983  Deposit with        sip:sub-rjs-deposit-sg@vm.wcom.com
984  special greeting    sip:677384@vm.wcom.com
985                      sip:rjs@vm.wcom.com;mode=sg
987  Retrieve - SIP      sip:sub-rjs-retrieve@vm.wcom.com
988  authentication      sip:677405@vm.wcom.com
989                      sip:rjs@vm.wcom.com;mode=retrieve
991  Retrieve - prompt   sip:sub-rjs-retrieve-inpin.vm.wcom.com
992  for PIN in-band     sip:677415@vm.wcom.com
993                      sip:rjs@vm.wcom.com;mode=inpin
995 As we have shown, SIP URIs represent an ideal, flexible mechanism for 996 describing and naming service resources, be they queues, conferences, 997 voice dialogs, announcements, voicemail treatments, or phone 998 features. 1000 3.8 Invoker Independence 1002 With functional signaling, only the invoker of features in SIP needs 1003 to know exactly which feature they are invoking. One of the primary 1004 benefits of this approach is that combinations of functional features 1005 work in SIP call control without requiring complex feature 1006 interaction matrices. For example, let us examine the combination of 1007 a "transfer" of a call which is "conferenced". 1009 Alice calls Bob. Alice silently "conferences in" her robotic 1010 assistant Albert as a hidden party. Bob transfers Alice to Carol. 1011 If Bob asks Alice to Replace her leg with a new one to Carol then 1012 both Alice and Albert should be communicating with Carol 1013 (transparently).
1015 Using the peer-to-peer model, this combination of features works fine 1016 if A is doing local mixing (Alice replaces Bob's call-leg with 1017 Carol's), or if A is using a central mixer (the mixer replaces Bob's 1018 call leg with Carol's). A clever implementation using the 3pcc model 1019 can generate similar results. 1021 New extensions to the SIP Call Control Framework should attempt to 1022 preserve this property. 1024 3.9 Billing issues 1026 Billing in the PSTN is typically based on who initiated a call. At 1027 the moment billing in a SIP network is neither consistent with 1028 itself, nor with the PSTN. (A billing model for SIP should allow for 1029 both PSTN-style billing, and non-PSTN billing.) The example below 1030 demonstrates one such inconsistency. 1032 Alice places a call to Bob. Alice then blind transfers Bob to Carol 1033 through a PSTN gateway. In current usage of REFER, Bob may be billed 1034 for a call he did not initiate (his UA originated the outgoing call 1035 leg however). This is not necessarily a terrible thing, but it 1036 demonstrates a security concern (Bob must have appropriate local 1037 policy to prevent fraud). Also, Alice may wish to pay for Bob's 1038 session with Carol. There should be a way to signal this in SIP. 1040 Likewise a Replacement call may maintain the same billing 1041 relationship as a Replaced call, so if Alice first calls Carol, then 1042 asks Bob to Replace this call, Alice may continue to receive a bill. 1044 Further work in SIP billing should define a way to set or discover 1045 the direction of billing. 1047 4. Catalog of call control actions and sample features 1049 Call control actions can be categorized by the dialogs upon which 1050 they operate. The actions may involve a single or multiple dialogs. 1051 These dialogs can be early or established. Multiple dialogs may be 1052 related in a conversation space to form a conference or other 1053 interesting media topologies. 
1055 It should be noted that it is desirable to provide a means by which a 1056 party can discover the actions which may be performed on a dialog. 1057 The interested party may be independent or related to the dialogs. 1058 One means of accomplishing this is through the ability to define and 1059 obtain URLs for these actions as described in section . 1061 Below are listed several call control "actions" which establish or 1062 modify dialogs and relate the participants in a conversation space. 1063 The names of the actions listed are for descriptive purposes only 1064 (they are not normative). This list of actions is not meant to be 1065 exhaustive. 1067 In the examples, all actions are initiated by the user "Alice" 1068 represented by UA "A". 1070 4.1 Early Dialog Actions 1072 The following are a set of actions that may be performed on a single 1073 early dialog. These actions can be thought of as a set of remote 1074 control operations. For example an automaton might perform the 1075 operation on behalf of a user. Alternatively a user might use the 1076 remote control in the form of an application to perform the action on 1077 the early dialog of a UA which may be out of reach. All of these 1078 actions correspond to telling the UA how to respond to a request to 1079 establish an early dialog. These actions provide useful 1080 functionality for PDA, PC and server based applications which desire 1081 the ability to control a UA. A proposed mechanism for this type of 1082 functionality is described in Remote Call Control [23]. 1084 4.1.1 Remote Answer 1086 A dialog is in some early dialog state such as 180 Ringing. It may 1087 be desirable to tell the UA to answer the dialog. That is tell it to 1088 send a 200 Ok response to establish the dialog. 1090 4.1.2 Remote Forward or Put 1092 It may be desirable to tell the UA to respond with a 3xx class 1093 response to forward an early dialog to another UA. 
1095 4.1.3 Remote Busy or Error Out 1097 It may be desirable to instruct the UA to send an error response such 1098 as 486 Busy Here. 1100 4.2 Single Dialog Actions 1102 There is another useful set of actions which operate on a single 1103 established dialog. These operations are useful in building 1104 productivity applications for aiding users to control their phone. 1105 For example, a CRM application can set up calls for a user, 1106 eliminating the need for the user to actually enter an address. 1107 These operations can also be thought of as remote control actions. A 1108 proposed mechanism for this type of functionality is described in 1109 Remote Call Control [23]. 1111 4.2.1 Remote Dial 1113 This action instructs the UA to initiate a dialog. This action can 1114 be performed using the REFER method. 1116 4.2.2 Remote On and Off Hold 1118 This action instructs the UA to put an established dialog on hold. 1119 Though this operation can conceptually be performed with the REFER 1120 method, there are no semantics defined as to what the referred party 1121 should do with the SDP. There is no way to distinguish between the 1122 desire to go on or off hold. 1124 4.2.3 Remote Hangup 1126 This action instructs the UA to terminate an early or established 1127 dialog. A REFER request with the following Refer-To URI performs 1128 this action. Note: this URL is not properly escaped. 1130 sip:bob@babylon.biloxi.example.com;method=BYE?Call-ID=13413098 1131 &To=;tag=879738 1132 &From=;tag=023214 1134 4.3 Multi-dialog actions 1136 These actions apply to a set of related dialogs. 1138 4.3.1 Transfer 1140 The conversation space changes as follows: 1142 before after 1143 { A , B } --> { C , B } 1145 A replaces itself with C. 1147 To make this happen using the peer-to-peer approach, "A" would send 1148 two SIP requests.
A shorthand for those requests is shown below: 1150 REFER B Refer-To:C 1151 BYE B 1153 To make this happen instead using the 3pcc approach, the controller 1154 sends requests represented by the shorthand below: 1156 INVITE C (w/SDP of B) 1157 reINVITE B (w/SDP of C) 1158 BYE A 1160 Features enabled by this action: - blind transfer - transfer to a 1161 central mixer (some type of conference or forking) - transfer to park 1162 server (park) - transfer to music on hold or announcement server - 1163 transfer to a "queue" - transfer to a service (such as Voice Dialogs 1164 service) - transition from local mixer to central mixer 1166 This action is frequently referred to as "completing an attended 1167 transfer". It is described in more detail in cc-transfer [18]. 1169 4.3.2 Take 1171 The conversation space changes as follows: { B , C } --> { B , A }. 1172 A forcibly replaces C with itself. In most uses of this primitive, A 1173 is just "un-replacing" itself. Using the peer-to-peer approach, "A" 1174 sends: INVITE B Replaces: 1176 Using the 3pcc approach (all requests sent from controller): INVITE A 1177 (w/SDP of B); reINVITE B (w/SDP of A); BYE C. 1179 Features enabled by this action: - transferee completes an attended 1180 transfer - retrieve from central mixer (not recommended) - retrieve 1181 from music on hold or park - retrieve from queue - call center take - 1182 voice portal resuming ownership of a call it originated - answering- 1183 machine style screening (pickup) - pickup of a ringing call (i.e. 1184 early dialog) 1186 Note that pickup of a ringing call has some interesting 1187 additional requirements. First of all, it is an early dialog as 1188 opposed to an established dialog. Secondly, the party which is to 1189 pick up the call may wish to do so only while it is an early 1190 dialog.
That is, in the race condition where the ringing UA accepts 1191 just before it receives signaling from the party wishing to take the 1192 call, the taking party wishes to yield or cancel the take. The goal 1193 is to avoid yanking an answered call from the called party. 1195 This action is described in Replaces [9] and in cc-transfer [18]. 1197 4.3.3 Add 1199 Note that the following 4 actions are described in cc-conferencing 1200 [19]. 1202 This is merely adding a participant to a SIP conference. The 1203 conversation space changes as follows: { A , B } --> { A, B, C }. A 1204 adds C to the conversation. Using the peer-to-peer approach, adding 1205 a party using local mixing requires no signaling. To transition from 1206 a 2-party call or a locally mixed conference to central mixing, A 1207 could send the following requests: REFER B Refer-To: conference-URI; 1208 INVITE conference-URI; BYE B. To add a party to a conference: REFER C 1209 Refer-To: conference-URI, or REFER conference-URI Refer-To: C. Using 1210 the 3pcc approach to transition to centrally mixed, the controller 1211 would send: INVITE mixer leg 1 (w/SDP of A); INVITE mixer leg 2 (w/SDP 1212 of B); INVITE C (late SDP); reINVITE A (w/SDP of mixer leg 1); reINVITE 1213 B (w/SDP of mixer leg 2); INVITE mixer leg 3 (w/SDP of C). To add a 1214 party to a SIP conference: INVITE C (late SDP); INVITE conference-URI 1215 (w/SDP of C). Features enabled: - standard conference feature - call 1216 recording - answering-machine style screening (screening) 1218 4.3.4 Local Join 1220 The conversation space changes like this: { A, B} , {A, C} --> {A, 1221 B, C} or like this: { A, B} , {C, D} --> {A, B, C, D}. A takes two 1222 conversation spaces and joins them together into a single space.
   Using the peer-to-peer approach, A can mix locally, or REFER the
   participants of both conversation spaces to the same central mixer
   (as in 5.3).

   For the 3pcc approach, the call flows for inserting participants,
   and joining and splitting conversation spaces are tedious yet
   straightforward, so these are left as an exercise for the reader.

   Features enabled:
   - standard conference feature
   - leaving a sidebar to rejoin a larger conference

4.3.5  Insert

   The conversation space changes like this:

      { B , C } --> { A, B, C }

   A inserts itself into a conversation space.  A proposed mechanism
   for signaling this using the peer-to-peer approach is to send a new
   header in an INVITE with "joining" semantics.  For example:

      INVITE B  Join:

   If B accepted the INVITE, B would accept responsibility to set up
   the call legs and mixing necessary (for example: to mix locally or
   to transfer the participants to a central mixer).

   Features enabled:
   - barge-in
   - call center monitoring
   - call recording

4.3.6  Split

      { A, B, C, D } --> { A, B } , { C, D }

   If using a central conference with peer-to-peer:

      REFER C Refer-To: conference-URI (new URI)
      REFER D Refer-To: conference-URI (new URI)
      BYE C
      BYE D

   Features enabled:
   - sidebar conversations during a larger conference

4.3.7  Near-fork

   A participates in two conversation spaces simultaneously:

      { A, B } --> { B , A } & { A , C }

   A is a participant in two conversation spaces such that A sends the
   same media to both spaces, and renders media from both spaces,
   presumably by mixing or rendering the media from both.  We can
   define that A is the "anchor" point for both forks, each of which
   is a separate conversation space.  This action is purely a local
   implementation matter (it requires no special signaling).
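
   The media relationship of a near-fork can be sketched informally as
   follows.  The function name rendered_media is hypothetical, and the
   sketch assumes that every party renders media from every other
   party in each conversation space it belongs to.  It is meant only
   to show that the anchor hears both forks while the two forks stay
   separate from each other.

```python
from typing import Dict, FrozenSet, Iterable, Set


def rendered_media(spaces: Iterable[FrozenSet[str]]) -> Dict[str, Set[str]]:
    """Map each party to the set of parties whose media it renders.

    Assumption for this sketch: within one conversation space, every
    party renders media from all other members of that space.
    """
    hears: Dict[str, Set[str]] = {}
    for members in spaces:
        for party in members:
            hears.setdefault(party, set()).update(members - {party})
    return hears


if __name__ == "__main__":
    # Near-fork: { A, B } --> { B , A } & { A , C }, with A as the anchor.
    near_fork = [frozenset({"A", "B"}), frozenset({"A", "C"})]
    for party, sources in sorted(rendered_media(near_fork).items()):
        print(party, "renders media from", sorted(sources))
```

   Under that assumption, A renders media from B and C, while B and C
   each render media from A only.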
   Local features such as switching calls between the background and
   foreground are possible using this media relationship.

4.3.8  Far fork

   The conversation space diagram...

      { A, B } --> { A , B } & { B , C }

   A requests B to be the "anchor" of two conversation spaces.  This is
   easily set up by creating a conference with two subconferences and
   setting the media policy appropriately such that B is a participant
   in both.  Media forking can also be set up using 3pcc as described
   in Section 5.1 of RFC3264 [3] (an offer/answer model for SDP).  The
   session descriptions for forking are quite complex.  Controllers
   should verify that endpoints can handle forked-media, for example
   using prior configuration.

   Features enabled:
   o  barge-in
   o  voice portal services
   o  whisper
   o  hotword detection
   o  sending DTMF somewhere else

5.  Security Considerations

   Call Control primitives provide a powerful set of features that can
   be dangerous in the hands of an attacker.  To complicate matters,
   call control primitives are likely to be automatically authorized
   without direct human oversight.

   The class of attacks possible using these tools includes the
   ability to eavesdrop on calls, disconnect calls, redirect calls,
   render irritating content (including ringing) at a user agent, cause
   an action that has billing consequences, subvert billing
   (theft-of-service), and obtain private information.  Call control
   extensions must take extra care to describe how these attacks will
   be prevented.

   We can also make some general observations about authorization and
   trust with respect to call control.  The security model is
   dramatically dependent on the signaling model chosen (see section
   3.2).

   Let us first examine the security model used in the 3pcc approach.
   All signaling goes through the controller, which is a trusted entity.
   Traditional SIP authentication and hop-by-hop encryption and message
   integrity work fine in this environment, but end-to-end encryption
   and message integrity may not be possible.

   When using the peer-to-peer approach, call control actions and
   primitives can be legitimately initiated by a) an existing
   participant in the conversation space, b) a former participant in
   the conversation space, or c) an entity trusted by one of the
   participants.  For example, a participant always initiates a
   transfer; a retrieve from Park (a take) is initiated on behalf of a
   former participant; and a barge-in (insert or far-fork) is initiated
   by a trusted entity (an operator for example).

   Authenticating requests by an existing participant or a trusted
   entity can be done with baseline SIP mechanisms.  In the case of
   features initiated by a former participant, these should be
   protected against replay attacks by using a unique name or
   identifier per invocation.  The Replaces header exhibits this
   behavior as a by-product of its operation (once a Replaces operation
   is successful, the call-leg being Replaced no longer exists).  For
   other requests, a "one-time" Request-URI may be provided to the
   feature invoker.

   To authorize call control primitives that trigger special behavior
   (such as an INVITE with Replaces or Join semantics), the receiving
   user agent may have trouble finding appropriate credentials with
   which to challenge or authorize the request, as the sender may be
   completely unknown to the receiver, except through the introduction
   of a third party.  These credentials need to be passed transitively
   in some way or fetched in an event body, for example.

6.  IANA Considerations

   This document requires no action by IANA.

7.  Appendix A: Example Features

   Primitives are defined in terms of their ability to provide features.
   These example features should require an amply robust set of services
   to demonstrate a useful set of primitives.  They are described here
   briefly.  Note that the descriptions of these features are
   non-normative.  Some of these features are used as examples in
   section 6 to demonstrate how some features may require certain media
   relationships.  Note also that this document describes a mixture of
   both features originating in the world of telephones, and features
   which are clearly Internet oriented.

   Example Feature Definitions:

   Call Waiting - Alice is in a call, then receives another call.
   Alice can place the first call on hold, and talk with the other
   caller.  She can typically switch back and forth between the
   callers.

   Blind Transfer - Alice is in a conversation with Bob.  Alice asks
   Bob to contact Carol, but makes no attempt to contact Carol
   independently.  In many implementations, Alice does not verify Bob's
   success or failure in contacting Carol.

   Attended Transfer - The transferring party establishes a session
   with the transfer target before completing the transfer.

   Consultative transfer - The transferring party establishes a session
   with the target and mixes both sessions together so that all three
   parties can participate, then disconnects, leaving the transferee
   and transfer target with an active session.

   Conference Call - Three or more active, visible participants in the
   same conversation space.

   Call Park - A call participant parks a call (essentially puts the
   call on hold), and then retrieves it at a later time (typically from
   another location).

   Call Pickup - A party picks up a call that was ringing at another
   location.  One variation allows the caller to choose which location;
   another variation just picks up any call in that user's "pickup
   group".

   Music on Hold - When Alice places a call with Bob on hold, her UA
   replaces its audio with streaming content such as music,
   announcements, or advertisements.

   Call Monitoring - A call center supervisor joins an in-progress call
   for monitoring purposes.

   Barge-in - Carol interrupts Alice, who has an in-progress call with
   Bob.  In some variations, Alice forcibly joins a new conversation
   with Carol; in other variations, all three parties are placed in the
   same conversation (basically a 3-way conference).

   Hotline - Alice picks up a phone and is immediately connected to the
   technical support hotline, for example.

   Autoanswer - Calls to a certain address or location answer
   immediately via a speakerphone.

   Intercom - Alice typically presses a button on a phone which
   immediately connects to another user or phone and causes that phone
   to play her voice over its speaker.  Some variations immediately
   set up two-way communications; other variations require another
   button to be pressed to enable a two-way conversation.

   Speakerphone paging - Alice calls the paging address and speaks.
   Her voice is played on the speaker of every idle phone in a
   preconfigured group of phones.

   Speed dial - Alice dials an abbreviated number, or enters an alias,
   or presses a special speed dial button representing Bob.  Her action
   is interpreted as if she specified the full address of Bob.

   Call Return - Alice calls Bob.  Bob misses the call or is
   disconnected before he is finished talking to Alice.  Bob invokes
   Call Return, which calls Alice, even if Alice did not provide her
   real identity or location to Bob.

   Inbound Call Screening - Alice doesn't want to receive calls from
   Matt.  Inbound Screening prevents Matt from disturbing Alice.  In
   some variations this works even if Matt hides his identity.

   Outbound Call Screening - Alice is paged and unknowingly calls a
   PSTN pay-service telephone number in the Caribbean, but local policy
   blocks her call, and possibly informs her why.

   Call Forwarding - Before a call-leg is accepted it is redirected to
   another location, for example, because the originally intended
   recipient is busy, does not answer, is disconnected from the
   network, or has configured all requests to go somewhere else.

   Message Waiting - Bob calls Alice while she is away from her phone.
   When she returns, a visible or audible indicator conveys that
   someone has left her a voicemail message.  The message waiting
   indication may also convey how many messages are waiting, from whom,
   what time, and other useful pieces of information.

   Do Not Disturb - Alice selects the Do Not Disturb option.  Calls to
   her either ring briefly or not at all and are forwarded elsewhere.
   Some variations allow specially authorized callers to override this
   feature and ring Alice anyway.

   Distinctive ring - Incoming calls have different ring cadences or
   sample sounds depending on the From party, the To party, or other
   factors.

   Automatic Callback - Alice calls Bob, but Bob is busy.  Alice would
   like Bob to call her automatically when he is available.  When Bob
   hangs up, Alice's phone rings.  When Alice answers, Bob's phone
   rings.  Bob answers and they talk.

   Find-Me - Alice sets up complicated rules for how she can be reached
   (possibly using [CPL], [presence] or other factors).  When Bob calls
   Alice, his call is eventually routed to a temporary Contact where
   Alice happens to be available.

   Whispered call waiting - Alice is in a conversation with Bob.  Carol
   calls Alice.  Either Carol can "whisper" to Alice directly ("Can you
   get lunch in 15 minutes?"), or an automaton whispers to Alice
   informing her that Carol is trying to reach her.

   Voice message screening - Bob calls Alice.  Alice is screening her
   calls, so Bob hears Alice's voicemail greeting.  Alice can hear Bob
   leave his message.  If she decides to talk to Bob, she can take the
   call back from the voicemail system; otherwise she can let Bob leave
   a message.  This emulates the behavior of a home telephone answering
   machine.

   Presence-Enabled Conferencing - Alice wants to set up a conference
   call with Bob and Cathy when they all happen to be available (rather
   than scheduling a predefined time).  The server providing the
   application monitors their status, and calls all three when they are
   all "online", not idle, and not in another call.

   IM Conference Alerts - A user receives a notification as an Instant
   Message whenever someone joins a conference they are also in.

   Single Line Extension - A group of phones are all treated as
   "extensions" of a single line.  A call for one rings them all.  As
   soon as one answers, the others stop ringing.  If any extension is
   actively in a conversation, another extension can "pick up" and
   immediately join the conversation.  This emulates the behavior of a
   home telephone line with multiple phones.

   Click-to-dial - Alice looks in her company directory for Bob.  When
   she finds Bob, she clicks on a URL to call him.  Her phone rings (or
   possibly answers automatically), and when she answers, Bob's phone
   rings.

   Pre-paid calling - Alice pays for a certain currency or unit amount
   of calling value.  When she places a call, she provides her account
   number somehow.  If her account runs out of calling value during a
   call, her call is disconnected or redirected to a service where she
   can purchase more calling value.

   Voice Portal - A service that allows users to access a portal site
   using spoken dialog interaction.  For example, Alice needs to
   schedule a working dinner with her co-worker Carol.
   Alice uses a voice portal to check Carol's flight schedule, find a
   restaurant near her hotel, make a reservation, get directions there,
   and page Carol with this information.

7.1  Implementation of these features

   Example Features:

   Call Hold                     [Offer/Answer] for SIP
   Call Waiting                  Local Implementation
   Blind Transfer                [cc-transfer]
   Attended Transfer             [cc-transfer]
   Consultative transfer         [cc-transfer]
   Conference Call               [conf-models]
   Call Park                     *[examples]
   Call Pickup                   *[examples]
   Music on Hold                 *[examples]
   Call Monitoring               *Insert
   Barge-in                      *Insert or Far-Fork
   Hotline                       Local Implementation
   Autoanswer                    Local URI convention
   Speed dial                    Local Implementation
   Intercom                      *Speed dial + autoanswer
   Speakerphone paging           *Speed dial + autoanswer
   Call Return                   Proxy feature
   Inbound Call Screening        Proxy or Local implementation
   Outbound Call Screening       Proxy feature
   Call Forwarding               Proxy or Local implementation
   Message Waiting               [msg-waiting]
   Do Not Disturb                [presence]
   Distinctive ring              *Proxy or Local implementation
   Automatic Callback            2 person presence-based conference
   Find-Me                       Proxy service based on presence
   Whispered call waiting        Local implementation
   Voice message screening       *
   Presence-based Conferencing   *call when presence = available
   IM Conference Alerts          subscribe to conference status
   Single Line Extension         *
   Click-to-dial                 *
   Pre-paid calling              *
   Voice Portal                  *

7.1.1  Call Park

   Call park requires the ability to put a dialog some place, to
   advertise it to users in a pickup group, and to uniquely identify it
   by a means that can be communicated (including human voice).  The
   dialog can be held locally on the UA parking the dialog or
   alternatively transferred to the park service for the pickup group.
   The parked dialog then needs to be labeled (e.g.
   orbit 12) in a way that can be communicated to the party that is to
   pick up the call.  The UAs in the pickup group discover the parked
   dialog(s) via the dialog package from the park service.  If the
   dialog is parked locally, the park service merely aggregates the
   parked call states from the set of UAs in the pickup group.

7.1.2  Call Pickup

   There are two different features which are called call pickup.  The
   first is the pickup of a parked dialog.  The UA from which the
   dialog is to be picked up subscribes to the session dialog state of
   the park service or the UA which has locally parked the dialog.
   Dialogs which are parked should be labeled with an identifier.  The
   labels are used by the UA to allow the user to indicate which dialog
   is to be picked up.  The UA picking up the call invokes the URL in
   the call state which is labeled as replace-remote.

   The other call pickup feature involves picking up an early dialog
   (typically ringing).  This feature uses some of the same primitives
   as the pickup of a parked call.  The call state of the ringing UA is
   advertised using the dialog package.  The UA which is to pick up the
   early dialog subscribes either directly to the ringing UA or to a
   service aggregating the states for UAs in the pickup group.  The
   call state identifies early dialogs.  The UA uses the call state(s)
   to help the user choose which early dialog is to be picked up.  The
   UA then invokes the URL in the call state labeled as replace-remote.

7.1.3  Music on Hold

   Music on hold can be implemented in a number of ways.  One way is to
   transfer the held call to a holding service.  When the UA wishes to
   take the call off hold, it basically performs a take on the call
   from the holding service.
   This involves subscribing to call state on the holding service and
   then invoking the URL in the call state labeled as replace-remote.

   Alternatively, music on hold can be performed as a local mixing
   operation.  The UA holding the call can mix in the music from the
   music service via RTP (i.e. an additional dialog) or RTSP or other
   streaming media source.  This approach is simpler from a protocol
   perspective (i.e. the held dialog does not move, so there is less
   chance of losing it); however, it does use more LAN bandwidth and
   resources on the UA.

7.1.4  Call Monitoring

   Call monitoring is a Join operation.  The monitoring UA sends an
   INVITE with a Join header identifying the dialog it wants to listen
   to.  It is able to discover the dialog via the dialog state on the
   monitored UA.  The monitoring UA sends SDP in the INVITE which
   indicates receive only media.  As the UA is monitoring only, it does
   not matter whether the UA indicates it wishes the send stream to be
   mixed or point-to-point.

7.1.5  Barge-in

   Barge-in works the same as call monitoring, except that the UA
   barging in must indicate that its send media stream is to be mixed
   so that all of the other parties can hear it.

7.1.6  Intercom

   The UA initiates a dialog using INVITE in the ordinary way.  The
   calling UA then signals the paged UA to answer the call.  The
   calling UA may discover the URL to answer the call via the session
   dialog package of the called UA.  The called UA accepts the INVITE
   with a 200 OK and automatically enables the speakerphone.

   Alternatively, this can be a local decision for the UA to answer
   based upon called party identification.

7.1.7  Speakerphone paging

   Speakerphone paging can be implemented using either multicast or
   through a simple multipoint mixer.  In the multicast solution the
   paging UA sends a multicast INVITE with send only media in the SDP
   (see also RFC3264).
   The automatic answer and enabling of the speakerphone is a locally
   configured decision on the paged UAs.  The paging UA sends RTP via
   the multicast address indicated in the SDP.

   The multipoint solution is accomplished by sending an INVITE to the
   multipoint mixer.  The mixer is configured to automatically answer
   the dialog.  The paging UA then sends REFER requests for each of the
   UAs that are to become paging speakers (the UA is likely to send out
   a single REFER which is parallel forked by the proxy server).  The
   UAs performing as paging speakers are configured to automatically
   answer based upon caller identification (e.g. To field, URI or
   Referred-To headers).

   Finally, as a third option, the user agent can send a
   mass-invitation request to a conference server, which would create a
   conference and send invitations to the conference to all user agents
   in the paging group.

7.1.8  Distinctive ring

   The target UA either makes a local decision based on information in
   an incoming INVITE (To, From, Contact, Request-URI) or trusts an
   Alert-Info header provided by the caller or inserted by a trusted
   proxy.  In the latter case, the UA fetches the content described in
   the URI (typically via http) and renders it to the user.

7.1.9  Voice message screening

   At first, this is the same as call monitoring.  In this case the
   voicemail service is one of the UAs.  The UA screening the message
   monitors the call on the voicemail service, and also subscribes to
   call-leg information.  If the user screening their messages decides
   to answer, they perform a Take from the voicemail system (for
   example, send an INVITE with Replaces to the UA leaving the
   message).

7.1.10  Single Line Extension

   Incoming calls ring all the extensions through basic parallel
   forking [bis].  Each extension subscribes to call-leg events from
   each other extension.
   While one user has an active call, any other UA extension can insert
   itself into that conversation (it already knows the call-leg
   information) in the same way as barge-in.

7.1.11  Click-to-dial

   The application or server which hosts the click-to-dial application
   captures the URL to be dialed and can set up the call using 3pcc or
   can send a REFER request to the UA which is to dial the address.  As
   users sometimes change their mind or wish to give up listening to a
   ringing or voicemail answered phone, this application illustrates
   the need to also have the ability to remotely hang up a call.

7.1.12  Pre-paid calling

   For prepaid calling, the user's media always passes through a device
   which is trusted by the pre-paid provider.  This may be the other
   endpoint (for example a PSTN gateway).  In either case, an
   intermediary proxy or B2BUA can periodically verify the amount of
   time available on the pre-paid account, and use the session-timer
   extension to cause the trusted endpoint (gateway) or intermediary
   (media relay) to send a reINVITE before that time runs out.  During
   the reINVITE, the SIP intermediary can reverify the account and
   insert another session-timer header.

   Note that while most pre-paid systems on the PSTN use an IVR to
   collect the account number and destination, this isn't strictly
   necessary for a SIP-originated prepaid call.  SIP requests and SIP
   URIs are sufficiently expressive to convey the final destination,
   the provider of the prepaid service, the location from which the
   user is calling, and the prepaid account they want to use.  If a
   pre-paid IVR is used, the mechanism described below (Voice Portals)
   can be combined as well.

7.1.13  Voice Portal

   A voice portal is essentially a complex collection of voice dialogs
   used to access interesting content.
   One of the most desirable call control features of a Voice Portal is
   the ability to start a new outgoing call from within the context of
   the Portal (to make a restaurant reservation, or return a voicemail
   message for example).  Once the new call is over, the user should be
   able to return to the Portal by pressing a special key, using some
   DTMF sequence (ex: a very long pound or hash tone), or by speaking a
   hotword (ex: "Main Menu").

   In order to accomplish this, the Voice Portal starts with the
   following media relationship:

      { User , Voice Portal }

   The user then asks to make an outgoing call.  The Voice Portal asks
   the User to perform a Far-Fork.  In other words the Voice Portal
   wants the following media relationship:

      { Target , User } & { User , Voice Portal }

   The Voice Portal is now just listening for a hotword or the
   appropriate DTMF.  As soon as the user indicates they are done, the
   Voice Portal Takes the call from the old Target, and we are back to
   the original media relationship.

   This feature can also be used by the account number and phone number
   collection menu in a pre-paid calling service.  A user can press a
   DTMF sequence which presents them with the appropriate menu again.

8.  References

8.1  Normative References

   [1]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
         Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
         Session Initiation Protocol", RFC 3261, June 2002.

   [2]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [3]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
         Session Description Protocol (SDP)", RFC 3264, June 2002.

   [4]   Roach, A., "Session Initiation Protocol (SIP)-Specific Event
         Notification", RFC 3265, June 2002.

   [5]   Handley, M. and V.
         Jacobson, "SDP: Session Description Protocol", RFC 2327,
         April 1998.

   [6]   Johnston, A., "Session Initiation Protocol Service Examples",
         draft-ietf-sipping-service-examples-09 (work in progress),
         July 2005.

   [7]   Rosenberg, J., Peterson, J., Schulzrinne, H., and G. Camarillo,
         "Best Current Practices for Third Party Call Control (3pcc) in
         the Session Initiation Protocol (SIP)", BCP 85, RFC 3725,
         April 2004.

   [8]   Sparks, R., "The Session Initiation Protocol (SIP) Refer
         Method", RFC 3515, April 2003.

   [9]   Mahy, R., Biggs, B., and R. Dean, "The Session Initiation
         Protocol (SIP) "Replaces" Header", RFC 3891, September 2004.

   [10]  Mahy, R. and D. Petrie, "The Session Initiation Protocol (SIP)
         "Join" Header", RFC 3911, October 2004.

   [11]  Rosenberg, J., "An INVITE Initiated Dialog Event Package for
         the Session Initiation Protocol (SIP)",
         draft-ietf-sipping-dialog-package-06 (work in progress),
         April 2005.

   [12]  Rosenberg, J., "A Session Initiation Protocol (SIP) Event
         Package for Conference State",
         draft-ietf-sipping-conference-package-12 (work in progress),
         July 2005.

   [13]  Rosenberg, J., "A Session Initiation Protocol (SIP) Event
         Package for Registrations", RFC 3680, March 2004.

   [14]  Rosenberg, J., "A Presence Event Package for the Session
         Initiation Protocol (SIP)", RFC 3856, August 2004.

   [15]  Rosenberg, J., "A Framework for Conferencing with the Session
         Initiation Protocol",
         draft-ietf-sipping-conferencing-framework-05 (work in
         progress), May 2005.

   [16]  Rosenberg, J., "A Framework for Application Interaction in the
         Session Initiation Protocol (SIP)",
         draft-ietf-sipping-app-interaction-framework-05 (work in
         progress), July 2005.

   [17]  Camarillo, G., "Framework for Transcoding with the Session
         Initiation Protocol (SIP)",
         draft-ietf-sipping-transc-framework-02 (work in progress),
         June 2005.

   [18]  Sparks, R., "Session Initiation Protocol Call Control -
         Transfer", draft-ietf-sipping-cc-transfer-05 (work in
         progress), July 2005.

   [19]  Johnston, A. and O. Levin, "Session Initiation Protocol Call
         Control - Conferencing for User Agents",
         draft-ietf-sipping-cc-conferencing-07 (work in progress),
         June 2005.

   [20]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Indicating
         User Agent Capabilities in the Session Initiation Protocol
         (SIP)", RFC 3840, August 2004.

   [21]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
         Preferences for the Session Initiation Protocol (SIP)",
         RFC 3841, August 2004.

8.2  Informational References

   [22]  Campbell, B. and R. Sparks, "Control of Service Context using
         SIP Request-URI", RFC 3087, April 2001.

   [23]  Mahy, R., "Remote Call Control in SIP using the REFER method
         and the session-oriented dialog package",
         draft-mahy-sip-remote-cc-01 (work in progress), February 2004.

   [24]  Burger, E., "Basic Network Media Services with SIP",
         draft-burger-sipping-netann-11 (work in progress),
         February 2005.

Authors' Addresses

   Rohan Mahy
   SIP Edge LLC

   Email: rohan@ekabal.com


   Ben Campbell
   Estacado Systems

   Email: ben@nostrum.com


   Robert Sparks
   Estacado Systems

   Email: rjsparks@nostrum.com


   Jonathan Rosenberg
   Cisco Systems

   Email: jdrosen@cisco.com


   Dan Petrie
   SIP EZ

   Email: dpetrie@sipez.com


   Alan Johnston
   MCI

   Email: alan.johnston@mci.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.