SIPPING Working Group                                       Mahy/Cisco
Internet Draft                                    Campbell/dynamicsoft
Document: draft-ietf-sipping-cc-framework-01.txt     Johnston/Worldcom
June 2002                                               Petrie/Pingtel
                                                 Rosenberg/dynamicsoft
Expires: December 2002                              Sparks/dynamicsoft

A Multi-party Application Framework for SIP

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts. Internet-Drafts are draft documents valid for a
maximum of six months and may be updated, replaced, or obsoleted by
other documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other than as
"work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

1 Abstract

This document defines a framework and requirements for multi-party
applications in SIP. To enable discussion of multi-party
applications we define an abstract call model for describing the
media relationships required by many of these applications. The
model and actions described here are specifically chosen to be
independent of the SIP signaling and/or mixing approach chosen to
actually set up the media relationships. In addition to its dialog
manipulation aspect, this framework includes requirements for
communicating related information and events such as conference and
session state, and session history. This framework also describes
other goals which embody the spirit of SIP applications as used on
the Internet.

2 Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119].

SIP Multiparty Framework

Table of Contents

1 Abstract
2 Conventions used in this document
3 Motivation and Background
  3.1 Goals
  3.2 Example Features
4 Key Concepts
  4.1 "Conversation Space" Model
    4.1.1 Comparison with Related Definitions
  4.2 Signaling Models
  4.3 Mixing Models
    4.3.1 (Single) End System Mixing
    4.3.2 Centralized Mixing
    4.3.3 Multicast and Multi-unicast conferences
  4.4 Conveying Information and Events
  4.5 Componentization and Decomposition
    4.5.1 Media Intermediaries
    4.5.2 Queue Server
    4.5.3 Parking Place
    4.5.4 Announcements and Voice Dialogs
  4.6 Use of URIs
    4.6.1 Naming Users in SIP
    4.6.2 Naming Services with SIP URIs
  4.7 Invoker Independence
  4.8 Billing issues
5 Catalog of call control actions and sample features
  5.1 Early Dialog Actions
    5.1.1 Remote Answer
    5.1.2 Remote Forward or Put
    5.1.3 Remote Busy or Error Out
  5.2 Single Dialog Actions
    5.2.1 Remote Dial
    5.2.2 Remote On and Off Hold
    5.2.3 Remote Hangup
  5.3 Multi-dialog actions
    5.3.1 Transfer
    5.3.2 Take
    5.3.3 Add
    5.3.4 Local Join
    5.3.5 Insert
    5.3.6 Split
    5.3.7 Near-fork
    5.3.8 Far fork
6 Putting it all together
  6.1 Feature Solutions
    6.1.1 Call Park
    6.1.2 Call Pickup
    6.1.3 Music on Hold
    6.1.4 Call Monitoring
    6.1.5 Barge-in
    6.1.6 Intercom
    6.1.7 Speakerphone paging
    6.1.8 Distinctive ring
    6.1.9 Voice message screening
    6.1.10 Single Line Extension
    6.1.11 Click-to-dial
    6.1.12 Pre-paid calling
    6.1.13 Voice Portal
7 Security Considerations
8 References
9 Acknowledgments
10 Author's Addresses

3 Motivation and Background

The Session Initiation Protocol [SIP] was defined for the
initiation, maintenance, and termination of sessions or calls
between one or more users. However, despite its origins as a
large-scale multiparty conferencing protocol, SIP is used today
primarily for point-to-point calls. This two-party configuration is
the focus of the SIP specification and most of its extensions.

This document defines a framework and requirements for multi-party
applications in SIP. Most multi-party applications manipulate SIP
dialogs (also known as call legs) to cause participants in a
conversation to perceive specific media relationships.
In other protocols that deal with the concept of calls, this
manipulation is known as call control. In addition to its dialog
manipulation aspect, "call control" also includes communicating
information and events related to manipulating calls, including
information and events dealing with session state and history,
conference state, user state, and even message state.

3.1 Goals

Based on input from the SIP community, the authors compiled the
following set of goals for SIP call control and multiparty
applications:

- Define Primitives, Not Services. Allow for a handful of robust
  yet simple mechanisms which can be combined to deliver features
  and services. Throughout this document we refer to these simple
  mechanisms as "primitives". Primitives should be sufficiently
  robust that when they are combined they can be used to build many
  services. However, the goal is not to define a provably complete
  set of primitives. Note that while the IETF will NOT standardize
  behavior or services, it may define example services for
  informational purposes, as in [service examples].

- Participant oriented. The primitives should be designed to
  provide services which are oriented around the experience of the
  participants. The authors observe that end users of features and
  services usually don't care how a media relationship is set up.
  Their ultimate experience is based only on the resulting media
  and other externally visible characteristics.

- Signaling Model independent: Support both a central control and a
  peer-to-peer feature invocation model (and combinations of the
  two). Baseline SIP already supports a centralized control model
  described in [3pcc], and the SIP community has expressed a great
  deal of interest in peer-to-peer or distributed call control.
  Some such primitives are already defined in [REFER] and
  [Replaces].

- Mixing Model independent: The bulk of interesting multiparty
  applications involve mixing or combining media from multiple
  participants. This mixing can be performed by one or more of the
  participants, or by a centralized mixing resource. The experience
  of the participants should not depend on the mixing model used.
  While most examples in this document refer to audio mixing, the
  framework applies to any media type. In this context a "mixer"
  refers to combining media in an appropriate, media-specific way.

- Invoker oriented. Only the user who invokes a feature or a
  service needs to know exactly which service is invoked or why.
  This is good because it allows new services to be created without
  requiring new primitives from all the participants; and it allows
  for much simpler feature authorization policies, for example,
  when participation spans organizational boundaries. As discussed
  in section 4.7, this also avoids exponential state explosion when
  combining features. The invoker only has to manage a user
  interface or API to prevent local feature interactions. All the
  other participants simply need to manage the feature interactions
  of a much smaller number of primitives.

- Primitives make full use of URIs. URIs are a very powerful
  mechanism for describing users and services. They represent a
  plentiful resource which can be extremely expressive and easily
  routed, translated, and manipulated--even across organizational
  boundaries. URIs can contain special parameters and informational
  headers which need only be relevant to the owner of the namespace
  (domain) of the URI.
  Just as a user who selects an http: URL need not understand the
  significance and organization of the web site it references, a
  user may encounter a SIP URL which translates into an email-style
  group alias, which plays a pre-recorded message, or which runs
  some complex call-handling logic.

- Make use of SIP headers and SIP event packages to provide SIP
  entities with information about their environment. These should
  include information about the status/handling of dialogs on other
  user agents, information about the history of other contacts
  attempted prior to the current contact, the status of
  participants, the status of conferences, user presence
  information, and the status of messages.

- Encourage service decomposition, and design to make use of
  standard components using well-defined, simple interfaces. Sample
  components include a SIP mixer, recording service, announcement
  server, and voice dialog server. (This is not an exhaustive
  list.)

- Include authentication, authorization, policy, logging, and
  accounting mechanisms to allow these primitives to be used safely
  among mutually untrusted participants. Some of these mechanisms
  may be used to assist in billing, but no specific billing system
  will be endorsed.

- Permit graceful fallback to baseline SIP. Definitions for new SIP
  call control extensions/primitives MUST describe a graceful way
  to fall back to baseline SIP behavior. Support for one primitive
  MUST NOT imply support for another primitive.

- There is no desire or goal to reinvent traditional models, such
  as the model used by the [H.450] family of protocols, [JTAPI], or
  the [CSTA] call model, as these other models do not share the
  design goals presented in this document.

4 Key Concepts

4.1 "Conversation Space" Model

This document introduces the concept of an abstract "conversation
space" (essentially a set of participants who believe they are all
communicating among one another). Each conversation space contains
one or more participants.

Participants are SIP User Agents which send original media to or
terminate and receive media from other members of the conversation
space. Logically, every participant in the conversation space has
access to all the media generated in that space (this is strictly
true if all participants share a common media type). A SIP User
Agent which does not contribute or consume any media is NOT a
participant; nor is a user agent which merely forwards, transcodes,
mixes, or selects media originating elsewhere in the conversation
space. [Note that a conversation space consists of zero or more SIP
calls or SIP conferences. A conversation space is similar to the
definition of a "call" in some other call models.]

Participants may represent human users or non-human users (referred
to as robots or automatons in this document). Some participants may
be hidden within a conversation space. Some examples of hidden
participants include: robots which generate tones, images, or
announcements during a conference to announce users arriving and
departing, a human call center supervisor monitoring a conversation
between a trainee and a customer, and robots which record media for
training or archival purposes.

Participants may also be active or passive. Active participants are
expected to be intelligent enough to leave a conversation space when
they no longer desire to participate. (An attentive human
participant is obviously active.)
Some robotic participants (such as a voice messaging system, an
instant messaging agent, or a voice dialog system) may be active
participants if they can leave the conversation space when there is
no human interaction. Other robots (for example our tone generating
robot from the previous example) are passive participants. A human
participant "on-hold" is passive.

A conversation space can be diagrammed as a "bubble" or oval, or as
a "set" in curly or square brace notation. Each set, oval, or
"bubble" represents a conversation space. Hidden participants are
shown in lowercase letters.

   { A , B }          [ A , B ]

      .-.               .---.
     /   \             /     \
    /  A  \           / A   b \
   (       )         (         )
    \  B  /           \ C   D /
     \   /             \     /
      '-'               '---'

4.1.1 Comparison with Related Definitions

In SIP, a call is "an informal term that refers to some
communication between peers, generally set up for the purposes of a
multimedia conversation." Obviously we cannot discuss normative
behavior based on such an intentionally vague definition. The
concept of a conversation space is needed because the SIP definition
of call is not sufficiently precise for the purpose of describing
the user experience of multiparty features.

Do any other definitions convey the correct meaning? SIP and [SDP]
both define a conference as "a multimedia session identified by a
common session description." A session is defined as "a set of
multimedia senders and receivers and the data streams flowing from
senders to receivers." Both of these definitions are heavily
oriented toward multicast sessions with little differentiation among
participants. As such, neither is particularly useful for our
purposes. In fact, the definition of "call" in some call models is
more similar to our definition of a conversation space.
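
As an illustration of the conversation space model in Section 4.1,
the following Python sketch represents a space as a collection of
participants and renders it in the curly-brace set notation, with
hidden participants in lowercase. The class and method names
(Participant, ConversationSpace, notation, has_active_participant)
are assumptions made for this example; they are not defined by this
framework or by SIP.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Participant:
    """A user agent that sends or consumes original media (hypothetical type)."""
    name: str
    hidden: bool = False   # e.g. a recording robot or a monitoring supervisor
    active: bool = True    # expected to leave the space on its own initiative


@dataclass
class ConversationSpace:
    """A set of participants who believe they are all communicating."""
    participants: List[Participant] = field(default_factory=list)

    def add(self, participant: Participant) -> None:
        self.participants.append(participant)

    def notation(self) -> str:
        """Render the space as a set, hidden participants in lowercase."""
        labels = [p.name.lower() if p.hidden else p.name.upper()
                  for p in self.participants]
        return "{ " + " , ".join(labels) + " }"

    def has_active_participant(self) -> bool:
        """True if at least one participant is active rather than passive."""
        return any(p.active for p in self.participants)
```

For example, a space holding participant A and a hidden, passive
recording robot b renders as { A , b } under this sketch.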

Some examples of the relationship between conversation spaces, SIP
call legs, and SIP sessions are listed below. In each example, a
human user will perceive that there is a single call.

   A simple two-party call is a single conversation space, a single
   session, and a single call-leg.

   A locally mixed three-way call is two sessions and two call-legs.
   It is also a single conversation space.

   A simple dial-in audio conference is a single conversation space,
   but is represented by as many call-legs and sessions as there are
   human participants.

   A multicast conference is a single conversation space, a single
   session, and as many call-legs as participants.

4.2 Signaling Models

Obviously to make changes to a conversation space, you must be able
to use SIP signaling to cause these changes. Specifically there
must be a way to manipulate SIP dialogs (call legs) to move
participants into and out of conversation spaces. Although this is
not as obvious, there also must be a way to manipulate SIP dialogs
to include non-participant user agents which are otherwise involved
in a conversation space (e.g., B2BUAs, 3pcc controllers, mixers,
transcoders, translators, or relays).

Implementations may set up the media relationships described in the
conversation space model using the approach described in [3pcc]. The
3pcc approach relies on only the following 3 primitive operations:

   Create a new call-leg (INVITE)
   Modify a call-leg (reINVITE)
   Destroy a call-leg (BYE)

The main advantage of the 3pcc approach is that it only requires
very basic SIP support from end systems to support call control
features. As such, third-party call control is a natural way to
handle protocol conversion and mid-call features.
It also has the advantage and disadvantage that new features
can/must be implemented in one place only (the controller), and
neither requires enhanced client functionality, nor takes advantage
of it.

In addition, a peer-to-peer approach is discussed at length in this
draft. The primary drawback of the peer-to-peer model is additional
end system complexity. The benefits of the peer-to-peer model
include:

   - state remains at the edges
   - call signaling need only go through participants involved
     (there are no additional points of failure)
   - peers can take advantage of end-to-end message integrity or
     encryption
   - setup time is shorter (fewer messages and round trips are
     required)

The peer-to-peer approach relies on additional "primitive"
operations, some of which are identified here.

   Replace an existing dialog
   Join a new dialog with an existing dialog [Join]
   Fork a new dialog with an existing dialog
   Locally do media forking (multi-unicast)
   Ask another UA to send a request on your behalf

Many of the features, primitives, and actions described in this
document also require some type of media mixing, combining, or
selection as described in the next section.

4.3 Mixing Models

SIP permits a variety of mixing models, which are discussed here
briefly. This topic is discussed more thoroughly in [conf-models].
For brevity, only the two most popular conferencing models are
significantly discussed in this document (local and centralized
mixing). Applications of the conversation spaces model to multicast
and multi-unicast (full unicast mesh) conferences are left as an
exercise for the reader. Note that a distributed full mesh
conference can be used for basic conferences, but does not easily
allow for more complex conferencing actions like splitting, joining,
and forking.

Call control features should be designed to allow a mixer (local or
centralized) to decide when to reduce a conference back to a 2-party
call, or drop all the participants (for example if only two
automatons are communicating). The actual heuristics used to
release calls are beyond the scope of this document, but may depend
on properties in the conversation space, such as the number of
active, passive, or hidden participants; and the send-only,
receive-only, or send-and-receive orientation of various
participants.

4.3.1 (Single) End System Mixing

The first model we call "end system mixing". In this model, user A
calls user B, and they have a conversation. At some point later, A
decides to conference in user C. To do this, A calls C, using a
completely separate SIP call. This call uses a different Call-ID,
different tags, etc. There is no call set up directly between B and
C. No SIP extension or external signaling is needed. A merely
decides to locally join two call-legs.

   B     C
    \   /
     \ /
      A

A receives media streams from both B and C, and mixes them. A sends
a stream containing A's and C's streams to B, and a stream
containing A's and B's streams to C. Basically, user A handles both
signaling and media mixing.

4.3.2 Centralized Mixing

In a centralized mixing model, all participants have a pairwise SIP
and media relationship with the mixer. Three applications of
centralized mixing are also discussed below.

[diagram]

4.3.2.1 Dial-In Conference Servers

Dial-In conference servers closely mirror dial-in conference bridges
in the traditional PSTN. A dial-in conference server acts as a
normal SIP UA. Users call it, and the server maintains
point-to-point SIP relationships with each user that calls in.
The server takes the media from the users who dial into the same
conference, mixes them, and sends out the appropriate mixed stream
to each participant separately.

As in other applications of centralized mixing, the conference is
identified by the request URI of the calls from each participant.
This provides numerous advantages from a services and routing point
of view. For example, one conference on the server might be known as
sip:conference34@servers.com. All users who call
sip:conference34@servers.com are mixed together. Dial-In conference
servers are usually associated with pre-arranged conferences.
However, the same model applies to ad-hoc conferences. An ad-hoc
conference server creates the conference state when the first user
joins, and destroys it when the last one leaves. The SIP interface
is identical to the pre-arranged case.

4.3.2.2 Ad-hoc Centralized Conferences

In an ad-hoc centralized conference, two users A and B start with a
normal SIP call. At some point later, they decide to add a third
party. Instead of using end system mixing, they would prefer to use
a central SIP mixer. Initially, A calls B. At some point, B decides
to add user C to the call, and begins the transition to a conference
server. The first step in this process is the discovery of a
conference server that supports ad-hoc conferences. This can be done
through static configuration, or through any of a number of standard
service discovery protocols, such as the Service Location Protocol
[SLP]. Once the server is discovered, a conference ID is chosen. The
first participant to send an INVITE to this URL creates the initial
conference state in the server. SIP dialogs are manipulated (using
any combination of 3pcc or peer-to-peer signaling) so that each
participant is sending media to the conference server.
It is also possible to transition from an end-system mixed
conference (even one with a complex connection topology) to a
centralized conference server.

4.3.2.3 Dial-Out Conferences

Dial-out conferences are a simple variation on dial-in conferences.
Instead of the users joining the conference by sending an INVITE to
the server, the server chooses the users who are to be members of
the conference, and then sends them the INVITE. Typically dial-out
conferences are pre-arranged, with specific start times and an
initial group membership list. However, there are other means for
the dial-out server to determine the list of participants, including
user presence [13]. Once the users accept or reject the call from
the dial-out server, the behavior of this system is identical to the
dial-in server case.

4.3.3 Multicast and Multi-unicast conferences

In these models, all endpoints send media to all other endpoints.
Consequently every endpoint mixes the media it receives from all the
other sources, and sends its own media to every other participant.

[diagrams]

4.3.3.1 Large-Scale Multicast Conferences

Large-scale multicast conferences were the original motivation for
both the Session Description Protocol [SDP] and SIP. In a
large-scale multicast conference, one or more multicast addresses
are allocated to the conference. Each participant joins those
multicast groups, and sends their media to those groups. Signaling
is not sent to the multicast groups. The sole purpose of the
signaling is to inform participants of which multicast groups to
join. Large-scale multicast conferences are usually pre-arranged,
with specific start and stop times. However, multicast conferences
do not need to be pre-arranged, so long as a mechanism exists to
dynamically obtain a multicast address.

4.3.3.2 Centralized Signaling, Distributed Media

In this conferencing model, there is a centralized controller, as in
the dial-in and dial-out cases. However, the centralized server
handles signaling only. The media is still sent directly between
participants, using either multicast or multi-unicast. Multi-unicast
is when a user sends multiple packets (one for each recipient,
addressed to that recipient). This is referred to as a
"Decentralized Multipoint Conference" in [H.323].

4.3.3.3 Full Distributed Unicast Conferencing

In this conferencing model, each participant has both a pairwise
media relationship and a pairwise SIP relationship with every other
participant (a full mesh). This model requires a mechanism to
maintain a consistent view of distributed state across the group.
This is a classic hard problem in computer science. Also, this
model does not scale well for large numbers of participants,
because for n participants the number of media and SIP
relationships is approximately n-squared. As a result, this model
is not generally available in commercial implementations; to the
contrary it is primarily the topic of research or experimental
implementations. Note that this model assumes peer-to-peer
signaling.

4.4 Conveying Information and Events

Participants should have access to information about the other
participants in a conversation space, so that this information can
be rendered to a human user or processed by an automaton. Although
some of this information may be available from the Request-URI or
To, From, Contact, or other SIP headers, another mechanism of
reporting this information is necessary.

Many applications are driven by knowledge about the progress of
calls and conferences.
In general these types of events allow for 549 the construction of distributed applications, where the application 550 requires information on dialog and conference state, but is not 551 necessarily co-resident with an endpoint user agent or conference 552 server. For example, a mixer involved in a conversation space may 553 wish to provide URLs for conference status, and/or conference/floor 554 control. 558 The SIP [Events] architecture defines general mechanisms for 559 subscription to and notification of events within SIP networks. It 560 introduces the notion of a package, which is a specific 561 "instantiation" of the events mechanism for a well-defined set of 562 events. 564 New event packages should be able to 565 provide the status of a user's call-legs (dialogs), provide the 566 status of conferences and their participants, provide user presence 567 information, and provide the status of a user's messages. While this 568 is not an exhaustive list, these are sufficient to enable the sample 569 features described in this document. 571 A conference event package allows users to subscribe to information 572 about an entire conference or conversation space. This conference 573 state could be provided by a conference server or mixing component 574 (described in a later section) if centralized mixing is used, or 575 gathered from relevant peers and merged into a cohesive set of 576 state. Notifications would convey information about the 577 participants such as: the SIP URL identifying each user, their 578 status in the space (active, declined, departed), URLs to invoke 579 other features (such as sidebar conversations), links to other 580 relevant information (such as floor control policies), and if floor 581 control policies are in place, the user's floor control status.
A 582 dialog event package would provide information about all the dialogs 583 the target user is maintaining, what conversations the user is 584 participating in, and how these are correlated. Concrete proposals 585 for dialog events and conference events are described in [dialog- 586 pkg] and [conf-pkg] respectively. 588 Note that user presence has a close relationship with these two 589 proposed event packages. It is fundamental to the presence model 590 that the information used to obtain user presence is constructed 591 from any number of different input sources. Examples of such sources 592 include SIP REGISTER requests and uploads of presence documents. 593 These two packages can be considered another mechanism that allows a 594 presence agent to determine the presence state of the user. 595 Specifically, a user presence server can act as a subscriber for the 596 call-leg and conference packages to obtain additional information 597 that can be used to construct a presence document. 599 The multi-party architecture should also provide a mechanism to get 600 information about the status/handling of a dialog (for example, 601 information about the history of other contacts attempted prior to 602 the current contact). Finally, the architecture should provide 603 ample opportunities to present informational URIs which relate to 604 calls, conversations, or dialogs in some way. For example, consider 605 the SIP Call-Info header, or Contact headers returned in a 300-class 606 response. Frequently additional information about a call or dialog 607 can be fetched via non-SIP URIs. For example, consider a web page 608 for package tracking when calling a delivery company, or a web page 609 with related documentation when joining a dial-in conference. The 610 use of URIs in the multiparty framework is discussed in more detail 611 in Section 4.6.
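As a rough illustration of the state a conference event package notification might carry, the Python sketch below models the items listed earlier in this section: the SIP URL of each participant, their status in the space, URLs for other features, and floor control status. The names Participant, ConferenceState, merge and active are hypothetical and are not taken from [conf-pkg]. The example.com addresses are placeholders, and the merge rule (the most recent report for a URL wins) is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    # Status values named in the text of this section.
    ACTIVE = "active"
    DECLINED = "declined"
    DEPARTED = "departed"


@dataclass
class Participant:
    sip_url: str                        # SIP URL identifying the user
    status: Status                      # status in the conversation space
    feature_urls: Dict[str, str] = field(default_factory=dict)  # e.g. sidebar
    has_floor: Optional[bool] = None    # None when no floor policy is in place


@dataclass
class ConferenceState:
    participants: Dict[str, Participant] = field(default_factory=dict)

    def merge(self, report: "ConferenceState") -> None:
        """Merge a peer's view into this one (assumed rule: newest report wins)."""
        for url, participant in report.participants.items():
            self.participants[url] = participant

    def active(self):
        """Return the URLs of participants that are currently active, sorted."""
        return sorted(url for url, p in self.participants.items()
                      if p.status is Status.ACTIVE)


# A mixer's view, updated with a report that Bob has departed.
mixer_view = ConferenceState({
    "sip:alice@example.com": Participant("sip:alice@example.com", Status.ACTIVE),
    "sip:bob@example.com": Participant("sip:bob@example.com", Status.ACTIVE),
})
peer_view = ConferenceState({
    "sip:bob@example.com": Participant("sip:bob@example.com", Status.DEPARTED),
})
mixer_view.merge(peer_view)
print(mixer_view.active())
```

A real package would also have to define how conflicting reports from different peers are ordered, which this sketch does not attempt.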
615 4.5 Componentization and Decomposition 617 This framework proposes a decomposed component architecture with a 618 very loose coupling of services and components. This means that a 619 service (such as a conferencing server or an auto-attendant) need 620 not be implemented as an actual server. Rather, these services can 621 be built by combining a few basic components in straightforward or 622 arbitrarily complex ways. 624 Since the components are easily deployed on separate boxes, by 625 separate vendors, or even with separate providers, we achieve a 626 separation of function that allows each piece to be developed in 627 complete isolation. We can also reuse existing components for new 628 applications. This allows rapid service creation, and the ability 629 for services to be distributed across organizational domains 630 anywhere in the Internet. 632 For many of these components it is also desirable to discover their 633 capabilities, for example querying the ability of a mixer to host a 634 10-dialog conference, or to reserve resources for a specific time. 635 These actions could be provided in the form of URLs, provided there 636 is an a priori means of understanding their semantics. For example, 637 if there is a published dictionary of operations and a way to query the 638 service for the available operations and the associated URLs, the 639 URL can be the interface for providing these service operations. 640 This concept is described in more detail in the context of dialog 641 operations in Section 4.6. 643 4.5.1 Media Intermediaries 645 Media Intermediaries are not participants in any conversation space, 646 although an entity which is also a media translator may also have a 647 colocated participant component (for example a mixer which also 648 announces the arrival of a new participant; the announcement portion 649 is a participant, but the mixer itself is not).
Media 650 intermediaries should be as transparent as possible to the end 651 users, offering a useful, fundamental service without getting in 652 the way of new features implemented by participants. Some common 653 media intermediaries are described below. 655 4.5.1.1 Mixer 657 A SIP mixer is a component that combines media from all dialogs in 658 the same conversation in a media-specific way. For example, the 659 default combining for an audio conference would be an N-1 660 configuration, while the same mixer might interleave text messages 661 on a per-line basis. 663 Conventions for specifying a mixing or conferencing service in a SIP 664 URI are proposed in [ms-uri]. 668 4.5.1.2 Transcoder 670 A transcoder translates media from one encoding or format to another 671 (for example, GSM voice to G.711, MPEG2 to H.261, or text/html to 672 text/plain). 674 4.5.1.3 Media Relay 676 A media relay terminates media and simply forwards it to a new 677 destination without changing the content in any way. Sometimes 678 media relays are used to provide source IP address anonymity, to 679 facilitate middlebox traversal, or to provide a trusted entity where 680 media can be forcefully disconnected. 682 4.5.2 Queue Server 684 A queue server is a location where calls can be entered into one of 685 several FIFO (first-in, first-out) queues. A queue server would 686 subscribe to the presence of groups or individuals who are 687 interested in its queues. When detecting that a user is available 688 to service a queue, the server redirects or transfers the last call 689 in the relevant queue to the available user. On a queue-by-queue 690 basis, authorized users could also subscribe to the call state 691 (dialog information) of calls within a queue. Authorized users 692 could use this information to effectively pluck (take) a call out of 693 the queue (for example by sending an INVITE with a Replaces header 694 to one of the user agents in the queue).
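The queue server behavior described in 4.5.2 can be sketched in a few lines of Python. This is a toy model only: QueueServer and its methods are hypothetical names, call identifiers are plain strings, and no SIP signaling or presence subscription is performed. The sketch also assumes that, in a FIFO queue, the call handed to an available user is the one that has waited longest, which is one possible reading of "the last call in the relevant queue".

```python
from collections import deque


class QueueServer:
    """Toy model: named FIFO queues of call identifiers."""

    def __init__(self):
        self.queues = {}

    def enqueue(self, queue_name, call_id):
        """Enter a call at the tail of the named queue."""
        self.queues.setdefault(queue_name, deque()).append(call_id)

    def agent_available(self, queue_name, agent_uri):
        """Hand the longest-waiting call to the agent; None if the queue is empty."""
        q = self.queues.get(queue_name)
        if not q:
            return None
        call_id = q.popleft()
        # The returned pair stands in for a redirect or transfer of the call.
        return (call_id, agent_uri)

    def pluck(self, queue_name, call_id):
        """Remove one specific call, as an authorized user might do with Replaces."""
        q = self.queues.get(queue_name)
        if q and call_id in q:
            q.remove(call_id)
            return True
        return False


server = QueueServer()
for call in ("call-1", "call-2", "call-3"):
    server.enqueue("support", call)
server.pluck("support", "call-2")
print(server.agent_available("support", "sip:agent@example.com"))
```

Keeping pluck separate from agent_available mirrors the two paths in the text: the server distributing calls in order, and an authorized user taking a specific call out of order.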
696 4.5.3 Parking Place 698 A parking place is a location where calls can be terminated 699 temporarily and then retrieved later. While a call is "parked", it 700 can receive media "on-hold" such as music, announcements, or 701 advertisements. Such a service could be further decomposed such 702 that announcements or music are handled by a separate component. 704 4.5.4 Announcements and Voice Dialogs 706 An announcement server is a server which can play digitized media 707 (frequently audio), such as music or recorded speech. These servers 708 are typically accessible via SIP, HTTP, or RTSP. An analogous 709 service is a recording service which stores digitized media. A 710 convention for specifying announcements in SIP URIs is described in 711 [ms-uri]. Likewise the same server could easily provide a service 712 which records digitized media. 714 A "voice dialog" is a model of spoken interactive behavior between a 715 human and an automaton which can include synthesized speech, 716 digitized audio, recognition of spoken and DTMF key input, recording 717 of spoken input, and interaction with call control. Dialogs 718 frequently consist of forms or menus. Forms present information and 719 gather input; menus offer choices of what to do next. 723 Spoken dialogs are a basic building block of applications which use 724 voice. Consider for example that a voice mail system, the 725 conference-id and passcode collection system for a conferencing 726 system, and complicated voice portal applications all require a 727 voice dialog component. 729 4.5.4.1. Text-to-Speech and Automatic Speech Recognition 731 Text-to-Speech (TTS) is a service which converts text into digitized 732 audio. TTS is frequently integrated into other applications, but 733 when separated as a component, it provides greater opportunity for 734 broad reuse.
Various interfaces to access standalone TTS services 735 via HTTP, [CATS], and SIP ([app-components], and [ms-uri]) have been 736 proposed. 738 Automatic Speech Recognition (ASR) is a service which attempts to 739 decipher digitized speech based on a proposed grammar. Like TTS, 740 ASR services can be embedded, or exposed so that many applications 741 can take advantage of such services. Various IP interfaces to ASR, 742 such as CATS, have been proposed. 744 4.5.4.2. VoiceXML 746 [VoiceXML] is a W3C recommendation that was designed to give authors 747 control over the spoken dialog between users and applications. The 748 application and user take turns speaking: the application prompts 749 the user, and the user in turn responds. Its major goal is to bring 750 the advantages of web-based development and content delivery to 751 interactive voice response applications. We believe that VoiceXML 752 represents the ideal partner for SIP in the development of 753 distributed IVR servers. VoiceXML is an XML based scripting language 754 for describing IVR services at an abstract level. VoiceXML supports 755 DTMF recognition, speech recognition, text-to-speech, and playing 756 out of recorded media files. The results of the data collected from 757 the user are passed to a controlling entity through an HTTP POST 758 operation. The controller can then return another script, or 759 terminate the interaction with the IVR server. 761 A VoiceXML server also need not be implemented as a monolithic 762 server. Below is a diagram of a VoiceXML browser which is split 763 into media and non-media handling parts. The VoiceXML interpreter 764 handles SIP dialog state and state within a VoiceXML document, and 765 sends requests to the media component over another protocol (for 766 example [RTSP] or CATS). 
768                +-------------+
769                |             |
770                |  VoiceXML   |
771                | Interpreter |
772                | (signaling) |
773                +-------------+
776                  ^         ^
777                  |         |
778              SIP |         | RTSP
779                  |         |
780                  |         |
781                  v         v
782    +-------------+        +-------------+
783    |             |        |             |
784    |   SIP UA    |  RTP   | RTSP Server |
785    |             |<------>|   (media)   |
786    |             |        |             |
787    +-------------+        +-------------+

789          Figure : Decomposed VoiceXML Server

791 More details about the integration of SIP with VoiceXML are provided 792 in [sip-vxml]. 794 4.6 Use of URIs 796 All naming in SIP uses URIs. URIs in SIP are used in a plethora of 797 contexts: the Request-URI; Contact, To, From, and *-Info headers; 798 application/uri bodies; and embedded in email, web pages, instant 799 messages, and ENUM records. The request-URI identifies the user or 800 service that the call is destined for. 802 SIP URIs embedded in informational SIP headers, SIP bodies, and non- 803 SIP content can also specify methods, special parameters, headers, 804 and even bodies. For example: 806 sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098 807 &To=;tag=879738 808 &From=;tag=023214 810 sip:bob@babylon.biloxi.com;method=REFER? 811 Refer-To= 813 Throughout this draft we discuss call control primitive operations. 814 One of the biggest problems is defining how these operations may be 815 invoked. There are a number of ways to do this. One way is to 816 define the primitives in the protocol itself such that SIP methods 817 (for example REFER) or SIP headers (for example Replaces) indicate a 818 specific call control action. Another way to invoke call control 819 primitives is to define a specific Request-URI naming convention. 820 Either these conventions must be shared between the client (the 821 invoker) and the server, or published by or on behalf of the server. 822 The former involves defining URL construction techniques (e.g. URL 823 parameters and/or token conventions) as proposed in [ms-uri].
The 824 latter technique usually involves discovering the URI via a SIP 825 event package, a web page, a business card, or an Instant Message. 826 Yet another means to acquire the URLs is to define a dictionary of 827 primitives with well-defined semantics and provide a means to query 830 the named primitives and corresponding URLs that may be invoked on 831 the service or dialogs. 833 4.6.1 Naming Users in SIP 835 An address-of-record, or public SIP address, is a SIP (or SIPS) URI 836 that points to a domain with a location server that can map the URI 837 to a set of Contact URIs where the user might be available. Typically 838 the Contact URIs are populated via registration.

840    Address of Record         Contacts
842    sip:bob@biloxi.com   ->   sip:bob@babylon.biloxi.com:5060
843                              sip:bbrown@mailbox.provider.net
844                              sip:+1.408.555.6789@mobile.net

846 [Caller-prefs] defines a set of additional parameters to the Contact 847 header that define the characteristics of the user agent at the 848 specified URI. For example, there is a mobility parameter which 849 indicates whether the UA is fixed or mobile. When a user agent 850 registers, it places these parameters in the Contact headers to 851 characterize the URIs it is registering. This allows a proxy for 852 that domain to have information about the contact addresses for that 853 user. 855 When a caller sends a request, it can optionally include the Accept- 856 Contact and Reject-Contact headers which request certain handling by 857 the proxy in the target domain. These headers contain preferences 858 that describe the set of desired URIs to which the caller would like 859 their request routed. The proxy in the target domain matches these 860 preferences with the Contact characteristics originally registered 861 by the target user. The target user can also choose to run 862 arbitrarily complex "Find-me" feature logic on a proxy in the target 863 domain.
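The mapping from an address-of-record to Contacts, and the matching of caller preferences against registered Contact characteristics, can be sketched as follows. This is a deliberately simplified illustration and not the matching rules of [Caller-prefs]: LOCATION_DB and select_contacts are hypothetical names, only exact equality on a parameter value is tested, and the mobility values assigned to each Contact are assumptions (the text gives the Contact URIs but not their characteristics).

```python
# Contacts registered for one address-of-record. The "mobility" values are
# assumed for illustration; a user agent would supply them at registration.
LOCATION_DB = {
    "sip:bob@biloxi.com": [
        {"uri": "sip:bob@babylon.biloxi.com:5060", "mobility": "fixed"},
        {"uri": "sip:bbrown@mailbox.provider.net", "mobility": "fixed"},
        {"uri": "sip:+1.408.555.6789@mobile.net", "mobility": "mobile"},
    ],
}


def select_contacts(aor, accept=None, reject=None):
    """Return Contact URIs for aor, honoring simplified caller preferences.

    accept and reject map a parameter name to a required value. A contact is
    dropped if it matches every reject parameter, and kept only if it matches
    every accept parameter. With no preferences, all contacts are returned.
    """
    accept = accept or {}
    reject = reject or {}
    selected = []
    for contact in LOCATION_DB.get(aor, []):
        if reject and all(contact.get(k) == v for k, v in reject.items()):
            continue
        if all(contact.get(k) == v for k, v in accept.items()):
            selected.append(contact["uri"])
    return selected


# A caller who wants to avoid mobile devices (Reject-Contact style).
print(select_contacts("sip:bob@biloxi.com", reject={"mobility": "mobile"}))
# A caller who wants only mobile devices (Accept-Contact style).
print(select_contacts("sip:bob@biloxi.com", accept={"mobility": "mobile"}))
```

The sketch applies only the caller's side of the decision. Ordering among the remaining Contacts would be left to callee logic running on the proxy.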
865 There is a strong asymmetry in how preferences for callers and 866 callees can be presented to the network. While a caller takes an 867 active role by initiating the request, the callee takes a passive 868 role in waiting for requests. This motivates the use of callee- 869 supplied scripts and caller preferences included in the call 870 request. This asymmetry is also reflected in the appropriate 871 relationship between caller and callee preferences. A server for a 872 callee should respect the wishes of the caller to avoid certain 873 locations, while the preference among locations has to be the 874 callee's choice, as it determines where, for example, the phone 875 rings and whether the callee incurs mobile telephone charges for 876 incoming calls. 878 SIP User Agent implementations are encouraged to make intelligent 879 decisions based on the type of participants (active/passive, hidden, 880 human/robot) in a conversation space. This information is conveyed 881 in a SIP URI parameter and communicated using an appropriate SIP 882 header or event body. For example, a music on hold service may take 883 the sensible approach that if there are two or more unhidden 886 participants, it should not provide hold music; or that it will not 887 send hold music to robots. 889 Multiple participants in the same conversation space may represent 890 the same human user. For example, the user may use one participant 891 for video, chat, and whiteboard media on a PC and another for audio 892 media on a SIP phone. In this case, the address-of-record is the 893 same for both user agents, but the Contacts are different. In 894 addition, human users may add robot participants which act on their 895 behalf (for example a call recording service, or a calendar 896 reminder). Call Control features in SIP should continue to function 897 as expected in such an environment. 899 4.6.2 Naming Services with SIP URIs.
901 A critical piece of defining a session level service that can be 902 accessed by SIP is defining the naming of the resources within that 903 service. This point cannot be overstated. 905 In the context of SIP control of application components, we take 906 advantage of the fact that the standard SIP URI has a user part. 907 Most services may be thought of as user automatons that participate 908 in SIP sessions. It naturally follows that the user address, or the 909 left-hand-side of the URI, should be utilized as a service 910 indicator. 912 For example, media servers commonly offer multiple services at a 913 single host address. Use of the user part as a service indicator 914 enables service consumers to direct their requests without 915 ambiguity. It has the added benefit of enabling media services to 916 register their availability with SIP Registrars just as any "real" 917 SIP user would. This maintains consistency and provides enhanced 918 flexibility in the deployment of media services in the network. 920 There has been much discussion about the potential for confusion if 921 media service URIs are not readily distinguishable from other types 922 of SIP UAs. The use of a service namespace provides a mechanism to 923 unambiguously identify standard interfaces while not constraining 924 the development of private or experimental services. 926 In SIP, the request-URI identifies the user or service that the call 927 is destined for. The great advantage of using URIs (specifically, 928 the SIP request URI) as a service identifier comes because of the 929 combination of two facts. First, unlike in the PSTN, where the 930 namespace (dialable telephone numbers) is limited, URIs come from 931 an infinite space. They are plentiful, and they are free. Secondly, 932 the primary function of SIP is call routing through manipulations of 933 the request URI. In the traditional SIP application, this URI 934 represents people.
However, the URI can also represent services, as 935 we propose here. This means we can apply the routing services SIP 936 provides to routing of calls to services. The result: the problem 937 of service invocation and service location becomes a routing 938 problem, for which SIP provides a scalable and flexible solution. 942 Since there is such a vast namespace of services, we can explicitly 943 name each service in a finely granular way. This allows the 944 distribution of services across the network. 946 Consider a conferencing service. If we have separated the names 947 of ad-hoc conferences from scheduled conferences, we can program 948 proxies to route calls for ad-hoc conferences to one set of servers, 949 and calls for scheduled ones to another, possibly even in a 950 different provider. In fact, since each conference itself is given a 951 URI, we can distribute conferences across servers, and easily 952 guarantee that calls for the same conference always get routed to 953 the same server. This is in stark contrast to conferences in the 954 telephone network, where the equivalent of the URI - the phone 955 number - is scarce. An entire conferencing provider generally has 956 one or two numbers. Conference IDs must be obtained through IVR 957 interactions with the caller, or through a human attendant. This 958 makes it difficult to distribute conferences across servers all over 959 the network, since the PSTN routing only knows about the dialed 960 number. 962 In the case of a dialog server, the voice dialog itself is the 963 target for the call. As such, the request URI should contain the 964 identifier for this spoken dialog. This is consistent with the 965 Request-URI service invocation model of RFC 3087. This URL can be in 966 one of two formats. In the first, the VoiceXML script is identified 967 directly by an HTTP URL. In the second, the script is not specified.
968 Rather, the dialog server uses its configuration to map the incoming 969 request to a specific script. 971 Since the request URI could indicate a request for a variety of 972 different services, of which a dialog server is only one type, this 973 example request URI begins with a service identifier that 974 indicates the basic service required. For VoiceXML scripts, this 975 identification information is a URL-encoded version of the URL which 976 references the script to execute, or if not present, the dialog 977 server uses server-specific configuration to determine which script 978 to execute. 980 Examples of URLs that invoke VoiceXML dialogs are: 981 (line folding for clarity only) 983 sip:dialog.vxml.http%3a//dialogs.server.com/script32.vxml 984 @vxmlservers.com 986 sip:dialog.vxml@vxmlservers.com 988 The first of these indicates that the dialog server (located at 989 vxmlservers.com) should invoke a VoiceXML script fetched from 990 http://dialogs.server.com/script32.vxml. Since the user part of the 991 SIP URL cannot contain the : character, this must be escaped to %3a. 993 These types of conventions are not limited to application component 994 servers. An ordinary SIP User Agent can have special URIs as 995 well, for example, one which is automatically answered by a 998 speakerphone. Since URIs are so plentiful, using a separate URI for 999 this service does not exhaust a valuable resource. The requested 1000 service is clear to the user agent receiving the request. This URI 1001 can also be included as part of another feature (for example, the 1002 Intercom feature described in Section 6.1.6). This feature can be 1003 specified with a SIP user parameter, since such parameters are part of the userpart 1004 of a SIP URI. 1006 Likewise a Request URI can fully describe an announcement service 1007 through the use of the user part of the address and additional URI 1008 parameters.
In our example, the user portion of the address, 1009 "annc", specifies the announcement service on the media server. 1010 The two URI parameters "play=" and "early=" specify the audio 1011 resource to play and whether early media is desired. 1013 sip:annc@ms2.carrier.net; 1014 play=http://audio.carrier.net/allcircuitsbusy.au;early=yes 1016 sip:annc@ms2.carrier.net; 1017 play=file://fileserver.carrier.net/geminii/yourHoroscope.wav 1019 In practical applications, it is important that an invoker does not 1020 necessarily apply semantic rules to various URIs it did not create. 1021 Instead, it should allow any arbitrary string to be provisioned, and 1022 map the string to the desired behavior. The administrator of a 1023 service may choose to provision specific conventions or mnemonic 1024 strings, but the application should not require it. In any large 1025 installation, the system owner is likely to have pre-existing rules 1026 for mnemonic URIs, and any attempt by an application to define its 1027 own rules may create a conflict. Implementations should allow an 1028 arbitrary mix of URLs from these schemes, or any other scheme that 1029 renders valid SIP URIs to be provisioned, rather than enforce only 1030 one particular scheme. 
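The recommendation above (accept any provisioned string and map it to a behavior, rather than deriving meaning from the shape of the URI) amounts to a plain table lookup. The Python sketch below illustrates this under stated assumptions: PROVISIONED and behavior_for are hypothetical names, the behavior labels are invented for the example, and the URIs are borrowed from the voicemail example in this section.

```python
# Provisioned by the administrator. The application treats every key as an
# opaque string and never parses it for meaning, so URIs that follow
# different naming conventions can coexist in one table.
PROVISIONED = {
    "sip:sub-rjs-deposit@vm.wcom.com": "deposit-standard-greeting",
    "sip:677283@vm.wcom.com": "deposit-standard-greeting",
    "sip:rjs@vm.wcom.com;mode=deposit": "deposit-standard-greeting",
    "sip:677405@vm.wcom.com": "retrieve-sip-authentication",
}


def behavior_for(request_uri):
    """Look up the behavior by exact string match; None if not provisioned."""
    return PROVISIONED.get(request_uri)


print(behavior_for("sip:677283@vm.wcom.com"))
print(behavior_for("sip:rjs@vm.wcom.com;mode=deposit"))
print(behavior_for("sip:unknown@vm.wcom.com"))
```

Because the lookup is by exact match, a site can change its naming convention by re-provisioning the table, with no change to the application. A production lookup would first need to apply SIP URI comparison rules, which this sketch omits.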
1032 For example, a voicemail application can be built using very 1033 different sets of URI conventions, as illustrated below:

1035    URI Identity          Example Scheme 1
1036                          Example Scheme 2
1037                          Example Scheme 3

1039    Deposit with          sip:sub-rjs-deposit@vm.wcom.com
1040    standard greeting     sip:677283@vm.wcom.com
1041                          sip:rjs@vm.wcom.com;mode=deposit

1043    Deposit with on       sip:sub-rjs-deposit-busy.vm.wcom.com
1044    phone greeting        sip:677372@vm.wcom.com
1045                          sip:rjs@vm.wcom.com;mode=3991243

1047    Deposit with          sip:sub-rjs-deposit-sg@vm.wcom.com
1048    special greeting      sip:677384@vm.wcom.com
1049                          sip:rjs@vm.wcom.com;mode=sg

1053    Retrieve - SIP        sip:sub-rjs-retrieve@vm.wcom.com
1054    authentication        sip:677405@vm.wcom.com
1055                          sip:rjs@vm.wcom.com;mode=retrieve

1057    Retrieve - prompt     sip:sub-rjs-retrieve-inpin.vm.wcom.com
1058    for PIN in-band       sip:677415@vm.wcom.com
1059                          sip:rjs@vm.wcom.com;mode=inpin

1061 As we have shown, SIP URIs represent an ideal, flexible mechanism 1062 for describing and naming service resources, be they queues, 1063 conferences, voice dialogs, announcements, voicemail treatments, or 1064 phone features. 1066 4.7 Invoker Independence 1068 Only the invoker of features in SIP needs to know exactly which 1069 feature they are invoking. One of the primary benefits of this 1070 approach is that combinations of features should work in SIP call 1071 control. For example, let us examine the combination of a 1072 "transfer" of a call which is "conferenced". 1074 Alice calls Bob. Alice silently "conferences in" her robotic 1075 assistant Albert as a hidden party. Bob transfers Alice to Carol. 1076 If Bob asks Alice to Replace her leg with a new one to Carol, then 1077 both Alice and Albert should be communicating with Carol 1078 (transparently).
1080 Using the peer-to-peer model, this combination of features works 1081 fine if A is doing local mixing (Alice replaces Bob's call-leg with 1082 Carol's), or if A is using a central mixer (the mixer replaces Bob's 1083 call leg with Carol's). A clever implementation using the 3pcc 1084 model can generate similar results. 1086 New extensions to the SIP Call Control Framework should attempt to 1087 preserve this property. 1089 4.8 Billing issues 1091 Billing in the PSTN is typically based on who initiated a call. At 1092 the moment billing in a SIP network is neither consistent with 1093 itself, nor with the PSTN. (A billing model for SIP should allow 1094 for both PSTN-style billing, and non-PSTN billing.) The example 1095 below demonstrates one such inconsistency. 1097 Alice places a call to Bob. Alice then blind transfers Bob to Carol 1098 through a PSTN gateway. In current usage of REFER and BYE/Also, Bob 1099 may be billed for a call he did not initiate (although his UA originated the 1100 outgoing call leg). This is not necessarily a terrible 1101 thing, but it demonstrates a security concern (Bob must have 1102 appropriate local policy to prevent fraud). Also, Alice may wish to 1103 pay for Bob's session with Carol. There should be a way to signal 1104 this in SIP. 1108 Likewise a Replacement call may maintain the same billing 1109 relationship as a Replaced call, so if Alice first calls Carol, then 1110 asks Bob to Replace this call, Alice may continue to receive a bill. 1112 Further work in SIP billing should define a way to set or discover 1113 the direction of billing. 1115 5 Catalog of call control actions and sample features 1117 Call control actions can be categorized by the dialogs upon which 1118 they operate. The actions may involve a single or multiple dialogs. 1119 These dialogs can be early or established.
Multiple dialogs may be 1120 related in a conversation space to form a conference or other 1121 interesting media topologies. 1123 It should be noted that it is desirable to provide a means by which 1124 a party can discover the actions which may be performed on a dialog. 1125 The interested party may be independent or related to the dialogs. 1126 One means of accomplishing this is through the ability to define and 1127 obtain URLs for these actions as described in section 4.6. 1129 Below are listed several call control "actions" which establish or 1130 modify dialogs and relate the participants in a conversation space. 1131 The names of the actions listed are for descriptive purposes only 1132 (they are not normative). This list of actions is not meant to be 1133 exhaustive. 1135 In the examples, all actions are initiated by the user "Alice" 1136 represented by UA "A". 1138 5.1 Early Dialog Actions 1140 The following are a set of actions that may be performed on a single 1141 early dialog. These actions can be thought of as a set of remote 1142 control operations. For example, an automaton might perform the 1143 operation on behalf of a user. Alternatively a user might use the 1144 remote control in the form of an application to perform the action 1145 on the early dialog of a UA which may be out of reach. All of these 1146 actions correspond to telling the UA how to respond to a request to 1147 establish an early dialog. These actions provide useful 1148 functionality for PDA, PC and server based applications which desire 1149 the ability to control a UA. 1151 5.1.1 Remote Answer 1153 A dialog is in some early dialog state such as 180 Ringing. It may 1154 be desirable to tell the UA to answer the dialog. That is, tell it 1155 to send a 200 OK response to establish the dialog. 1157 5.1.2 Remote Forward or Put 1160 It may be desirable to tell the UA to respond with a 3xx class 1161 response to forward an early dialog to another UA.
1163 5.1.3 Remote Busy or Error Out 1165 It may be desirable to instruct the UA to send an error response 1166 such as 486 Busy Here. 1168 5.2 Single Dialog Actions 1170 There is another useful set of actions which operate on a single 1171 established dialog. These operations are useful in building 1172 productivity applications for aiding users to control their phone. 1173 For example, a CRM application can set up calls for a user, 1174 eliminating the need for the user to actually enter an address. 1175 These operations can also be thought of as remote control actions. 1177 5.2.1 Remote Dial 1179 This action instructs the UA to initiate a dialog. This action can 1180 be performed using the REFER method. 1182 5.2.2 Remote On and Off Hold 1184 This action instructs the UA to put an established dialog on hold. 1185 Though this operation can conceptually be performed with the 1186 REFER method, there are no semantics defined as to what the referred 1187 party should do with the SDP. There is no way to distinguish between 1188 the desire to go on or off hold. 1190 5.2.3 Remote Hangup 1192 This action instructs the UA to terminate an early or established 1193 dialog. A REFER request with the following Refer-To URI performs 1194 this action. Note: this URL is not properly escaped. 1196 sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098 1197 &To=;tag=879738 1198 &From=;tag=023214 1200 5.3 Multi-dialog actions 1202 These actions apply to a set of related dialogs. 1204 5.3.1 Transfer 1206 The conversation space changes as follows: 1208 before after 1209 { A , B } --> { C , B } 1211 A replaces itself with C. 1215 To make this happen using the peer-to-peer approach, "A" would send 1216 two SIP requests.
A shorthand for those requests is shown below: 1217 REFER B Refer-To:C 1218 BYE B 1220 To make this happen instead using the 3pcc approach, the controller 1221 sends requests represented by the shorthand below: 1222 INVITE C (w/SDP of B) 1223 reINVITE B (w/SDP of C) 1224 BYE A 1226 Features enabled by this action: 1227 - blind transfer 1228 - transfer to a central mixer (some type of conference or forking) 1229 - transfer to park server (park) 1230 - transfer to music on hold or announcement server 1231 - transfer to a "queue" 1232 - transfer to a service (such as Voice Dialogs service) 1233 - transition from local mixer to central mixer 1235 5.3.2 Take 1237 The conversation space changes as follows: 1239 { B , C } --> { B , A } 1241 A forcibly replaces C with itself. In most uses of this primitive, 1242 A is just "un-replacing" itself. 1244 Using the peer-to-peer approach, "A" sends: 1245 INVITE B Replaces: 1247 Using the 3pcc approach (all requests sent from controller) 1248 INVITE A (w/SDP of B) 1249 reINVITE B (w/SDP of A) 1250 BYE C 1252 Features enabled by this action: 1253 - transferee completes an attended transfer 1254 - retrieve from central mixer (not recommended) 1255 - retrieve from music on hold or park 1256 - retrieve from queue 1257 - call center take 1258 - voice portal resuming ownership of a call it originated 1259 - answering-machine style screening (pickup) 1260 - pickup of a ringing call (i.e. early dialog) 1262 Note that pickup of a ringing call has some interesting 1263 additional requirements. First of all, it is an early dialog as 1264 opposed to an established dialog. Secondly, the party which is to 1265 pick up the call may wish to do so only while it is an early 1266 dialog. That is, in the race condition where the ringing UA accepts 1267 just before it receives signaling from the party wishing to take the 1270 call, the taking party wishes to yield or cancel the take.
The goal
1271 is to avoid yanking an answered call from the called party.

1273 5.3.3 Add

1275 The conversation space changes as follows:

1277     { A , B } --> { A, B, C }

1279 A adds C to the conversation.

1281 Using the peer-to-peer approach, adding a party using local mixing
1282 requires no signaling. To transition from a 2-party call or a
1283 locally mixed conference to central mixing, A could send the
1284 following requests:
1285     REFER B Refer-To: mixer
1286     INVITE mixer
1287     BYE B

1289 To add a party to a central mixer:
1290     REFER C Refer-To: mixer
1291 or
1292     REFER mixer Refer-To: C

1294 Using the 3pcc approach to transition to central mixing, the
1295 controller would send:
1296     INVITE mixer leg 1 (w/SDP of A)
1297     INVITE mixer leg 2 (w/SDP of B)
1298     INVITE C (late SDP)
1299     reINVITE A (w/SDP of mixer leg 1)
1300     reINVITE B (w/SDP of mixer leg 2)
1301     INVITE mixer leg 3 (w/SDP of C)

1303 To add a party to a central mixer:
1304     INVITE C (late SDP)
1305     INVITE mixer (w/SDP of C)

1307 Features enabled:
1308 - standard conference feature
1309 - call recording
1310 - answering-machine style screening (screening)

1312 5.3.4 Local Join

1314 The conversation space changes like this:

1316     { A, B} , {A, C} --> {A, B, C}

1318 or like this:

1320     { A, B} , {C, D} --> {A, B, C, D}

1322 A takes two conversation spaces and joins them together into a
1323 single space.

1327 Using the peer-to-peer approach, A can mix locally, or REFER the
1328 participants of both conversation spaces to the same central mixer
1329 (as in 5.3.3).

1331 For the 3pcc approach, the call flows for inserting participants,
1332 and joining and splitting conversation spaces are tedious yet
1333 straightforward, so these are left as an exercise for the reader.
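
The conversation-space changes given for Transfer (5.3.1), Take (5.3.2),
Add (5.3.3) and Local Join (5.3.4) can be read as simple set operations.
The following is a minimal sketch of that reading, not part of the
framework itself: the function names and the choice of Python frozensets
are assumptions made purely for illustration, and no SIP signaling is
modeled.

```python
# Illustrative model only: a conversation space is a frozenset of party
# names. The function names are hypothetical and simply mirror the
# primitives described in 5.3.1 through 5.3.4.

def transfer(space, a, c):
    """{ A , B } --> { C , B }: A replaces itself with C."""
    if a not in space:
        raise ValueError("the transferring party must be in the space")
    return (space - {a}) | {c}

def take(space, a, c):
    """{ B , C } --> { B , A }: A forcibly replaces C with itself."""
    if c not in space:
        raise ValueError("the replaced party must be in the space")
    return (space - {c}) | {a}

def add(space, a, c):
    """{ A , B } --> { A, B, C }: A adds C to the conversation."""
    if a not in space:
        raise ValueError("the adding party must be in the space")
    return space | {c}

def local_join(first, second):
    """{ A, B } , { A, C } --> { A, B, C }: two spaces become one."""
    return first | second

if __name__ == "__main__":
    call = frozenset({"A", "B"})
    print(sorted(transfer(call, "A", "C")))                  # after Transfer
    print(sorted(add(call, "A", "C")))                       # after Add
    print(sorted(local_join(call, frozenset({"C", "D"}))))   # after Local Join
```

Under this reading, a Take applied to the result of a Transfer with the
same two parties restores the original space, which matches the remark
in 5.3.2 that A is usually just "un-replacing" itself.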
1335 Features enabled:
1336 - standard conference feature
1337 - leaving a sidebar to rejoin a larger conference

1339 5.3.5 Insert

1341 The conversation space changes like this:

1343     { B , C } --> {A, B, C }

1345 A inserts itself into a conversation space.

1347 A proposed mechanism for signaling this using the peer-to-peer
1348 approach is to send a new header in an INVITE with "joining"
1349 semantics. For example:
1350     INVITE B Join:

1352 If B accepted the INVITE, B would accept responsibility to set up the
1353 call legs and mixing necessary (for example: to mix locally or to
1354 transfer the participants to a central mixer).

1356 Features enabled:
1357 - barge-in
1358 - call center monitoring
1359 - call recording

1361 5.3.6 Split

1362     { A, B, C, D } --> { A, B } , { C, D }

1364 If using a central mixer with peer-to-peer:
1365     REFER C Refer-To: mixer (new URI)
1366     REFER D Refer-To: mixer (new URI)
1367     BYE C
1368     BYE D

1370 Features enabled:
1371 - sidebar conversations during a larger conference

1373 5.3.7 Near-fork

1375 A participates in two conversation spaces simultaneously:

1377     { A, B } --> { B , A } & { A , C }

1380 A is a participant in two conversation spaces such that A sends the
1381 same media to both spaces, and renders media from both spaces,
1382 presumably by mixing or rendering the media from both. We can
1383 define that A is the "anchor" point for both forks, each of which is
1384 a separate conversation space.

1386 This action is a purely local implementation matter (it requires no
1387 special signaling). Local features such as switching calls between the
1388 background and foreground are possible using this media
1389 relationship.

1391 5.3.8 Far fork

1393 The conversation space changes as follows:

1395     { A, B } --> { A , B } & { B , C }

1397 A requests B to be the "anchor" of two conversation spaces.

1399 For an example of using 3pcc to set up media forking, see [Media
1400 forking].
The session descriptions for forking are quite complex.
1401 Controllers should verify that endpoints can handle forked media, by
1402 using some type of Require header token.

1404 Two ways to set up this media relationship using peer-to-peer call
1405 control have been proposed:
1406 - the anchor receives a REFER that requires forked-media (implicit)
1407 - the anchor receives an INVITE with an explicit header (explicit)

1409 Features enabled:
1410 - barge-in
1411 - voice portal services
1412 - whisper
1413 - hotword detection
1414 - sending DTMF somewhere else

1416 6 Security Considerations

1418 Call Control primitives provide a powerful set of features that can
1419 be dangerous in the hands of an attacker. To complicate matters,
1420 call control primitives are likely to be automatically authorized
1421 without direct human oversight.

1423 The class of attacks which are possible using these tools includes
1424 the ability to eavesdrop on calls, disconnect calls, redirect calls,
1425 render irritating content (including ringing) at a user agent, cause
1426 an action that has billing consequences, subvert billing
1427 (theft-of-service), and obtain private information. Call control
1428 extensions must take extra care to describe how these attacks will be
1429 prevented.

1433 We can also make some general observations about authorization and
1434 trust with respect to call control. The security model is
1435 dramatically dependent on the signaling model chosen (see section
1436 4.2).

1438 Let us first examine the security model used in the 3pcc approach.
1439 All signaling goes through the controller, which is a trusted
1440 entity. Traditional SIP authentication and hop-by-hop encryption
1441 and message integrity work fine in this environment, but end-to-end
1442 encryption and message integrity may not be possible.
1444 When using the peer-to-peer approach, call control actions and
1445 primitives can be legitimately initiated by a) an existing
1446 participant in the conversation space, b) a former participant in
1447 the conversation space, or c) an entity trusted by one of the
1448 participants. For example, a participant always initiates a
1449 transfer; a retrieve from Park (a take) is initiated on behalf of a
1450 former participant; and a barge-in (insert or far-fork) is initiated
1451 by a trusted entity (an operator for example).

1453 Authenticating requests by an existing participant or a trusted
1454 entity can be done with baseline SIP mechanisms. In the case of
1455 features initiated by a former participant, these should be
1456 protected against replay attacks by using a unique name or
1457 identifier per invocation. The Replaces header exhibits this
1458 behavior as a by-product of its operation (once a Replaces operation
1459 is successful, the call-leg being Replaced no longer exists). For
1460 other requests, a "one-time" Request-URI may be provided to the
1461 feature invoker.

1463 To authorize call control primitives that trigger special behavior
1464 (such as an INVITE with Replace, Join, or Fork semantics), the
1465 receiving user agent may have trouble finding appropriate
1466 credentials with which to challenge or authorize the request, as the
1467 sender may be completely unknown to the receiver, except through the
1468 introduction of a third party. These credentials need to be passed
1469 transitively in some way or fetched in an event body, for example.

1471 7 Appendix A: Example Features

1473 Primitives are defined in terms of their ability to provide
1474 features. These example features should require an amply robust set
1475 of services to demonstrate a useful set of primitives. They are
1476 described here briefly. Note that the descriptions of these features
1477 are non-normative.
Some of these features are used as examples in
1478 section 6 to demonstrate how some features may require certain media
1479 relationships. Note also that this document describes a mixture of
1480 both features originating in the world of telephones, and features
1481 which are clearly Internet oriented.

1483 7.1 Example Feature Definitions:

1487 Call Waiting - Alice is in a call, then receives another call.
1488 Alice can place the first call on hold, and talk with the other
1489 caller. She can typically switch back and forth between the
1490 callers.

1492 Blind Transfer - Alice is in a conversation with Bob. Alice asks
1493 Bob to contact Carol, but makes no attempt to contact Carol
1494 independently. In many implementations, Alice does not verify Bob's
1495 success or failure in contacting Carol.

1497 Attended Transfer - The transferring party establishes a session
1498 with the transfer target before completing the transfer.

1500 Consultative transfer - the transferring party establishes a session
1501 with the target and mixes both sessions together so that all three
1502 parties can participate, then disconnects leaving the transferee and
1503 transfer target with an active session.

1505 Conference Call - Three or more active, visible participants in the
1506 same conversation space.

1508 Call Park - A call participant parks a call (essentially puts the
1509 call on hold), and then retrieves it at a later time (typically from
1510 another location).

1512 Call Pickup - A party picks up a call that was ringing at another
1513 location. One variation allows the caller to choose which location,
1514 another variation just picks up any call in that user's "pickup
1515 group".

1517 Music on Hold - When Alice places a call with Bob on hold, her
1518 audio is replaced with streaming content such as music,
1519 announcements, or advertisements.
1521 Call Monitoring - A call center supervisor joins an in-progress call
1522 for monitoring purposes.

1524 Barge-in - Carol interrupts Alice, who has an in-progress call
1525 with Bob. In some variations, Alice forcibly joins a new
1526 conversation with Carol; in other variations, all three parties are
1527 placed in the same conversation (basically a 3-way conference).

1529 Hotline - Alice picks up a phone and is immediately connected to the
1530 technical support hotline, for example.

1532 Autoanswer - Calls to a certain address or location answer
1533 immediately via a speakerphone.

1535 Intercom - Alice typically presses a button on a phone which
1536 immediately connects to another user or phone and causes that phone
1537 to play her voice over its speaker. Some variations immediately
1538 set up two-way communications, other variations require another
1539 button to be pressed to enable a two-way conversation.

1543 Speakerphone paging - Alice calls the paging address and speaks.
1544 Her voice is played on the speaker of every idle phone in a
1545 preconfigured group of phones.

1547 Speed dial - Alice dials an abbreviated number, or enters an alias,
1548 or presses a special speed dial button representing Bob. Her action
1549 is interpreted as if she specified the full address of Bob.

1551 Call Return - Alice calls Bob. Bob misses the call or is
1552 disconnected before he is finished talking to Alice. Bob invokes
1553 Call Return, which calls Alice, even if Alice did not provide her
1554 real identity or location to Bob.

1556 Inbound Call Screening - Alice doesn't want to receive calls from
1557 Matt. Inbound Screening prevents Matt from disturbing Alice. In
1558 some variations this works even if Matt hides his identity.

1560 Outbound Call Screening - Alice is paged and unknowingly calls a
1561 PSTN pay-service telephone number in the Caribbean, but local policy
1562 blocks her call, and possibly informs her why.
1564 Call Forwarding - Before a call-leg is accepted it is redirected to
1565 another location, for example, because the originally intended
1566 recipient is busy, does not answer, is disconnected from the
1567 network, or has configured all requests to go somewhere else.

1569 Message Waiting - Bob calls Alice when she steps away from her
1570 phone. When she returns, a visible or audible indicator conveys that
1571 someone has left her a voicemail message. The message waiting
1572 indication may also convey how many messages are waiting, from whom,
1573 what time, and other useful pieces of information.

1575 Do Not Disturb - Alice selects the Do Not Disturb option. Calls to
1576 her either ring briefly or not at all and are forwarded elsewhere.
1577 Some variations allow specially authorized callers to override this
1578 feature and ring Alice anyway.

1580 Distinctive ring - Incoming calls have different ring cadences or
1581 sample sounds depending on the From party, the To party, or other
1582 factors.

1584 Automatic Callback - Alice calls Bob, but Bob is busy. Alice would
1585 like Bob to call her automatically when he is available. When Bob
1586 hangs up, Alice's phone rings. When Alice answers, Bob's phone
1587 rings. Bob answers and they talk.

1589 Find-Me - Alice sets up complicated rules for how she can be reached
1590 (possibly using [CPL], [presence] or other factors). When Bob calls
1591 Alice, his call is eventually routed to a temporary Contact where
1592 Alice happens to be available.

1594 Whispered call waiting - Alice is in a conversation with Bob. Carol
1595 calls Alice. Either Carol can "whisper" to Alice directly ("Can you
1598 get lunch in 15 minutes?"), or an automaton whispers to Alice
1599 informing her that Carol is trying to reach her.

1601 Voice message screening - Bob calls Alice. Alice is screening her
1602 calls, so Bob hears Alice's voicemail greeting. Alice can hear Bob
1603 leave his message.
If she decides to talk to Bob, she can take the
1604 call back from the voicemail system, otherwise she can let Bob leave
1605 a message. This emulates the behavior of a home telephone answering
1606 machine.

1608 Presence-Enabled Conferencing - Alice wants to set up a conference
1609 call with Bob and Cathy when they all happen to be available (rather
1610 than scheduling a predefined time). The server providing the
1611 application monitors their status, and calls all three when they are
1612 all "online", not idle, and not in another call.

1614 IM Conference Alerts - A user receives a notification as an Instant
1615 Message whenever someone joins a conference they are also in.

1617 Single Line Extension - A group of phones are all treated as
1618 "extensions" of a single line. A call for one rings them all. As
1619 soon as one answers, the others stop ringing. If any extension is
1620 actively in a conversation, another extension can "pick up" and
1621 immediately join the conversation. This emulates the behavior of a
1622 home telephone line with multiple phones.

1624 Click-to-dial - Alice looks in her company directory for Bob. When
1625 she finds Bob, she clicks on a URL to call him. Her phone rings (or
1626 possibly answers automatically), and when she answers, Bob's phone
1627 rings.

1629 Pre-paid calling - Alice pays for a certain currency or unit amount
1630 of calling value. When she places a call, she provides her account
1631 number somehow. If her account runs out of calling value during a
1632 call, her call is disconnected or redirected to a service where she
1633 can purchase more calling value.

1635 Voice Portal - A service that allows users to access a portal site
1636 using spoken dialog interaction. For example, Alice needs to
1637 schedule a working dinner with her co-worker Carol.
Alice uses a
1638 voice portal to check Carol's flight schedule, find a restaurant
1639 near her hotel, make a reservation, get directions there, and page
1640 Carol with this information.

1642 7.2 Implementation of these features

1644 Example Features:
1645 Call Hold                     [Offer/Answer] for SIP
1646 Call Waiting                  Local Implementation
1647 Blind Transfer                [cc-transfer]
1648 Attended Transfer             [cc-transfer]
1649 Consultative transfer         [cc-transfer]
1650 Conference Call               [conf-models]
1653 Call Park                     *[examples]
1654 Call Pickup                   *[examples]
1655 Music on Hold                 *[examples]
1656 Call Monitoring               *Insert
1657 Barge-in                      *Insert or Far-Fork
1658 Hotline                       Local Implementation
1659 Autoanswer                    Local URI convention
1660 Speed dial                    Local Implementation
1661 Intercom                      *Speed dial + autoanswer
1662 Speakerphone paging           *Speed dial + autoanswer
1663 Call Return                   Proxy feature
1664 Inbound Call Screening        Proxy or Local implementation
1665 Outbound Call Screening       Proxy feature
1666 Call Forwarding               Proxy or Local implementation
1667 Message Waiting               [msg-waiting]
1668 Do Not Disturb                [presence]
1669 Distinctive ring              *Proxy or Local implementation
1670 Automatic Callback            2 person presence-based conference
1671 Find-Me                       Proxy service based on presence
1672 Whispered call waiting        Local implementation
1673 Voice message screening       *
1674 Presence-based Conferencing   *call when presence = available
1675 IM Conference Alerts          subscribe to conference status
1676 Single Line Extension         *
1677 Click-to-dial                 *
1678 Pre-paid calling              *
1679 Voice Portal                  *

1681 7.2.1 Call Park

1683 Call park requires the ability to put a dialog some place,
1684 advertise it to users in a pickup group, and uniquely identify it
1685 in a way that can be communicated (including by human voice). The
1686 dialog can be held locally on the UA parking the dialog or
1687 alternatively transferred to the park service for the pickup group.
1688 The parked dialog then needs to be labeled (e.g.
orbit 12) in a way
1689 that can be communicated to the party that is to pick up the call.
1690 The UAs in the pickup group discover the parked dialog(s) via
1691 [call-leg] from the park service. If the dialog is parked locally,
1692 the park service merely aggregates the parked call states from the
1693 set of UAs in the pickup group.

1695 7.2.2 Call Pickup

1697 There are two different features which are called call pickup. The
1698 first is the pickup of a parked dialog. The UA from which the
1699 dialog is to be picked up subscribes to the call state [call-leg] of
1700 the park service or the UA which has locally parked the dialog.
1701 Dialogs which are parked should be labeled with an identifier. The
1702 labels are used by the UA to allow the user to indicate which dialog
1703 is to be picked up. The UA picking up the call invokes the URL in
1704 the call state which is labeled as replace-remote.

1708 The other call pickup feature involves picking up an early dialog
1709 (typically ringing). This feature uses some of the same primitives
1710 as the pickup of a parked call. The call state of the ringing UA
1711 is advertised using [call-leg]. The UA which is to pick up the
1712 early dialog subscribes either directly to the ringing UA or to a
1713 service aggregating the states for UAs in the pickup group. The
1714 call state identifies early dialogs. The UA uses the call state(s)
1715 to help the user choose which early dialog is to be picked up.
1716 The UA then invokes the URL in the call state labeled as
1717 replace-remote.

1719 7.2.3 Music on Hold

1721 Music on hold can be implemented in a number of ways. One way is to
1722 transfer the held call to a holding service. When the UA wishes to
1723 take the call off hold it basically performs a take on the call from
1724 the holding service.
This involves subscribing to call state on the
1725 holding service and then invoking the URL in the call state labeled
1726 as replace-remote.

1728 Alternatively, music on hold can be performed as a local mixing
1729 operation. The UA holding the call can mix in the music from the
1730 music service via RTP (i.e. an additional dialog) or RTSP or other
1731 streaming media source. This approach is simpler from a protocol
1732 perspective (i.e. the held dialog does not move, so there is less
1733 chance of losing it); however, it does use more LAN bandwidth and
1734 resources on the UA.

1736 7.2.4 Call Monitoring

1738 Call monitoring is a [join] operation. The monitoring UA sends a
1739 Join to the dialog it wants to listen to. It is able to discover
1740 the dialog via the call state [call-leg] on the monitored UA. The
1741 monitoring UA sends SDP in the INVITE which indicates receive-only
1742 media [offer/answer]. In addition, the monitoring UA should indicate
1743 that it wants to receive a mix. As the UA is monitoring only, it does
1744 not matter whether the UA indicates that its send stream is to be
1745 mixed or point-to-point.

1747 7.2.5 Barge-in

1749 Barge-in works the same as call monitoring except that the UA barging
1750 in must indicate that its send media stream is to be mixed, so that
1751 all of the other parties can hear the stream from the UA barging in.

1753 7.2.6 Intercom

1755 The UA initiates a dialog using INVITE in the ordinary way [bis].
1756 The calling UA then signals the paged UA to answer the call. The
1757 calling UA may discover the URL to answer the call via the call
1758 state [call-leg] of the called UA. The called UA accepts the INVITE
1759 with a 200 OK and automatically enables the speakerphone.

1763 Alternatively, this can be a local decision for the UA to answer
1764 based upon called party identification.
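
Sections 7.2.4 and 7.2.5 differ mainly in the media direction the
joining UA offers: receive-only for monitoring, and two-way for
barge-in. A rough sketch of such an offer follows, assuming the SDP
direction attributes "a=recvonly" and "a=sendrecv". The helper name
build_offer, the origin user name "monitor", the address and the port
are all hypothetical, and the sketch does not attempt to express the
"mix" versus "point-to-point" preference, for which this document
defines no syntax.

```python
# Hypothetical helper: builds a minimal audio-only SDP offer whose
# direction attribute distinguishes monitoring from barge-in.

def build_offer(ip, port, monitoring_only):
    """Return an SDP body as a string with CRLF line endings."""
    # Monitoring listens only; barge-in both sends and receives.
    direction = "recvonly" if monitoring_only else "sendrecv"
    lines = [
        "v=0",
        f"o=monitor 0 0 IN IP4 {ip}",   # origin values are placeholders
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        f"m=audio {port} RTP/AVP 0",    # payload type 0 is PCMU
        f"a={direction}",
    ]
    return "\r\n".join(lines) + "\r\n"

if __name__ == "__main__":
    print(build_offer("192.0.2.10", 49170, monitoring_only=True))
```

The only difference between the two offers in this sketch is the final
attribute line, which mirrors the statement in 7.2.5 that barge-in works
the same as call monitoring apart from the send stream.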
1766 7.2.7 Speakerphone paging

1768 Speakerphone paging can be implemented using either multicast or
1769 a simple multipoint mixer. In the multicast solution the
1770 paging UA sends a multicast INVITE [bis] with send-only media in the
1771 [SDP] (see also [offer/answer]). The automatic answer and enabling
1772 of the speakerphone is a locally configured decision on the paged
1773 UAs. The paging UA sends RTP via the multicast address indicated in
1774 the SDP.

1776 The multipoint solution is accomplished by sending an INVITE to the
1777 multipoint mixer. The mixer is configured to automatically answer
1778 the dialog. The paging UA then sends [REFER] requests for each of
1779 the UAs that are to become paging speakers (the UA is likely to send
1780 out a single REFER which is parallel forked by the proxy server).
1781 The UAs performing as paging speakers are configured to
1782 automatically answer based upon caller identification (e.g. To
1783 field, URI or Referred-To headers).

1785 7.2.8 Distinctive ring

1787 The target UA either makes a local decision based on information in
1788 an incoming INVITE (To, From, Contact, Request-URI) or trusts an
1789 Alert-Info header provided by the caller or inserted by a trusted
1790 proxy. In the latter case, the UA fetches the content described in
1791 the URI (typically via HTTP) and renders it to the user.

1793 7.2.9 Voice message screening

1795 At first, this is the same as call monitoring. In this case the
1796 voicemail service is one of the UAs. The UA screening the message
1797 monitors the call on the voicemail service, and also subscribes to
1798 call-leg information. If the user screening their messages decides
1799 to answer, they perform a Take from the voicemail system (for
1800 example, send an INVITE with Replaces to the UA leaving the message).

1802 7.2.10 Single Line Extension

1804 Incoming calls ring all the extensions through basic parallel
1805 forking [bis].
Each extension subscribes to call-leg events from
1806 each other extension. While one user has an active call, any other
1807 UA extension can insert itself into that conversation (it already
1808 knows the call-leg information) in the same way as barge-in.

1810 7.2.11 Click-to-dial

1812 The application or server which hosts the click-to-dial application
1813 captures the URL to be dialed and can set up the call using 3pcc or
1814 can send a [REFER] request to the UA which is to dial the address.
1815 As users sometimes change their mind or wish to give up listening to a
1818 ringing or voicemail answered phone, this application illustrates
1819 the need to also have the ability to remotely hang up a call.

1821 7.2.12 Pre-paid calling

1823 For prepaid calling, the user's media always passes through a device
1824 which is trusted by the pre-paid provider. This may be the other
1825 endpoint (for example a PSTN gateway) or an intermediary media
1826 relay. In either case, an intermediary proxy or B2BUA can
1827 periodically verify the amount of time available on the pre-paid
1828 account, and use the session-timer extension to cause the trusted
1829 endpoint (gateway) or intermediary (media relay) to send a reINVITE
1830 before that time runs out. During the reINVITE, the SIP intermediary
1831 can reverify the account and insert another session-timer header.

1833 Note that while most pre-paid systems on the PSTN use an IVR to
1834 collect the account number and destination, this isn't strictly
1835 necessary for a SIP-originated prepaid call. SIP requests and SIP
1836 URIs are sufficiently expressive to convey the final destination,
1837 the provider of the prepaid service, the location from which the
1838 user is calling, and the prepaid account they want to use. If a
1839 pre-paid IVR is used, the mechanism described below (Voice Portals)
1840 can be combined as well.
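
The periodic account check described in 7.2.12 amounts to choosing, at
each reINVITE, a session interval no longer than the calling time left
on the account. The following is a minimal sketch of that decision, not
a definitive implementation. It assumes the balance is already expressed
in seconds; the function name next_session_interval and the default
values of 1800 and 90 seconds are illustrative assumptions (typical
session-timer values), not values taken from this document.

```python
# Hypothetical policy function for the intermediary (proxy or B2BUA).

def next_session_interval(remaining_seconds, max_interval=1800, min_interval=90):
    """Pick the interval to place in the next session-timer header.

    Returns the interval in seconds, or None when the account cannot
    cover even the shortest interval, in which case the call would be
    disconnected or redirected to a service for buying more value.
    """
    if remaining_seconds < min_interval:
        return None
    # Never promise more session time than the account can pay for.
    return min(remaining_seconds, max_interval)

if __name__ == "__main__":
    for balance in (5000, 600, 30):
        print(balance, next_session_interval(balance))
```

Capping the interval at the remaining balance means the refresh arrives
before the time runs out, giving the intermediary a chance to reverify
the account as the text describes.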
1842 7.2.13 Voice Portal

1844 A voice portal is essentially a complex collection of voice dialogs
1845 used to access interesting content. One of the most desirable call
1846 control features of a Voice Portal is the ability to start a new
1847 outgoing call from within the context of the Portal (to make a
1848 restaurant reservation, or return a voicemail message for example).
1849 Once the new call is over, the user should be able to return to the
1850 Portal by pressing a special key, using some DTMF sequence (e.g. a
1851 very long pound or hash tone), or by speaking a hotword (e.g. "Main
1852 Menu").

1854 In order to accomplish this, the Voice Portal starts with the
1855 following media relationship:

1857     { User , Voice Portal }

1859 The user then asks to make an outgoing call. The Voice Portal asks
1860 the User to perform a Far-Fork. In other words the Voice Portal
1861 wants the following media relationship:

1863     { Target , User } & { User , Voice Portal }

1865 The Voice Portal is now just listening for a hotword or the
1866 appropriate DTMF. As soon as the user indicates they are done, the
1867 Voice Portal Takes the call from the old Target, and we are back to
1868 the original media relationship.

1872 This feature can also be used by the account number and phone number
1873 collection menu in a pre-paid calling service. A user can press a
1874 DTMF sequence which presents them with the a

1876 8 References

1878 [SIP] M. Handley, E. Schooler, and H. Schulzrinne, "SIP: Session
1879 Initiation Protocol", RFC2543, Internet Engineering Task Force,
1880 Nov 1998.

1882 [RFC2119] S. Bradner, "Key words for use in RFCs to indicate
1883 requirement levels," Request for Comments (Best Current
1884 Practice) 2119, Internet Engineering Task Force, Mar. 1997.

1886 [REFER] R. Sparks, "The Refer Method", Internet Draft , IETF, October 30, 2001, Work in progress.

1889 [3pcc] J. Rosenberg, J. Peterson, H. Schulzrinne, G.
Camarillo,
1890 "Third Party Call Control in SIP", Internet Draft , IETF; March 2001. Work in progress

1893 [transfer] R. Sparks, "SIP Call Control - Transfer", Internet Draft
1894 , IETF; Feb. 2001. Work in
1895 progress.

1897 [Replaces] B. Biggs, R. Dean, R. Mahy, "The SIP Replaces Header",
1898 Internet Draft , IETF, Nov. 2001.
1899 Work in progress.

1901 [conf-models] J. Rosenberg, H. Schulzrinne, "Models for Multi Party
1902 Conferencing in SIP", Internet Draft , IETF; Nov. 2000. Work in progress.

1905 [service examples] A. Johnston, R. Sparks, C. Cunningham, S.
1906 Donovan, K. Summers, "SIP Service Examples" Internet Draft , IETF, June 2002, Work in
1908 progress.

1910 [Join] R. Mahy, D. Petrie, "The SIP Join and Fork Headers", Internet
1911 Draft , IETF, November
1912 2001, Work in progress.

1914 [RTP] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson,
1915 "RTP: A Transport Protocol for Real-Time Applications", Request for
1916 Comments (Standards Track) 1889, IETF, January 1996

1918 [SDP] H. Schulzrinne, M. Handley, V. Jacobson, "SDP: Session
1919 Description Protocol", Request for Comments (Standards Track) 2327,
1920 Internet Engineering Task Force, April 1998

1923 [events] A. Roach, "SIP-Specific Event Notification", Internet Draft
1924 , IETF, February 2002, Work in
1925 progress.

1927 [offer/answer] J. Rosenberg, H. Schulzrinne, "An Offer/Answer Model
1928 with SDP", Internet Draft , IETF, February 21, 2002, Work in progress.

1931 [caller prefs] J. Rosenberg, "SIP Caller Preferences and Callee
1932 Capabilities", Internet Draft ,
1933 IETF, November 21, 2001, Work in progress.

1935 [msg waiting] R. Mahy, I. Slain, "Message Waiting in SIP", Internet
1936 Draft , IETF, July 2001, Work
1937 in progress.

1939 [Presence] Rosenberg et al., "SIP Extensions for Presence", Internet
1940 Draft , IETF, November 21, 2001,
1941 Work in progress.

1943 [visited] D. Oran, H.
Schulzrinne, "The Visited Header", Internet
1944 Draft <>, IETF, date, Work in progress.

1946 [app components] , "", Internet Draft <>, IETF, date, Work in
1947 progress.

1949 [ms-uri] J. Van Dyke, E. Burger, "SIP URI Conventions for Media
1950 Servers", Internet Draft , IETF,
1951 November 21, 2001, Work in progress.

1953 [call-pkg] J. Rosenberg, H. Schulzrinne, "SIP Event Packages for
1954 Call Leg and Conference State", Internet Draft , IETF, July 13, 2001, Work in progress.

1957 [enum] , "", Internet Draft <>, IETF, date, Work in progress.

1959 [http] R. Fielding et al, "Hypertext Transfer Protocol --
1960 HTTP/1.1", Request for Comments (Standards Track) 2616, Internet
1961 Engineering Task Force, June 1999

1963 [rtsp] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming
1964 Protocol (RTSP)", Request for Comments (Standards Track) 2326,
1965 Internet Engineering Task Force, April 1998

1967 [mrcp] S. Shanmugham, P. Monaco, B. Eberman, "MRCP: Media Resource
1968 Control Protocol", Internet Draft ,
1969 IETF, November 20, 2001, Work in progress.

1971 [VoiceXML] S. McGlashan et al, "Voice Extensible Markup Language
1972 (VoiceXML) Version 2.0", W3C Working Draft, 23 October 2001, Work in
1973 progress.

1975 [H.323]

1978 [tel URL]

1980 [caller-prefs]

1982 [session timer]

1984 [service context]

1986 [avt tones]

1988 [GSM]

1990 [MPEG2]

1992 [G.711]

1994 [H.261]

1996 [H.450]

1998 [JTAPI]

2000 [CSTA]

2002 [mrcp-sip] , "", Internet Draft ,
2003 IETF, date, Work in progress.

2005 [distributed full mesh conf]

2007 [Media forking] M. Shankar, "SIP Forked Media", Internet Draft
2008 , IETF, Feb. 2001. Work in
2009 progress.

2011 [PHONECTL] R. Dean, Belkind, B. Biggs, "PHONECTL: A Protocol for
2012 Remote Phone Control", Internet Draft ,
2013 IETF, Jan. 2001. Work in progress.

2015 9 Changes since -00

2017 - Removed many media-specific references.

2019 - Condensed discussion on mixing models, and VoiceXML discussion.
2021 - Moved the sample feature discussion to an Appendix

2023 10 To Do

2026 - Add diagrams to section 4.3.2 and 4.3.3

2028 - Convert to XML

2030 - Fix references

2033 - Propose to move Appendix A (sample features) to service flows

2035 - Align terminology with conferencing drafts

2037 - Show roadmap for related drafts

2039     Other frameworks and requirements
2040         Conferencing framework
2041         Conferencing models
2042         Framework for markup

2044     Extensions
2045         REFER
2046         Replaces
2047         Join
2048         Caller prefs

2050     Packages
2051         conference-package
2052         dialog package

2054     Usage Drafts
2055         3pcc
2056         cc-transfer

2058     Informational Drafts
2059         Service flows

2061 - Define some semantics for authorization rules. For example one
2062 could define a dictionary of primitives and/or perhaps define sets
2063 or classes of these primitives, then configure who is allowed to use
2064 them

2066 11 Acknowledgments

2069 Thanks to all who attended the SIP interim meeting in February 2001
2070 for their support of the ideas behind this document.

2072 12 Author's Addresses

2075 Rohan Mahy
2076 Cisco Systems
2077 170 West Tasman Dr, MS: SJC-21/3/3
2078 Phone: +1 408 526 8570
2079 Email: rohan@cisco.com

2081 Ben Campbell
2082 dynamicsoft
2083 5100 Tennyson Parkway
2084 Suite 1200
2085 Plano, Texas 75024
2086 Email: bcampbell@dynamicsoft.com

2089 Alan Johnston
2090 WorldCom
2091 100 S. 4th Street
2092 St. Louis, Missouri 63104
2093 Email: alan.johnston@wcom.com

2095 Daniel G. Petrie
2096 Pingtel Corp.
2097 400 W. Cummings Park
2098 Suite 2200
2099 Woburn, MA 01801
2100 Phone: +1 781 938 5306
2101 Email: dpetrie@pingtel.com

2103 Jonathan Rosenberg
2104 dynamicsoft
2105 72 Eagle Rock Avenue
2106 First Floor
2107 East Hanover, NJ 07936
2108 Email: jdrosen@dynamicsoft.com

2110 Robert J.
Sparks
2111 dynamicsoft
2112 5100 Tennyson Parkway
2113 Suite 1200
2114 Plano, TX 75024
2115 Email: rsparks@dynamicsoft.com

2117 Full Copyright Statement

2119 "Copyright (C) The Internet Society (date). All Rights Reserved.
2120 This document and translations of it may be copied and furnished to
2121 others, and derivative works that comment on or otherwise explain it
2122 or assist in its implementation may be prepared, copied, published
2123 and distributed, in whole or in part, without restriction of any
2124 kind, provided that the above copyright notice and this paragraph
2125 are included on all such copies and derivative works. However, this
2126 document itself may not be modified in any way, such as by removing
2127 the copyright notice or references to the Internet Society or other
2128 Internet organizations, except as needed for the purpose of
2129 developing Internet standards in which case the procedures for
2130 copyrights defined in the Internet Standards process must be
2131 followed, or as required to translate it into languages other than
2132 English.

2134 The limited permissions granted above are perpetual and will not be
2135 revoked by the Internet Society or its successors or assigns.
2136 This document and the information contained herein is provided on an
2137 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
2138 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
2139 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
2140 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
2141 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.