idnits 2.17.1 draft-ietf-sipping-cc-framework-00.txt: ** The Abstract section seems to be numbered -(1691): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(2137): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == There are 5 instances of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 1301 has weird spacing: '...with on sip...' == Line 1313 has weird spacing: '... prompt sip:s...' 
== Line 1679 has weird spacing: '...e media topol...' == Line 1699 has weird spacing: '...1 Pt2pt mix...' == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'CPL' is mentioned on line 354, but not defined -- Looks like a reference, but probably isn't: '9' on line 630 == Missing Reference: 'SLP' is mentioned on line 650, but not defined -- Looks like a reference, but probably isn't: '13' on line 674 == Missing Reference: 'SAP' is mentioned on line 704, but not defined == Missing Reference: 'Events' is mentioned on line 766, but not defined == Missing Reference: 'MRCP' is mentioned on line 973, but not defined == Missing Reference: 'MRCP-SIP' is mentioned on line 974, but not defined == Missing Reference: 'RTSP' is mentioned on line 1016, but not defined == Missing Reference: 'Caller-prefs' is mentioned on line 1105, but not defined == Unused Reference: 'RTP' is defined on line 2078, but no explicit reference was found in the text == Unused 
Reference: 'Presence' is defined on line 2104, but no explicit reference was found in the text == Unused Reference: 'GSM' is defined on line 2153, but no explicit reference was found in the text == Unused Reference: 'MPEG2' is defined on line 2155, but no explicit reference was found in the text == Unused Reference: 'G.711' is defined on line 2157, but no explicit reference was found in the text == Unused Reference: 'JTAPI' is defined on line 2163, but no explicit reference was found in the text == Unused Reference: 'CSTA' is defined on line 2165, but no explicit reference was found in the text == Unused Reference: 'PHONECTL' is defined on line 2176, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2543 (ref. 'SIP') (Obsoleted by RFC 3261, RFC 3262, RFC 3263, RFC 3264, RFC 3265) -- Possible downref: Non-RFC (?) normative reference: ref. 'REFER' -- Possible downref: Non-RFC (?) normative reference: ref. '3pcc' -- Possible downref: Non-RFC (?) normative reference: ref. 'Replaces' -- Possible downref: Non-RFC (?) normative reference: ref. 'Join' Unexpected reference format, failed extracting the RFC number: [RTP] H. Schulzrinne , S. Casner , R. Frederick , V. Jacobson , "RTP: A Transport Protocol for Real-Time Applications", Request for Comments (Standards Track)1889, IETF, January 1996 -- Possible downref: Non-RFC (?) normative reference: ref. 'RTP' ** Obsolete normative reference: RFC 2327 (ref. 'SDP') (Obsoleted by RFC 4566) -- Possible downref: Non-RFC (?) normative reference: ref. 'Presence' -- Possible downref: Non-RFC (?) normative reference: ref. 'VoiceXML' -- Possible downref: Non-RFC (?) normative reference: ref. 'GSM' -- Possible downref: Non-RFC (?) normative reference: ref. 'MPEG2' -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711' -- Possible downref: Non-RFC (?) normative reference: ref. 'JTAPI' -- Possible downref: Non-RFC (?) normative reference: ref. 'CSTA' -- Possible downref: Non-RFC (?) 
normative reference: ref. 'PHONECTL' Summary: 9 errors (**), 0 flaws (~~), 23 warnings (==), 17 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 SIPPING Working Group Mahy/Cisco 3 Internet Draft Campbell/dynamicsoft 4 Document: draft-ietf-sipping-cc-framework-00.txt Johnston/Worldcom 5 February 2002 Petrie/Pingtel 6 Rosenberg/dynamicsoft 7 Expires: August 2002 Sparks/dynamicsoft 9 A Multi-party Application Framework for SIP 11 Status of this Memo 13 This document is an Internet-Draft and is in full conformance with 14 all provisions of Section 10 of RFC2026 [RFC2026]. 16 Internet-Drafts are working documents of the Internet Engineering 17 Task Force (IETF), its areas, and its working groups. Note that 18 other groups may also distribute working documents as Internet- 19 Drafts. Internet-Drafts are draft documents valid for a maximum of 20 six months and may be updated, replaced, or obsoleted by other 21 documents at any time. It is inappropriate to use Internet- Drafts 22 as reference material or to cite them other than as "work in 23 progress." 24 The list of current Internet-Drafts can be accessed at 25 http://www.ietf.org/ietf/1id-abstracts.txt 26 The list of Internet-Draft Shadow Directories can be accessed at 27 http://www.ietf.org/shadow.html. 29 1 Abstract 31 This document defines a framework and requirements for multi-party 32 applications in SIP. To enable discussion of multi-party 33 applications we define an abstract call model for describing the 34 media relationships required by many of these applications. The 35 model and actions described here are specifically chosen to be 36 independent of the SIP signaling and/or mixing approach chosen to 37 actually setup the media relationships. 
In addition to its dialog 38 manipulation aspect, this framework includes requirements for 39 communicating related information and events such as conference and 40 session state, and session history. This framework also describes 41 other goals which embody the spirit of SIP applications as used on 42 the Internet. 44 2 Conventions used in this document 46 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 47 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 48 document are to be interpreted as described in RFC-2119 [RFC2119]. 50 SIP Multiparty Framework 52 Table of Contents 53 1 Abstract.......................................................1 54 2 Conventions used in this document..............................1 55 3 Motivation and Background......................................4 56 3.1 Goals........................................................4 57 3.2 Example Features.............................................6 58 4 Key Concepts...................................................9 59 4.1 "Conversation Space" Model...................................9 60 4.1.1 Comparison with Related Definitions.......................10 61 4.2 Signaling Models............................................11 62 4.3 Mixing Models...............................................12 63 4.3.1 (Single) End System Mixing................................12 64 4.3.2 Centralized Mixing........................................12 65 4.3.3 Multicast and Multi-unicast conferences...................14 66 4.4 Conveying Information and Events............................15 67 4.5 Componentization and Decomposition..........................16 68 4.5.1 Media Intermediaries......................................17 69 4.5.2 Queue Server..............................................18 70 4.5.3 Parking Place.............................................18 71 4.5.4 Announcements and Voice Dialogs...........................19 72 4.6 Use of 
URIs.................................................21 73 4.6.1 Naming Users in SIP.......................................21 74 4.6.2 Naming Services with SIP URIs.............................23 75 4.7 Invoker Independence........................................26 76 4.8 Billing issues..............................................26 77 5 Catalog of call control actions and sample features............26 78 5.1 Early Dialog Actions........................................27 79 5.1.1 Remote Answer.............................................27 80 5.1.2 Remote Forward or Put.....................................27 81 5.1.3 Remote Busy or Error Out..................................27 82 5.2 Single Dialog Actions.......................................27 83 5.2.1 Remote Dial...............................................28 84 5.2.2 Remote On and Off Hold....................................28 85 5.2.3 Remote Hangup.............................................28 86 5.3 Multi-dialog actions........................................28 87 5.3.1 Transfer..................................................28 88 5.3.2 Take......................................................29 89 5.3.3 Add.......................................................29 90 5.3.4 Local Join................................................30 91 5.3.5 Insert....................................................30 92 5.3.6 Split.....................................................31 93 5.3.7 Near-fork.................................................31 94 5.3.8 Far fork..................................................31 95 6 Putting it all together.......................................33 96 6.1 Feature Solutions...........................................34 97 6.1.1 Call Park.................................................34 98 6.1.2 Call Pickup...............................................34 99 6.1.3 Music on Hold.............................................35 100 6.1.4 Call 
Monitoring...........................................35 101 6.1.5 Barge-in..................................................35 102 6.1.6 Intercom..................................................35 103 6.1.7 Speakerphone paging.......................................36 104 6.1.8 Distinctive ring..........................................36 105 SIP Multiparty Framework 107 6.1.9 Voice message screening...................................36 108 6.1.10 Single Line Extension.....................................36 109 6.1.11 Click-to-dial.............................................36 110 6.1.12 Pre-paid calling..........................................37 111 6.1.13 Voice Portal..............................................37 112 7 Security Considerations.......................................38 113 8 References....................................................39 114 9 Acknowledgments...............................................41 115 10 Author's Addresses...........................................41 116 SIP Multiparty Framework 118 3 Motivation and Background 120 The Session Initiation Protocol [SIP] was defined for the 121 initiation, maintenance, and termination of sessions or calls 122 between one or more users. However, despite its origins as a large- 123 scale multiparty conferencing protocol, SIP is used today primarily 124 for point to point calls. This two-party configuration is the focus 125 of the SIP specification and most of its extensions. 127 This document defines a framework and requirements for multi-party 128 applications in SIP. Most multi-party applications manipulate SIP 129 dialogs (also known as call legs) to cause participants in a 130 conversation to perceive specific media relationships. In other 131 protocols that deal with the concept of calls, this manipulation is 132 known as call control. 
In addition to its dialog manipulation 133 aspect, "call control" also includes communicating information and 134 events related to manipulating calls, including information and 135 events dealing with session state and history, conference state, 136 user state, and even message state. 138 3.1 Goals 139 Based on input from the SIP community, the authors compiled the 140 following set of goals for SIP call control: 142 - Define Primitives, Not Services. Allow for a handful of robust 143 yet simple mechanisms which can be combined to deliver features and 144 services. Throughout this document we refer to these simple 145 mechanisms as "primitives". Primitives should be sufficiently 146 robust that when they are combined they can be used to build lots of 147 services. However, the goal is not to define a provably complete 148 set of primitives. Note that while the IETF will NOT standardize 149 behavior or services, it may define example services for 150 informational purposes, as in [service examples]. 152 - Participant oriented. The primitives should be designed to 153 provide services which are oriented around the experience of the 154 participants. The authors observe that end users of features and 155 services usually don't care how a media relationship is setup. 156 Their ultimate experience is based only on the resulting media and 157 other externally visible characteristics. 159 - Signaling Model independent: Support both a central control and a 160 peer-to-peer feature invocation model (and combinations of the two). 161 baseline SIP already supports a centralized control model described 162 in [3pcc], and the SIP community has expressed a great deal of 163 interest in peer-to-peer or distributed call control. Some such 164 primitives are already defined in [REFER] and [Replaces]. 166 - Mixing Model independent: The bulk of interesting multiparty 167 applications involve mixing or combining media from multiple 168 participants. 
This mixing can be performed by one or more of the 169 participants, or by a centralized mixing resource. The experience 170 SIP Multiparty Framework 172 of the participants should not depend on the mixing model used. 173 While most examples in this document refer to audio mixing, the 174 framework applies to any media type. In this context a "mixer" 175 refers to combining media in an appropriate, media-specific way. 177 - Invoker oriented. Only the user who invokes a feature or a service 178 needs to know exactly which service is invoked or why. This is good 179 because it allows new services to be created without requiring new 180 primitives from all the participants; and it allows for much simpler 181 feature authorization policies, for example, when participation 182 spans organizational boundaries. As discussed in section 4.7, this 183 also avoids exponential state explosion when combining features. 184 The invoker only has to manage a user interface or API to prevent 185 local feature interactions. All the other participants simply need 186 to manage the feature interactions of a much smaller number of 187 primitives. 189 - Primitives make full use of URIs. URIs are a very powerful 190 mechanism for describing users and services. They represent a 191 plentiful resource which can be extremely expressive and easily 192 routed, translated, and manipulated--even across organizational 193 boundaries. URIs can contain special parameters and informational 194 headers which need only be relevant to the owner of the namespace 195 (domain) of the URI. Just as a user who selects an http: URL need 196 not understand the significance and organization of the web site it 197 references, a user may encounter a SIP URL which translates into an 198 email-style group alias, which plays a pre-recorded message, or runs 199 some complex call-handling logic. 201 - Make use of SIP headers and SIP event packages to provide SIP 202 entities with information about their environment. 
These should 203 include information about the status / handling of dialogs on other 204 user agents, information about the history of other contacts 205 attempted prior to the current contact, the status of participants, 206 the status of conferences, user presence information, and the status 207 of messages. 209 - Encourage service decomposition, and design to make use of 210 standard components using well-defined, simple interfaces. Sample 211 components include a media mixer, recording service, announcement 212 server, and voice dialog server. (This is not an exhaustive list.) 214 - Include authentication, authorization, policy, logging, and 215 accounting mechanisms to allow these primitives to be used safely 216 among mutually untrusted participants. Some of these mechanisms may 217 be used to assist in billing, but no specific billing system will be 218 endorsed. 220 - Permit graceful fallback to baseline SIP. Definitions for new SIP 221 call control extensions/primitives MUST describe a graceful way to 222 fall back to baseline SIP behavior. Support for one primitive MUST 223 NOT imply support for another primitive. 225 SIP Multiparty Framework 227 - Do not reinvent traditional models, such as the model used by the 228 H.450 family of protocols, JTAPI, or the CSTA call model. In the 229 opinion of the authors, these models share more characteristics with 230 the traditional telephone network than with SIP. As these other 231 models do not share the design goals presented in this document, it 232 would be a disservice to these other protocols and SIP to try to 233 shoehorn our new design goals into an existing model. 235 3.2 Example Features 237 Primitives are defined in terms of their ability to provide 238 features. These example features should require an amply robust set 239 of services to demonstrate a useful set of primitives. They are 240 described here briefly. Note that the descriptions of these features 241 are non-normative. 
Some of these features are used as examples in 242 section 6 to demonstrate how some features may require certain media 243 relationships. Note also that this document describes a mixture of 244 features originating in the world of telephones and features 245 which are clearly Internet oriented. 247 Example Features: 249 Call Waiting - Alice is in a call, then receives another call. 250 Alice can place the first call on hold, and talk with the other 251 caller. She can typically switch back and forth between the 252 callers. 254 Blind Transfer - Alice is in a conversation with Bob. Alice asks 255 Bob to contact Carol, but makes no attempt to contact Carol 256 independently. In many implementations, Alice does not verify Bob's 257 success or failure in contacting Carol. 259 Attended Transfer - The transferring party establishes a session 260 with the transfer target before completing the transfer. 262 Consultative Transfer - The transferring party establishes a session 263 with the target and mixes both sessions together so that all three 264 parties can participate, then disconnects, leaving the transferee and 265 transfer target with an active session. 267 Conference Call - Three or more active, visible participants in the 268 same conversation space. 270 Call Park - A call participant parks a call (essentially puts the 271 call on hold), and then retrieves it at a later time (typically from 272 another location). 274 Call Pickup - A party picks up a call that was ringing at another 275 location. One variation allows the caller to choose which location; 276 another variation just picks up any call in that user's "pickup 277 group". 279 SIP Multiparty Framework 281 Music on Hold - When Alice places a call with Bob on hold, her phone 282 replaces its audio with streaming content such as music, 283 announcements, or advertisements. 285 Call Monitoring - A call center supervisor joins an in-progress call 286 for monitoring purposes. 
288 Barge-in - Carol interrupts Alice, who has an in-progress call 289 with Bob. In some variations, Alice forcibly joins a new 290 conversation with Carol; in other variations, all three parties are 291 placed in the same conversation (basically a 3-way conference). 293 Hotline - Alice picks up a phone and is immediately connected to the 294 technical support hotline, for example. 296 Autoanswer - Calls to a certain address or location answer 297 immediately via a speakerphone. 299 Intercom - Alice typically presses a button on a phone which 300 immediately connects to another user or phone and causes that phone 301 to play her voice over its speaker. Some variations immediately 302 set up two-way communications; other variations require another 303 button to be pressed to enable a two-way conversation. 305 Speakerphone paging - Alice calls the paging address and speaks. 306 Her voice is played on the speaker of every idle phone in a 307 preconfigured group of phones. 309 Speed dial - Alice dials an abbreviated number, or enters an alias, 310 or presses a special speed dial button representing Bob. Her action 311 is interpreted as if she specified the full address of Bob. 313 Call Return - Alice calls Bob. Bob misses the call or is 314 disconnected before he is finished talking to Alice. Bob invokes 315 Call Return, which calls Alice, even if Alice did not provide her 316 real identity or location to Bob. 318 Inbound Call Screening - Alice doesn't want to receive calls from 319 Matt. Inbound Screening prevents Matt from disturbing Alice. In 320 some variations this works even if Matt hides his identity. 322 Outbound Call Screening - Alice is paged and unknowingly calls a 323 PSTN pay-service telephone number in the Caribbean, but local policy 324 blocks her call, and possibly informs her why. 
326 Call Forwarding - Before a call-leg is accepted it is redirected to 327 another location, for example, because the originally intended 328 recipient is busy, does not answer, is disconnected from the 329 network, or has configured all requests to go somewhere else. 331 Message Waiting - Bob calls Alice when she steps away from her 332 phone. When she returns, a visible or audible indicator conveys that 333 someone has left her a voicemail message. The message waiting 334 SIP Multiparty Framework 336 indication may also convey how many messages are waiting, from whom, 337 what time, and other useful pieces of information. 339 Do Not Disturb - Alice selects the Do Not Disturb option. Calls to 340 her either ring briefly or not at all and are forwarded elsewhere. 341 Some variations allow specially authorized callers to override this 342 feature and ring Alice anyway. 344 Distinctive ring - Incoming calls have different ring cadences or 345 sample sounds depending on the From party, the To party, or other 346 factors. 348 Automatic Callback - Alice calls Bob, but Bob is busy. Alice would 349 like Bob to call her automatically when he is available. When Bob 350 hangs up, Alice's phone rings. When Alice answers, Bob's phone 351 rings. Bob answers and they talk. 353 Find-Me - Alice sets up complicated rules for how she can be reached 354 (possibly using [CPL], [presence] or other factors). When Bob calls 355 Alice, his call is eventually routed to a temporary Contact where 356 Alice happens to be available. 358 Whispered call waiting - Alice is in a conversation with Bob. Carol 359 calls Alice. Either Carol can "whisper" to Alice directly ("Can you 360 get lunch in 15 minutes?"), or an automaton whispers to Alice 361 informing her that Carol is trying to reach her. 363 Voice message screening - Bob calls Alice. Alice is screening her 364 calls, so Bob hears Alice's voicemail greeting. Alice can hear Bob 365 leave his message. 
If she decides to talk to Bob, she can take the 366 call back from the voicemail system; otherwise she can let Bob leave 367 a message. This emulates the behavior of a home telephone answering 368 machine. 370 Presence-Enabled Conferencing - Alice wants to set up a conference 371 call with Bob and Cathy when they all happen to be available (rather 372 than scheduling a predefined time). The server providing the 373 application monitors their status, and calls all three when they are 374 all "online", not idle, and not in another call. 376 IM Conference Alerts - A user receives a notification as an Instant 377 Message whenever someone joins a conference they are also in. 379 Single Line Extension - A group of phones are all treated as 380 "extensions" of a single line. A call for one rings them all. As 381 soon as one answers, the others stop ringing. If any extension is 382 actively in a conversation, another extension can "pick up" and 383 immediately join the conversation. This emulates the behavior of a 384 home telephone line with multiple phones. 386 Click-to-dial - Alice looks in her company directory for Bob. When 387 she finds Bob, she clicks on a URL to call him. Her phone rings (or 388 possibly answers automatically), and when she answers, Bob's phone 389 rings. 391 SIP Multiparty Framework 393 Pre-paid calling - Alice pays for a certain currency or unit amount 394 of calling value. When she places a call, she provides her account 395 number somehow. If her account runs out of calling value during a 396 call, her call is disconnected or redirected to a service where she 397 can purchase more calling value. 399 Voice Portal - A service that allows users to access a portal site 400 using spoken dialog interaction. For example, Alice needs to 401 schedule a working dinner with her co-worker Carol. 
Alice uses a 402 voice portal to check Carol's flight schedule, find a restaurant 403 near her hotel, make a reservation, get directions there, and page 404 Carol with this information. 406 4 Key Concepts 408 4.1 "Conversation Space" Model 410 This document introduces the concept of an abstract "conversation 411 space" (essentially a set of participants who believe they are 412 all communicating with one another). Each conversation space 413 contains one or more participants. 415 Participants are SIP User Agents which send original media to or 416 terminate and receive media from other members of the conversation 417 space. Logically, every participant in the conversation space has 418 access to all the media generated in that space (this is strictly 419 true if all participants share a common media type). A SIP User 420 Agent which does not contribute or consume any media is NOT a 421 participant; nor is a user agent which merely forwards, transcodes, 422 mixes, or selects media originating elsewhere in the conversation 423 space. [Note that a conversation space consists of zero or more SIP 424 calls or SIP conferences. A conversation space is similar to the 425 definition of a "call" in some other call models.] 427 Participants may represent human users or non-human users (referred 428 to as robots or automatons in this document). Some participants may 429 be hidden within a conversation space. Some examples of hidden 430 participants include: robots which generate tones, images, or 431 announcements during a conference to announce users arriving and 432 departing, a human call center supervisor monitoring a conversation 433 between a trainee and a customer, and robots which record media for 434 training or archival purposes. 436 Participants may also be active or passive. Active participants are 437 expected to be intelligent enough to leave a conversation space when 438 they no longer desire to participate. 
(An attentive human 439 participant is obviously active.) Some robotic participants (such 440 as a voice messaging system, an instant messaging agent, or a voice 441 dialog system) may be active participants if they can leave the 442 conversation space when there is no human interaction. Other robots 443 (for example, our tone-generating robot from the previous example) 444 are passive participants. A human participant "on-hold" is passive. 446 SIP Multiparty Framework 448 An example diagram of a conversation space can be shown as a 449 "bubble" or oval, or as a "set" in curly or square brace notation. 450 Each set, oval, or "bubble" represents a conversation space. Hidden 451 participants are shown in lowercase letters. 453 { A , B } [ A , B ] 455 .-. .---. 456 / \ / \ 457 / A \ / A b \ 458 ( ) ( ) 459 \ B / \ C D / 460 \ / \ / 461 '-' '---' 463 4.1.1 Comparison with Related Definitions 465 In SIP, a call is "an informal term that refers to some 466 communication between peers, generally set up for the purposes of a 467 multimedia conversation." Obviously we cannot discuss normative 468 behavior based on such an intentionally vague definition. The 469 concept of a conversation space is needed because the SIP definition 470 of call is not sufficiently precise for the purpose of describing 471 the user experience of multiparty features. 473 Do any other definitions convey the correct meaning? SIP and [SDP] 474 both define a conference as "a multimedia session identified by a 475 common session description." A session is defined as "a set of 476 multimedia senders and receivers and the data streams flowing from 477 senders to receivers." Both of these definitions are heavily 478 oriented toward multicast sessions with little differentiation among 479 participants. As such, neither is particularly useful for our 480 purposes. In fact, the definition of "call" in some call models is 481 more similar to our definition of a conversation space. 
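As a non-normative illustration of the conversation space model in section 4.1, the following minimal Python sketch represents a conversation space as a collection of participants, each of which may be hidden and may be active or passive. The class names (Participant, ConversationSpace) and the notation() helper are hypothetical and are not defined by this framework; the sketch only mirrors the set notation used above, in which hidden participants appear in lowercase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    """Hypothetical model of a SIP User Agent that sends or receives media."""
    name: str
    hidden: bool = False  # e.g. a recording robot or a monitoring supervisor
    active: bool = True   # passive examples: a tone generator, a human on hold


@dataclass
class ConversationSpace:
    """Hypothetical container for the participants of one conversation space.

    Mixers, transcoders, and relays are deliberately not modeled here,
    because the text above says they are not participants.
    """
    participants: list = field(default_factory=list)

    def add(self, participant):
        self.participants.append(participant)

    def visible(self):
        # Participants that the other members know about.
        return [p for p in self.participants if not p.hidden]

    def notation(self):
        # Curly brace set notation; hidden participants are lowercase.
        labels = [p.name.lower() if p.hidden else p.name.upper()
                  for p in self.participants]
        return "{ " + " , ".join(labels) + " }"


# Rebuild the four-party example from the diagram: A, hidden b, C, and D.
space = ConversationSpace()
space.add(Participant("A"))
space.add(Participant("B", hidden=True, active=False))
space.add(Participant("C"))
space.add(Participant("D"))
print(space.notation())  # { A , b , C , D }
```

Keeping hidden and active as independent flags reflects the text: a monitoring supervisor is hidden but active, while a human on hold is visible but passive.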
483 Some examples of the relationship between conversation spaces, SIP 484 call legs, and SIP sessions are listed below. In each example, a 485 human user will perceive that there is a single call. 487 A simple two-party call is a single conversation space, a single 488 session, and a single call-leg. 490 A locally mixed three-way call is two sessions and two call- 491 legs. It is also a single conversation space. 493 A simple dial-in audio conference is a single conversation 494 space, but is represented by as many call-legs and sessions as 495 there are human participants. 497 A multicast conference is a single conversation space, a single 498 session, and as many call-legs as participants. 500 SIP Multiparty Framework 502 4.2 Signaling Models 504 Obviously to make changes to a conversation space, you must be able 505 to use SIP signaling to cause these changes. Specifically there 506 must be a way to manipulate SIP dialogs (call legs) to move 507 participants into and out of conversation spaces. Although this is 508 not as obvious, there also must be a way to manipulate SIP dialogs 509 to include non-participant user agents which are otherwise involved 510 in a conversation space (ex: B2BUAs, 3pcc controllers, mixers, 511 transcoders, translators, or relays). 513 Implementations may setup the media relationships described in the 514 conversation space model using the approach described in [3pcc]. The 515 3pcc approach relies on only the following 3 primitive operations: 517 Create a new call-leg (INVITE) 518 Modify a call-leg (reINVITE) 519 Destroy a call-leg (BYE) 521 The main advantage of the 3pcc approach is that it only requires 522 very basic SIP support from end systems to support call control 523 features. As such, third-party call control is a natural way to 524 handle protocol conversion and mid-call features. 
It also has the 525 advantage and disadvantage that new features can/must be implemented 526 in one place only (the controller), and neither requires enhanced 527 client functionality, nor takes advantage of it. 529 In addition, a peer-to-peer approach is discussed at length in this 530 draft. The primary drawback of the peer-to-peer model is additional 531 end system complexity. The benefits of the peer-to-peer model 532 include: 533 - state remains at the edges 534 - call signaling need only go through participants involved 535 (there are no additional points of failure) 536 - peers can take advantage of end-to-end message integrity or 537 encryption 538 - setup time is shorter (fewer messages and round trips 539 are required) 541 The peer-to-peer approach relies on additional "primitive" 542 operations, some of which are identified here. 544 Replace an existing dialog 545 Join a new dialog with an existing dialog [Join] 546 Fork a new dialog with an existing dialog 547 Locally do media forking (multi-unicast) 548 Ask another UA to send a request on your behalf 550 Many of the features, primitives, and actions described in this 551 document also require some type of media mixing, combining, or 552 selection as described in the next section. 554 SIP Multiparty Framework 556 4.3 Mixing Models 558 SIP permits a variety of mixing models, which are discussed here 559 briefly. This topic is discussed more thoroughly in [conf-models]. 560 For brevity, only the two most popular conferencing models are 561 significantly discussed in this document (local and centralized 562 mixing). Applications of the conversation spaces model to multicast 563 and multi-unicast (full unicast mesh) conferences are left as an 564 exercise for the reader. Note that a distributed full mesh 565 conference can be used for basic conferences, but does not easily 566 allow for more complex conferencing actions like splitting, joining, 567 and forking. 
569 Call control features should be designed to allow a mixer (local or 570 centralized) to decide when to reduce a conference back to a 2-party 571 call, or drop all the participants (for example if only two 572 automatons are communicating). The actual heuristics used to 573 release calls are beyond the scope of this document, but may depend 574 on properties in the conversation space, such as the number of 575 active, passive, or hidden participants; and the send-only, receive- 576 only, or send-and-receive orientation of various participants. 578 4.3.1 (Single) End System Mixing 580 The first model we call "end system mixing". In this model, user A 581 calls user B, and they have a conversation. At some point later, A 582 decides to conference in user C. To do this, A calls C, using a 583 completely separate SIP call. This call uses a different Call-ID, 584 different tags, etc. There is no call set up directly between B and 585 C. No SIP extension or external signaling is needed. A merely 586 decides to locally join two call-legs. 588 [diagram] 590 A receives media streams from both B and C, and mixes them. A sends 591 a stream containing A's and C's streams to B, and a stream 592 containing A's and B's streams to C. Basically, user A handles both 593 signaling and media mixing. B and C are unaware of the multi-party 594 call, from a SIP perspective at least. From an RTP perspective, A is 595 a mixer, and so the RTCP reports from A will contain SDES 596 information that indicates the existence of an additional party in 597 the media stream. 599 4.3.2 Centralized Mixing 601 In a centralized mixing model, all participants have a pairwise SIP 602 and media relationship with the mixer. Three applications of 603 centralized mixing are also discussed below. 605 [diagram] 607 4.3.2.1 Dial-In Conference Servers 608 SIP Multiparty Framework 610 Dial-In conference servers closely mirror dial-in conference bridges 611 in the traditional PSTN. 
A dial-in conference server acts as a 612 normal SIP UA. Users call it, and the server maintains point to 613 point SIP relationships with each user that calls in. The server 614 takes the media from the users who dial into the same conference, 615 mixes them, and sends out the appropriate mixed stream to each 616 participant separately. The model is depicted in Figure 3. Note that 617 each UA (A,B,C,D) has a point to point SIP and RTP relationship with 618 the conference server. Each call has a different Call-ID. Each user 619 sends their own media to the server. The media delivered to user A 620 by the server is the media mixed from users B, C and D. The media 621 delivered to user B by the server is the media mixed from users A, C 622 and D. The media delivered to user C by the server is the media 623 mixed from users A, B and D. The media delivered to user D is the 624 media mixed from users A, B and C (this is also known as a mix-minus 625 configuration). 627 As in other applications of centralized mixing, the conference is 628 identified by the request URI of the calls from each participant. 629 This provides numerous advantages from a services and routing point 630 of view [9]. For example, one conference on the server might be 631 known as sip:conference34@servers.com. All users who call 632 sip:conference34@servers.com are mixed together. Dial-In conference 633 servers are usually associated with pre-arranged conferences. 634 However, the same model applies to ad-hoc conferences. An ad-hoc 635 conference server creates the conference state when the first user 636 joins, and destroys it when the last one leaves. The SIP and RTP 637 interfaces are identical to the pre-arranged case. 639 4.3.2.2 Ad-hoc Centralized Conferences 641 In an ad-hoc centralized conference, two users A and B start with a 642 normal SIP call. At some point later, they decide to add a third 643 party. Instead of using end system mixing, they would prefer to use 644 a conference server. 
Initially, A calls B. At some point, B decides 645 to add user C to the call, and begins the transition to a conference 646 server. The first step in this process is the discovery of a 647 conference server that supports ad-hoc conferences. This can be done 648 through static configuration, or through any of a number of standard 649 service discovery protocols, such as the Service Location Protocol 650 [SLP]. Once the server is discovered, a conference ID is chosen. 651 This ID must be globally unique. The conference ID is then prepended 652 to the server's host name, and a SIP URL for the ad-hoc conference is formed. 653 For example, if the server "a.servers.com" is used, and the unique 654 ID is "a7hytaskp09878a", the SIP URL for this conference is 655 sip:a7hytaskp09878a@a.servers.com. The first participant to send an 656 INVITE to this URL creates the initial conference state in the 657 server. SIP dialogs are manipulated (using any combination of 3pcc 658 or peer-to-peer signaling) so that each participant is sending media 659 to the conference server. It is also possible to transition from an 660 end system mixed conference (even one with a complex connection 661 topology) to a centralized conference server. 665 4.3.2.3 Dial-Out Conferences 667 Dial-out conferences are a simple variation on dial-in conferences. 668 Instead of the users joining the conference by sending an INVITE to 669 the server, the server chooses the users who are to be members of 670 the conference, and then sends them the INVITE. Typically dial-out 671 conferences are pre-arranged, with specific start times and an 672 initial group membership list. However, there are other means for 673 the dial-out server to determine the list of participants, including 674 user presence [13]. The model in no way limits the means by which 675 the server determines the set of users.
Once the users accept or 676 reject the call from the dial-out server, the behavior of this 677 system is identical to the dial-in server case of Section 4. Thus, a 678 dial-out conference server will generally need to support dial-in 679 access for the same conference, if it wishes to allow joining after 680 the conference begins. Note that, from the participants' perspective, 681 they will learn the conference identity (the URL) from the From 682 field in the INVITE messages received from the server. 684 4.3.3 Multicast and Multi-unicast conferences 686 In these models, all endpoints send media to all other endpoints. 687 Consequently every endpoint mixes their own media from all the other 688 sources, and sends their own media to every other participant. 690 [diagrams] 692 4.3.3.1 Large-Scale Multicast Conferences 694 Large-scale multicast conferences were the original motivation for 695 both the Session Description Protocol [SDP] and SIP. In a 696 large-scale multicast conference, one or more multicast addresses are 697 allocated to the conference (more than one may be needed if layered 698 encodings are in use). Each participant joins those multicast groups, 699 and sends their media to those groups. Signaling is not sent to the 700 multicast groups. The sole purpose of the signaling is to inform 701 participants of which multicast groups to join. Large-scale 702 multicast conferences are usually pre-arranged, with specific start 703 and stop times (which is why this information exists in SDP). 704 Protocols such as the Session Announcement Protocol [SAP] are used 705 to announce these conferences. However, multicast conferences do not 706 need to be pre-arranged, so long as a mechanism exists to 707 dynamically obtain a multicast address. So, if there are N 708 participants, there will be point-to-point SIP relationships with 709 pairs of participants. Each participant sends a single media stream 710 to the group, and receives up to N-1 streams at any time.
Note that 711 the number of streams that a user will receive depends on who is 712 actually sending at any given time. If the stream is audio, and 713 silence suppression is utilized, the number of streams a user will 714 receive at any given time is equal to the number of users talking at 715 any given time. Even for very large conferences, this is usually 716 just a small number of users. 720 4.3.3.2 Centralized Signaling, Distributed Media 722 In this conferencing model, there is a centralized controller, as in 723 the dial-in and dial-out cases. However, the centralized server 724 handles signaling only. The media is still sent directly between 725 participants, using either multicast or multi-unicast. Multi-unicast 726 is when a user sends multiple packets (one for each recipient, 727 addressed to that recipient). This is referred to as a 728 "Decentralized Multipoint Conference" in [H.323]. 730 4.3.3.3 Full Distributed Unicast Conferencing 732 In this conferencing model, each participant has both a pairwise 733 media relationship and a pairwise SIP relationship with every other 734 participant (a full mesh). This model requires a mechanism to 735 maintain a consistent view of distributed state across the group. 736 This is a classic hard problem in computer science. Also, this 737 model does not scale well for large numbers of participants, 738 because for n participants the number of media and SIP 739 relationships is approximately n-squared. As a result, this model 740 is not generally available in commercial implementations; to the 741 contrary it is primarily the topic of research or experimental 742 implementations. Note that this model assumes peer-to-peer 743 signaling. 745 4.4 Conveying Information and Events 747 Participants should have access to information about the other 748 participants in a conversation space, so that this information can 749 be rendered to a human user or processed by an automaton.
Although 750 some of this information may be available from the Request-URI or 751 To, From, Contact, or other SIP headers, another mechanism of 752 reporting this information is necessary. Note that the data 753 reported by RTCP is insufficient for these purposes, as deletions 754 and additions are not detectable in real time, and SIP may set up 755 sessions which do not involve RTP media. 757 Many applications are driven by knowledge about the progress of 758 calls and conferences. In general these types of events allow for 759 the construction of distributed applications, where the application 760 requires information on dialog and conference state, but is not 761 necessarily co-resident with an endpoint user agent or conference 762 server. For example, a mixer involved in a conversation space may 763 wish to provide URLs for conference status, and/or conference/floor 764 control. 766 The SIP [Events] architecture defines general mechanisms for 767 subscription to and notification of events within SIP networks. It 768 introduces the notion of a package, which is a specific 769 "instantiation" of the events mechanism for a well-defined set of 770 events. 772 New event packages should be able to 775 provide the status of a user's call-legs (dialogs), provide the 776 status of conferences and their participants, provide user presence 777 information, and provide the status of a user's messages. While this 778 is not an exhaustive list, these are sufficient to enable the sample 779 features described in this document. 781 A conference event package allows users to subscribe to information 782 about an entire conference or conversation space. This conference 783 state could be provided by a conference server or mixing component 784 (described in Section 4.5) if centralized mixing is used, or 785 gathered from relevant peers and merged into a cohesive set of 786 state.
Notifications would convey information about the 787 participants such as: the SIP URL identifying each user, their 788 status in the space (active, declined, departed), URLs to invoke 789 other features (such as sidebar conversations), links to other 790 relevant information (such as floor control policies), and if floor 791 control policies are in place, the user's floor control status. A 792 "call-leg" event package would provide information about all the 793 dialogs the target user is maintaining, what conversations the user 794 is participating in, and how these are correlated. A concrete 795 proposal for both conference events and call-leg events is described 796 in [call-pkg]. 798 Note that user presence has a close relationship with these two 799 proposed event packages. It is fundamental to the presence model 800 that the information used to obtain user presence is constructed 801 from any number of different input sources. Examples of such sources 802 include SIP REGISTER requests and uploads of presence documents. 803 These two packages can be considered another mechanism that allows a 804 presence agent to determine the presence state of the user. 805 Specifically, a user presence server can act as a subscriber for the 806 call-leg and conference packages to obtain additional information 807 that can be used to construct a presence document. 809 The multi-party architecture should also provide a mechanism to get 810 information about the status/handling of a dialog (for example, 811 information about the history of other contacts attempted prior to 812 the current contact). Finally, the architecture should provide 813 ample opportunities to present informational URIs which relate to 814 calls, conversations, or dialogs in some way. For example, consider 815 the SIP Call-Info header, or Contact headers returned in a 300-class 816 response. Frequently additional information about a call or dialog 817 can be fetched via non-SIP URIs.
For example, consider a web page 818 for package tracking when calling a delivery company, or a web page 819 with related documentation when joining a dial-in conference. The 820 use of URIs in the multiparty framework is discussed in more detail 821 in Section 4.6. 823 4.5 Componentization and Decomposition 825 This framework proposes a decomposed component architecture with a 826 very loose coupling of services and components. This means that a 827 service (such as a conferencing server or an auto-attendant) need 828 SIP Multiparty Framework 830 not be implemented as an actual server. Rather, these services can 831 be built by combining a few basic components in straightforward or 832 arbitrarily complex ways. 834 Since the components are easily deployed on separate boxes, by 835 separate vendors, or even with separate providers, we achieve a 836 separation of function that allows each piece to be developed in 837 complete isolation. We can also reuse existing components for new 838 applications. This allows rapid service creation, and the ability 839 for services to be distributed across organizational domains 840 anywhere in the Internet. 842 For many of these components it is also desirable to discover their 843 capabilities, for example querying the ability of a mixer to host a 844 10 dialog conference, or to reserve resources for a specific time. 845 These actions could be provided in the form of URLs, provided there 846 is an a priori means of understanding their semantics. For example 847 if there is a published dictionary of operations, a way to query the 848 service for the available operations and the associated URLs, the 849 URL can be the interface for providing these service operations. 
850 This concept is described in more detail in the context of dialog 851 operations in Section 4.6. 853 4.5.1 Media Intermediaries 855 Media Intermediaries are not participants in any conversation space, 856 although an entity which is also a media translator may also have a 857 colocated participant component (for example a mixer which also 858 announces the arrival of a new participant; the announcement portion 859 is a participant, but the mixer itself is not). Media 860 intermediaries should be as transparent as possible to the end 861 users, offering a useful, fundamental service without getting in 862 the way of new features implemented by participants. Some common 863 media intermediaries are described below. 865 4.5.1.1 Mixer 867 A mixer is a component that combines media from all call-legs in the 868 same conversation in a media-specific way. For example, the default 869 combining for an audio conference would be an N-1 configuration. In 870 other words, each user receives a mixed media stream that represents 871 the combined audio of all the users except himself or herself. 873 For reference, the RTP definition of a mixer is included below. 874 Note that SIP multiparty applications may deal with media which is 875 not carried by RTP (for example Instant Messages). A mixer, as 876 defined above, can still combine these messages in a media-specific 877 way and act as a SIP mixing component. 879 "Mixer: An intermediate system that receives RTP packets from 880 one or more sources, ... combines the packets in some manner 881 and then forwards a new RTP packet. Since the timing across 882 multiple input sources will not generally be synchronized, the 883 mixer will make timing adjustments among the streams and 886 generate its own timing for the combined stream. Thus all data 887 packets originating from a mixer will be identified as having 888 the mixer as their synchronization source."
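As an illustration only, the N-1 ("mix-minus") combining described above for an audio conference can be sketched in Python. This is not part of the framework. The function name mix_minus, the use of signed 16-bit linear PCM samples, and the hard clipping step are all assumptions made for the example; a real mixer would also handle timing adjustment, codecs, and silence suppression.

```python
# Sketch of N-1 ("mix-minus") audio combining, under stated assumptions:
# each participant contributes one frame of signed 16-bit PCM samples,
# and all frames have the same length.

PCM_MIN, PCM_MAX = -32768, 32767


def mix_minus(frames):
    """Return, for each participant, the sum of everyone else's samples.

    frames maps a participant name to a list of integer samples.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    total = [0] * length
    for samples in frames.values():
        if len(samples) != length:
            raise ValueError("all frames must have the same length")
        for i, sample in enumerate(samples):
            total[i] += sample
    # Subtracting a participant's own samples from the full sum yields
    # the mix of all other participants; clip to the 16-bit range.
    return {
        who: [max(PCM_MIN, min(PCM_MAX, t - s)) for t, s in zip(total, samples)]
        for who, samples in frames.items()
    }
```

Computing the full sum once and subtracting each participant's own contribution keeps the work linear in the number of participants, rather than building a separate mix from scratch for every listener.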
890 Conventions for specifying a mixing or conferencing service in a SIP 891 URI are proposed in [ms-uri]. 893 4.5.1.2 Media Translator 895 RTP also defines an entity called a translator. Like a mixer, this 896 concept is useful outside of the context of RTP and can be applied 897 to most other media types. 899 "Translator: An intermediate system that forwards RTP packets 900 with their synchronization source identifier intact. Examples 901 of translators include devices that convert encodings without 902 mixing, replicators from multicast to unicast, and 903 application-level firewalls." 905 4.5.1.3 Transcoder 907 A transcoder translates media from one encoding to another (for 908 example, GSM voice to G.711, or MPEG2 to H.261). A transcoder for 909 RTP media is a type of RTP translator. 911 4.5.1.4 Media Relay 913 A media relay terminates media and simply forwards it to a new 914 destination without changing the content in any way. Sometimes 915 media relays are used to provide source IP address anonymity, to 916 facilitate middlebox traversal, or to provide a trusted entity where 917 media can be forcefully disconnected. A media relay for RTP is also 918 a type of RTP translator. 920 4.5.2 Queue Server 922 A queue server is a location where calls can be entered into one of 923 several FIFO (first-in, first-out) queues. A queue server would 924 subscribe to the presence of groups or individuals who are 925 interested in its queues. When detecting that a user is available 926 to service a queue, the server redirects or transfers the last call 927 in the relevant queue to the available user. On a queue-by-queue 928 basis, authorized users could also subscribe to the call state 929 (dialog information) of calls within a queue. Authorized users 930 could use this information to effectively pluck (take) a call out of 931 the queue (for example by sending an INVITE with a Replaces header 932 to one of the user agents in the queue).
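The queue server behavior described above can be sketched as a small in-memory data structure. This is a hypothetical Python illustration, not a SIP implementation: the class QueueServer, its method names, and the use of opaque call identifiers are assumptions. The sketch also assumes that FIFO means the longest-waiting call is served first, and it returns the (call, agent) pairing instead of performing an actual redirect or transfer.

```python
# Sketch of a queue server with several named FIFO queues.
# All names here are hypothetical; no SIP signaling is performed.
from collections import deque


class QueueServer:
    def __init__(self):
        self._queues = {}  # queue name -> deque of call identifiers

    def enqueue(self, queue_name, call_id):
        """Enter a call at the tail of the named queue."""
        self._queues.setdefault(queue_name, deque()).append(call_id)

    def agent_available(self, queue_name, agent_uri):
        """Pair the longest-waiting call with an available agent.

        Returns (call_id, agent_uri), or None if the queue is empty.
        A real server would redirect or transfer the call here.
        """
        queue = self._queues.get(queue_name)
        if not queue:
            return None
        return (queue.popleft(), agent_uri)

    def pluck(self, queue_name, call_id):
        """Take a specific call out of the queue, regardless of position."""
        queue = self._queues.get(queue_name)
        if queue is None or call_id not in queue:
            return False
        queue.remove(call_id)
        return True

    def snapshot(self, queue_name):
        """Call state a subscriber to this queue might be shown."""
        return list(self._queues.get(queue_name, ()))
```

In this sketch, agent_available would be driven by presence notifications, and pluck corresponds to an authorized user taking over a queued dialog, for example with an INVITE carrying a Replaces header.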
934 4.5.3 Parking Place 936 A parking place is a location where calls can be terminated 937 temporarily and then retrieved later. While a call is "parked", it 938 can receive media "on-hold" such as music, announcements, or 939 SIP Multiparty Framework 941 advertisements. Such a service could be further decomposed such 942 that announcements or music are handled by a separate component. 944 4.5.4 Announcements and Voice Dialogs 946 An announcement server is a server which can play digitized media 947 (frequently audio), such as music or recorded speech. These servers 948 are typically accessible via SIP, HTTP, or RTSP. An analogous 949 service is a recording service which stores digitized media. A 950 convention for specifying announcements in SIP URIs is described in 951 [ms-uri]. Likewise the same server could easily provide a service 952 which records digitized media. 954 A "voice dialog" is a model of spoken interactive behavior between a 955 human and an automaton which can include synthesized speech, 956 digitized audio, recognition of spoken and DTMF key input, recording 957 of spoken input, and interaction with call control. Dialogs 958 frequently consist of forms or menus. Forms present information and 959 gather input; menus offer choices of what to do next. 961 Spoken dialogs are a basic building block of applications which use 962 voice. Consider for example that a voice mail system, the 963 conference-id and passcode collection system for a conferencing 964 system, and complicated voice portal applications all require a 965 voice dialog component. 967 4.5.4.1. Text-to-Speech and Automatic Speech Recognition 969 Text-to-Speech (TTS) is a service which converts text into digitized 970 audio. TTS is frequently integrated into other applications, but 971 when separated as a component, it provides greater opportunity for 972 broad reuse. 
Various interfaces to access standalone TTS services 973 via HTTP, RTSP (in [MRCP]), and SIP ([app-components], [ms-uri] and 974 [MRCP-SIP]) have been proposed. 976 Automatic Speech Recognition (ASR) is a service which attempts to 977 decipher digitized speech based on a proposed grammar. Like TTS, 978 ASR services can be embedded, or exposed so that many applications 979 can take advantage of such services. Various IP interfaces to ASR, 980 such as MRCP, have been proposed. 982 4.5.4.2. VoiceXML 984 [VoiceXML] is a W3C recommendation that was designed to give authors 985 control over the spoken dialog between users and applications. The 986 application and user take turns speaking: the application prompts 987 the user, and the user in turn responds. Its major goal is to bring 988 the advantages of web-based development and content delivery to 989 interactive voice response applications. We believe that VoiceXML 990 represents the ideal partner for SIP in the development of 991 distributed IVR servers. VoiceXML is an XML based scripting language 992 SIP Multiparty Framework 994 for describing IVR services at an abstract level. VoiceXML supports 995 DTMF recognition, speech recognition, text-to-speech, and playing 996 out of recorded media files. The results of the data collected from 997 the user are passed to a controlling entity through an HTTP POST 998 operation. The controller can then return another script, or 999 terminate the interaction with the IVR server. 1001 A VoiceXML server also need not be implemented as a monolithic 1002 server. Below is a diagram of a VoiceXML browser which is split 1003 into media and non-media handling parts. The VoiceXML interpreter 1004 handles SIP dialog state and state within a VoiceXML document, and 1005 sends requests to the media component over another protocol (for 1006 example RTSP). 
1008                +-------------+
1009                |             |
1010                |  VoiceXML   |
1011                | Interpreter |
1012                | (signaling) |
1013                +-------------+
1014                  ^          ^
1015                  |          |
1016              SIP |          | [RTSP]
1017                  |          |
1018                  |          |
1019                  v          v
1020     +-------------+        +-------------+
1021     |             |        |             |
1022     |   SIP UA    |  RTP   | RTSP Server |
1023     |             |<------>|   (media)   |
1024     |             |        |             |
1025     +-------------+        +-------------+
1027 Figure : Decomposed VoiceXML Server
1029 From a naming perspective, a critical issue when using VoiceXML is 1030 how a request URI is associated with a script to invoke when the 1031 call is answered. We see three primary mechanisms: 1) There is a 1032 one-to-one binding of the address in the request URI to a script to 1033 execute. These bindings are published by the provider of the IVR 1034 service. 2) The initial script to execute is actually carried as 1035 content in the body of the SIP INVITE request. The request URI 1036 indicates that the desired service is execution of content in the 1037 request (i.e., sip:executebody@servers.com). 3) The initial script 1038 to execute is fetched by the VoiceXML server; the URL to fetch it 1039 from is passed in the SIP INVITE message that initiates the IVR 1040 session. This can be accomplished either with the application/uri 1041 MIME type as a body, or using the *-Info headers defined in SIP 1042 which provide references to content to fetch. We believe that the 1043 third approach is probably the best one. SIP is not the ideal 1044 transfer mechanism. Passing a URI allows a far better transfer tool, 1045 for example HTTP, to be used to actually fetch the script back from 1048 the controller. HTTP is then also used to pass back form data from 1049 the IVR to the controller. The results of the HTTP POST can also 1050 contain additional VoiceXML scripts to execute. More details about 1051 the integration of SIP with VoiceXML are provided in [sip-vxml]. 1053 4.6 Use of URIs 1055 All naming in SIP uses URIs.
URIs in SIP are used in a plethora of 1056 contexts: the Request-URI; Contact, To, From, and *-Info headers; 1057 application/uri bodies; and embedded in email, web pages, instant 1058 messages, and ENUM records. The request-URI identifies the user or 1059 service that the call is destined for. 1061 SIP URIs embedded in informational SIP headers, SIP bodies, and 1062 non-SIP content can also specify methods, special parameters, headers, 1063 and even bodies. For example: 1065 sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098 1066 &To=;tag=879738 1067 &From=;tag=023214 1069 sip:bob@babylon.biloxi.com;method=REFER? 1070 Refer-To= 1072 Throughout this draft we discuss call control primitive operations. 1073 One of the biggest problems is defining how these operations may be 1074 invoked. There are a number of ways to do this. One way is to 1075 define the primitives in the protocol itself such that SIP methods 1076 (for example REFER) or SIP headers (for example Replaces) indicate a 1077 specific call control action. Another way to invoke call control 1078 primitives is to define a specific Request-URI naming convention. 1079 Either these conventions must be shared between the client (the 1080 invoker) and the server, or published by or on behalf of the server. 1081 The former involves defining URL construction techniques (e.g. URL 1082 parameters and/or token conventions) as proposed in [ms-uri]. The 1083 latter technique usually involves discovering the URI via a SIP 1084 event package, a web page, a business card, or an Instant Message. 1085 Yet another means to acquire the URLs is to define a dictionary of 1086 primitives with well-defined semantics and provide a means to query 1087 the named primitives and corresponding URLs that may be invoked on 1088 the service or dialogs.
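As a rough illustration of the embedded method and header examples earlier in this section, the following Python sketch assembles a SIP URI that carries a method parameter and headers. The helper name sip_uri_with_action is hypothetical, and the Refer-To target used in the usage note (sip:carol@example.com) is an invented value, not one taken from this document. The sketch percent-escapes header values conservatively; it does not validate the result against the full SIP URI grammar.

```python
# Sketch: build a SIP URI that embeds a method and headers.
# The function name and example values are assumptions for illustration.
from urllib.parse import quote


def sip_uri_with_action(user, host, method=None, headers=None):
    """Return sip:user@host[;method=M][?name=value&name=value...]."""
    uri = f"sip:{quote(user, safe='')}@{host}"
    if method:
        uri += f";method={method}"
    if headers:
        # Header values are percent-escaped so that characters such as
        # '<', '>', ':' and '@' cannot be confused with URI delimiters.
        pairs = [
            f"{quote(name, safe='')}={quote(value, safe='')}"
            for name, value in headers.items()
        ]
        uri += "?" + "&".join(pairs)
    return uri
```

For example, sip_uri_with_action("bob", "babylon.biloxi.com", "REFER", {"Refer-To": "<sip:carol@example.com>"}) produces a URI whose Refer-To value is carried in escaped form. A recipient would unescape it before building the request.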
1090 4.6.1 Naming Users in SIP 1092 An address-of-record, or public SIP address, is a SIP (or SIPS) URI 1093 that points to a domain with a location server that can map the URI 1094 to a set of Contact URIs where the user might be available. Typically 1095 the Contact URIs are populated via registration.
1097     Address of Record           Contacts
1099     sip:bob@biloxi.com     ->   sip:bob@babylon.biloxi.com:5060
1100                                 sip:bbrown@mailbox.provider.net
1101                                 sip:+1.408.555.6789@mobile.net
1105 [Caller-prefs] defines a set of additional parameters to the Contact 1106 header that define the characteristics of the user agent at the 1107 specified URI. For example, there is a mobility parameter which 1108 indicates whether the UA is fixed or mobile. When a user agent 1109 registers, it places these parameters in the Contact headers to 1110 characterize the URIs it is registering. This allows a proxy for 1111 that domain to have information about the contact addresses for that 1112 user. 1114 When a caller sends a request, it can optionally include the 1115 Accept-Contact and Reject-Contact headers which request certain handling by 1116 the proxy in the target domain. These headers contain preferences 1117 that describe the set of desired URIs to which the caller would like 1118 their request routed. The proxy in the target domain matches these 1119 preferences with the Contact characteristics originally registered 1120 by the target user. The target user can also choose to run 1121 arbitrarily complex "Find-me" feature logic on a proxy in the target 1122 domain. 1124 There is a strong asymmetry in how preferences for callers and 1125 callees can be presented to the network. While a caller takes an 1126 active role by initiating the request, the callee takes a passive 1127 role in waiting for requests. This motivates the use of 1128 callee-supplied scripts and caller preferences included in the call 1129 request.
This asymmetry is also reflected in the appropriate 1130 relationship between caller and callee preferences. A server for a 1131 callee should respect the wishes of the caller to avoid certain 1132 locations, while the preferences among locations has to be the 1133 callee's choice, as it determines where, for example, the phone 1134 rings and whether the callee incurs mobile telephone charges for 1135 incoming calls. 1137 SIP User Agent implementations are encouraged to make intelligent 1138 decisions based on the type of participants (active/passive, hidden, 1139 human/robot) in a conversation space. This information is conveyed 1140 in a SIP URI parameter and communicated using an appropriate SIP 1141 header or event body. For example, a music on hold service may take 1142 the sensible approach that if there are two or more unhidden 1143 participants, it should not provide hold music; or that it will not 1144 send hold music to robots. 1146 Multiple participants in the same conversation space may represent 1147 the same human user. For example, the user may use one participant 1148 for video, chat, and whiteboard media on a PC and another for audio 1149 media on a SIP phone. In this case, the address-of-record is the 1150 same for both user agents, but the Contacts are different. In 1151 addition, human users may add robot participants which act on their 1152 behalf (for example a call recording service, or a calendar 1153 reminder). Call Control features in SIP should continue to function 1154 as expected in such an environment. 1156 SIP Multiparty Framework 1158 4.6.2 Naming Services with SIP URIs. 1160 A critical piece of defining a session level service that can be 1161 accessed by SIP is defining the naming of the resources within that 1162 service. This point cannot be overstated. 1164 In the context of SIP control of application components, we take 1165 advantage of the fact that the standard SIP URI has a user part. 
1166 Most services may be thought of as user automatons that participate 1167 in SIP sessions. It naturally follows that the user address, or the 1168 left-hand-side of the URI, should be utilized as a service 1169 indicator. 1171 For example, media servers commonly offer multiple services at a 1172 single host address. Use of the user part as a service indicator 1173 enables service consumers to direct their requests without 1174 ambiguity. It has the added benefit of enabling media services to 1175 register their availability with SIP Registrars just as any "real" 1176 SIP user would. This maintains consistency and provides enhanced 1177 flexibility in the deployment of media services in the network. 1179 There has been much discussion about the potential for confusion if 1180 media services URIs are not readily distinguishable from other types 1181 of SIP UAs. The use of a service namespace provides a mechanism to 1182 unambiguously identify standard interfaces while not constraining 1183 the development of private or experimental services. 1185 In SIP, the request-URI identifies the user or service that the call 1186 is destined for. The great advantage of using URIs (specifically, 1187 the SIP request URI) as a service identifier comes from the 1188 combination of two facts. First, unlike in the PSTN, where the 1189 namespace (dialable telephone numbers) is limited, URIs come from 1190 an infinite space. They are plentiful, and they are free. Secondly, 1191 the primary function of SIP is call routing through manipulations of 1192 the request URI. In the traditional SIP application, this URI 1193 represents people. However, the URI can also represent services, as 1194 we propose here. This means we can apply the routing services SIP 1195 provides to routing of calls to services. The result - the problem 1196 of service invocation and service location becomes a routing 1197 problem, for which SIP provides a scalable and flexible solution.
1198 Since there is such a vast namespace of services, we can explicitly 1199 name each service in a finely granular way. This allows the 1200 distribution of services across the network. 1202 Consider a conferencing service. If we have separated the names 1203 of ad-hoc conferences from scheduled conferences, we can program 1204 proxies to route calls for ad-hoc conferences to one set of servers, 1205 and calls for scheduled ones to another, possibly even in a 1206 different provider. In fact, since each conference itself is given a 1207 URI, we can distribute conferences across servers, and easily 1208 guarantee that calls for the same conference always get routed to 1209 the same server. This is in stark contrast to conferences in the 1210 telephone network, where the equivalent of the URI - the phone 1211 number - is scarce. An entire conferencing provider generally has 1214 one or two numbers. Conference IDs must be obtained through IVR 1215 interactions with the caller, or through a human attendant. This 1216 makes it difficult to distribute conferences across servers all over 1217 the network, since the PSTN routing only knows about the dialed 1218 number. 1220 In the case of a dialog server, the voice dialog itself is the 1221 target for the call. As such, the request URI should contain the 1222 identifier for this spoken dialog. This is consistent with the 1223 Request-URI service invocation model of RFC 3087. This URL can be in 1224 one of two formats. In the first, the VoiceXML script is identified 1225 directly by an HTTP URL. In the second, the script is not specified. 1226 Rather, the dialog server uses its configuration to map the incoming 1227 request to a specific script.
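The two URL formats can be illustrated with a minimal sketch. It assumes the "dialog.vxml" service prefix and the host names used in this section's examples; the helper names are hypothetical and not part of any SIP library. The script URL is percent-encoded so that ":" does not appear in the user part (Python emits %3A, which is equivalent to %3a).

```python
from urllib.parse import quote, unquote


def vxml_dialog_uri(host, script_url=None, service="dialog.vxml"):
    """Build a SIP URI naming a VoiceXML dialog service (hypothetical helper)."""
    if script_url is None:
        # Second format: no script named; the server maps the request
        # to a script using its own configuration.
        return f"sip:{service}@{host}"
    # First format: the script URL rides in the user part. "/" is left as-is,
    # while ":" is escaped because it may not appear in the user part.
    return f"sip:{service}.{quote(script_url, safe='/')}@{host}"


def script_url_from_uri(uri, service="dialog.vxml"):
    """Recover the script URL from the user part, or None if none was given."""
    user = uri[len("sip:"):].rsplit("@", 1)[0]
    if user == service:
        return None
    # Skip the service prefix and the "." separator, then undo the escaping.
    return unquote(user[len(service) + 1:])
```

A dialog server built this way would fall back to its local configuration whenever script_url_from_uri returns None.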
1229 Since the request URI could indicate a request for a variety of 1230 different services, of which a dialog server is only one type, this 1231 example request URI begins with a service identifier that 1232 indicates the basic service required. For VoiceXML scripts, this 1233 identification information is a URL-encoded version of the URL which 1234 references the script to execute, or if not present, the dialog 1235 server uses server-specific configuration to determine which script 1236 to execute. 1238 Examples of URLs that invoke VoiceXML dialogs are: 1239 (line folding for clarity only) 1241 sip:dialog.vxml.http%3a//dialogs.server.com/script32.vxml 1242 @vxmlservers.com 1244 sip:dialog.vxml@vxmlservers.com 1246 The first of these indicates that the dialog server (located at 1247 vxmlservers.com) should invoke a VoiceXML script fetched from 1248 http://dialogs.server.com/script32.vxml. Since the user part of the 1249 SIP URL cannot contain the : character, this must be escaped to %3a. 1251 These types of conventions are not limited to application component 1252 servers. An ordinary SIP User Agent can have special URIs as 1253 well, for example, one which is automatically answered by a 1254 speakerphone. Since URIs are so plentiful, using a separate URI for 1255 this service does not exhaust a valuable resource. The requested 1256 service is clear to the user agent receiving the request. This URI 1257 can also be included as part of another feature (for example, the 1258 Intercom feature described in Section 6.1.6). This feature can be 1259 specified with a SIP user parameter, since user parameters are part of the userpart 1260 of a SIP URI. 1262 Likewise a Request URI can fully describe an announcement service 1263 through the use of the user part of the address and additional URI 1264 parameters. In our example, the user portion of the address, 1265 "annc", specifies the announcement service on the media server.
1266 The two URI parameters "play=" and "early=" specify the audio 1267 resource to play and whether early media is desired. 1271 sip:annc@ms2.carrier.net; 1272 play=http://audio.carrier.net/allcircuitsbusy.au;early=yes 1274 sip:annc@ms2.carrier.net; 1275 play=file://fileserver.carrier.net/geminii/yourHoroscope.wav 1277 In practical applications, it is important that an invoker not 1278 apply semantic rules to URIs it did not create. 1279 Instead, it should allow any arbitrary string to be provisioned, and 1280 map the string to the desired behavior. The administrator of a 1281 service may choose to provision specific conventions or mnemonic 1282 strings, but the application should not require it. In any large 1283 installation, the system owner is likely to have pre-existing rules 1284 for mnemonic URIs, and any attempt by an application to define its 1285 own rules may create a conflict. Implementations should allow an 1286 arbitrary mix of URLs from these schemes, or any other scheme that 1287 renders valid SIP URIs, to be provisioned, rather than enforce only 1288 one particular scheme.
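The provisioning rule above can be sketched as a plain lookup table. This is a minimal illustration, not a recommended design: the table contents reuse URIs from this section's voicemail examples, the tuple layout and the function name are hypothetical, and exact string matching stands in for proper SIP URI comparison.

```python
# Hypothetical provisioning table. Each key is an opaque string chosen by the
# administrator; the application never parses it to infer meaning.
PROVISIONED = {
    "sip:sub-rjs-deposit@vm.wcom.com": ("deposit", "rjs", "standard greeting"),
    "sip:677384@vm.wcom.com": ("deposit", "rjs", "special greeting"),
    "sip:rjs@vm.wcom.com;mode=retrieve": ("retrieve", "rjs", "SIP authentication"),
}


def lookup_treatment(request_uri, table=PROVISIONED):
    """Return the provisioned behavior for a Request-URI, or None if unknown."""
    # Assumption: a simple exact match; a real server would apply the
    # SIP URI comparison rules before looking the URI up.
    return table.get(request_uri)
```

Because the keys are opaque, URIs from all three naming schemes can coexist in one table without the application favoring any of them.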
1290 For example, a voicemail application can be built using very 1291 different sets of URI conventions, as illustrated below: 1293 URI Identity Example Scheme 1 1294 Example Scheme 2 1295 Example Scheme 3 1297 Deposit with sip:sub-rjs-deposit@vm.wcom.com 1298 standard greeting sip:677283@vm.wcom.com 1299 sip:rjs@vm.wcom.com;mode=deposit 1301 Deposit with on sip:sub-rjs-deposit-busy.vm.wcom.com 1302 phone greeting sip:677372@vm.wcom.com 1303 sip:rjs@vm.wcom.com;mode=3991243 1305 Deposit with sip:sub-rjs-deposit-sg@vm.wcom.com 1306 special greeting sip:677384@vm.wcom.com 1307 sip:rjs@vm.wcom.com;mode=sg 1309 Retrieve - SIP sip:sub-rjs-retrieve@vm.wcom.com 1310 authentication sip:677405@vm.wcom.com 1311 sip:rjs@vm.wcom.com;mode=retrieve 1313 Retrieve - prompt sip:sub-rjs-retrieve-inpin.vm.wcom.com 1314 for PIN in-band sip:677415@vm.wcom.com 1315 sip:rjs@vm.wcom.com;mode=inpin 1317 As we have shown, SIP URIs represent an ideal, flexible mechanism 1318 for describing and naming service resources, be they queues, 1319 conferences, voice dialogs, announcements, voicemail treatments, or 1320 phone features. 1324 4.7 Invoker Independence 1326 Only the invoker of a feature in SIP knows exactly which feature it 1327 is invoking. One of the primary benefits of this approach is that 1328 combinations of features should work in SIP call control. For 1329 example, let us examine the combination of a "transfer" of a call 1330 which is "conferenced". 1332 Alice calls Bob. Alice silently "conferences in" her robotic 1333 assistant Albert as a hidden party. Bob transfers Alice to Carol. 1334 If Bob asks Alice to Replace her leg with a new one to Carol then 1335 both Alice and Albert should be communicating with Carol 1336 (transparently).
1338 Using the peer-to-peer model, this combination of features works 1339 fine if A is doing local mixing (Alice replaces Bob's call-leg with 1340 Carol's), or if A is using a central mixer (the mixer replaces Bob's 1341 call leg with Carol's). A clever implementation using the 3pcc 1342 model can generate similar results. 1344 New extensions to the SIP Call Control Framework should attempt to 1345 preserve this property. 1347 4.8 Billing issues 1349 Billing in the PSTN is typically based on who initiated a call. At 1350 the moment billing in a SIP network is neither consistent with 1351 itself, nor with the PSTN. (A billing model for SIP should allow 1352 for both PSTN-style billing, and non-PSTN billing.) The example 1353 below demonstrates one such inconsistency. 1355 Alice places a call to Bob. Alice then blind transfers Bob to Carol 1356 through a PSTN gateway. In current usage of REFER and BYE/Also, Bob 1357 may be billed for a call he did not initiate (his UA originated the 1358 outgoing call leg however). This is not necessarily a terrible 1359 thing, but it demonstrates a security concern (Bob must have 1360 appropriate local policy to prevent fraud). Also, Alice may wish to 1361 pay for Bob's session with Carol. There should be a way to signal 1362 this in SIP. 1364 Likewise a Replacement call may maintain the same billing 1365 relationship as a Replaced call, so if Alice first calls Carol, then 1366 asks Bob to Replace this call, Alice may continue to receive a bill. 1368 Further work in SIP billing should define a way to set or discover 1369 the direction of billing. 1371 5 Catalog of call control actions and sample features 1373 Call control actions can be categorized by the dialogs upon which 1374 they operate. The actions may involve a single or multiple dialogs. 1375 These dialogs can be early or established. 
Multiple dialogs may be 1378 related in a conversation space to form a conference or other 1379 interesting media topologies. 1381 It should be noted that it is desirable to provide a means by which 1382 a party can discover the actions which may be performed on a dialog. 1383 The interested party may be independent or related to the dialogs. 1384 One means of accomplishing this is through the ability to define and 1385 obtain URLs for these actions as described in section 4.6. 1387 Below are listed several call control "actions" which establish or 1388 modify dialogs and relate the participants in a conversation space. 1389 The names of the actions listed are for descriptive purposes only 1390 (they are not normative). This list of actions is not meant to be 1391 exhaustive. 1393 In the examples, all actions are initiated by the user "Alice" 1394 represented by UA "A". 1396 5.1 Early Dialog Actions 1398 The following are a set of actions that may be performed on a single 1399 early dialog. These actions can be thought of as a set of remote 1400 control operations. For example an automaton might perform the 1401 operation on behalf of a user. Alternatively a user might use the 1402 remote control in the form of an application to perform the action 1403 on the early dialog of a UA which may be out of reach. All of these 1404 actions correspond to telling the UA how to respond to a request to 1405 establish an early dialog. These actions provide useful 1406 functionality for PDA, PC and server based applications which desire 1407 the ability to control a UA. 1409 5.1.1 Remote Answer 1411 A dialog is in some early dialog state such as 180 Ringing. It may 1412 be desirable to tell the UA to answer the dialog. That is, tell it 1413 to send a 200 OK response to establish the dialog. 1415 5.1.2 Remote Forward or Put 1417 It may be desirable to tell the UA to respond with a 3xx class 1418 response to forward an early dialog to another UA.
1420 5.1.3 Remote Busy or Error Out 1422 It may be desirable to instruct the UA to send an error response 1423 such as 486 Busy Here. 1425 5.2 Single Dialog Actions 1427 There is another useful set of actions which operate on a single 1428 established dialog. These operations are useful in building 1429 productivity applications for aiding users to control their phone. 1430 For example a CRM application which sets up calls for a user 1433 eliminating the need for the user to actually enter an address. 1434 These operations can also be thought of as remote control actions. 1436 5.2.1 Remote Dial 1438 This action instructs the UA to initiate a dialog. This action can 1439 be performed using the REFER method. 1441 5.2.2 Remote On and Off Hold 1443 This action instructs the UA to put an established dialog on hold. 1444 Though this operation can conceptually be performed with the 1445 REFER method, there are no semantics defined as to what the referred 1446 party should do with the SDP. There is no way to distinguish between 1447 the desire to go on or off hold. 1449 5.2.3 Remote Hangup 1451 This action instructs the UA to terminate an early or established 1452 dialog. A REFER request with the following Refer-To URI performs 1453 this action. Note: this URL is not properly escaped. 1455 sip:bob@babylon.biloxi.com;method=BYE?Call-ID=13413098 1456 &To=;tag=879738 1457 &From=;tag=023214 1459 5.3 Multi-dialog actions 1461 These actions apply to a set of related dialogs. 1463 5.3.1 Transfer 1465 The conversation space changes as follows: 1467 before after 1468 { A , B } --> { C , B } 1470 A replaces itself with C. 1472 To make this happen using the peer-to-peer approach, "A" would send 1473 two SIP requests.
A shorthand for those requests is shown below: 1474 REFER B Refer-To:C 1475 BYE B 1477 To make this happen instead using the 3pcc approach, the controller 1478 sends requests represented by the shorthand below: 1479 INVITE C (w/SDP of B) 1480 reINVITE B (w/SDP of C) 1481 BYE A 1483 Features enabled by this action: 1484 - blind transfer 1485 - transfer to a central mixer (some type of conference or forking) 1488 - transfer to park server (park) 1489 - transfer to music on hold or announcement server 1490 - transfer to a "queue" 1491 - transfer to a service (such as Voice Dialogs service) 1492 - transition from local mixer to central mixer 1494 5.3.2 Take 1496 The conversation space changes as follows: 1498 { B , C } --> { B , A } 1500 A forcibly replaces C with itself. In most uses of this primitive, 1501 A is just "un-replacing" itself. 1503 Using the peer-to-peer approach, "A" sends: 1504 INVITE B Replaces: 1506 Using the 3pcc approach (all requests sent from controller): 1507 INVITE A (w/SDP of B) 1508 reINVITE B (w/SDP of A) 1509 BYE C 1511 Features enabled by this action: 1512 - transferee completes an attended transfer 1513 - retrieve from central mixer (not recommended) 1514 - retrieve from music on hold or park 1515 - retrieve from queue 1516 - call center take 1517 - voice portal resuming ownership of a call it originated 1518 - answering-machine style screening (pickup) 1519 - pickup of a ringing call (i.e. early dialog) 1521 Note that pickup of a ringing call has some 1522 additional requirements. First of all it is an early dialog as 1523 opposed to an established dialog. Secondly the party which is to 1524 pick up the call may wish to do so only while it is an early 1525 dialog. That is, in the race condition where the ringing UA accepts 1526 just before it receives signaling from the party wishing to take the 1527 call, the taking party wishes to yield or cancel the take.
The goal 1528 is to avoid yanking an answered call from the called party. 1530 5.3.3 Add 1532 The conversation space changes as follows: 1534 { A , B } --> { A, B, C } 1536 A adds C to the conversation. 1538 Using the peer-to-peer approach, adding a party using local mixing 1539 requires no signaling. To transition from a 2-party call or a 1540 locally mixed conference to central mixing, A could send the 1541 following requests: 1545 REFER B Refer-To: mixer 1546 INVITE mixer 1547 BYE B 1549 To add a party to a central mixer: 1550 REFER C Refer-To: mixer 1551 or 1552 REFER mixer Refer-To: C 1554 Using the 3pcc approach to transition to centrally mixed, the 1555 controller would send: 1556 INVITE mixer leg 1 (w/SDP of A) 1557 INVITE mixer leg 2 (w/SDP of B) 1558 INVITE C (late SDP) 1559 reINVITE A (w/SDP of mixer leg 1) 1560 reINVITE B (w/SDP of mixer leg 2) 1561 INVITE mixer leg 3 (w/SDP of C) 1563 To add a party to a central mixer: 1564 INVITE C (late SDP) 1565 INVITE mixer (w/SDP of C) 1567 Features enabled: 1568 - standard conference feature 1569 - call recording 1570 - answering-machine style screening (screening) 1572 5.3.4 Local Join 1574 The conversation space changes like this: 1576 { A, B} , {A, C} --> {A, B, C} 1578 or like this 1580 { A, B} , {C, D} --> {A, B, C, D} 1582 A takes two conversation spaces and joins them together into a 1583 single space. 1585 Using the peer-to-peer approach, A can mix locally, or REFER the 1586 participants of both conversation spaces to the same central mixer 1587 (as in 5.3). 1589 For the 3pcc approach, the call flows for inserting participants, 1590 and joining and splitting conversation spaces are tedious yet 1591 straightforward, so these are left as an exercise for the reader.
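The conversation space transformations described so far (Transfer, Take, Add and Local Join) can be modeled as operations on sets of participants. The sketch below is only a bookkeeping aid under that assumption: it tracks membership, not signaling or media, and the function names are hypothetical.

```python
def transfer(space, a, c):
    """{A, B} -> {C, B}: A replaces itself with C."""
    return (space - {a}) | {c}


def take(space, c, a):
    """{B, C} -> {B, A}: A forcibly replaces C with itself."""
    return (space - {c}) | {a}


def add(space, c):
    """{A, B} -> {A, B, C}: an existing participant adds C."""
    return space | {c}


def local_join(space1, space2):
    """Two conversation spaces are merged into a single space."""
    return space1 | space2


# Example: a Transfer followed by a Take returns to the original membership.
original = frozenset({"A", "B"})
after_transfer = transfer(original, "A", "C")
restored = take(after_transfer, "C", "A")
```

Viewed this way, Take is the inverse of Transfer, which matches the observation that A is usually just "un-replacing" itself.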
1593 Features enabled: 1594 - standard conference feature 1595 - leaving a sidebar to rejoin a larger conference 1597 5.3.5 Insert 1600 The conversation space changes like this: 1602 { B , C } --> {A, B, C } 1604 A inserts itself into a conversation space. 1606 A proposed mechanism for signaling this using the peer-to-peer 1607 approach is to send a new header in an INVITE with "joining" 1608 semantics. For example: 1609 INVITE B Join: 1611 If B accepted the INVITE, B would accept responsibility to set up the 1612 call legs and mixing necessary (for example: to mix locally or to 1613 transfer the participants to a central mixer). 1615 Features enabled: 1616 - barge-in 1617 - call center monitoring 1618 - call recording 1620 5.3.6 Split 1621 { A, B, C, D } --> { A, B } , { C, D } 1623 If using a central mixer with peer-to-peer: 1624 REFER C Refer-To: mixer (new URI) 1625 REFER D Refer-To: mixer (new URI) 1626 BYE C 1627 BYE D 1629 Features enabled: 1630 - sidebar conversations during a larger conference 1632 5.3.7 Near-fork 1634 A participates in two conversation spaces simultaneously: 1636 { A, B } --> { B , A } & { A , C } 1638 A is a participant in two conversation spaces such that A sends the 1639 same media to both spaces, and renders media from both spaces, 1640 presumably by mixing or rendering the media from both. We can 1641 define that A is the "anchor" point for both forks, each of which is 1642 a separate conversation space. 1644 This action is purely a local implementation matter (it requires no special 1645 signaling). Local features such as switching calls between the 1646 background and foreground are possible using this media 1647 relationship. 1649 5.3.8 Far fork 1651 The conversation space diagram... 1655 { A, B } --> { A , B } & { B , C } 1657 A requests B to be the "anchor" of two conversation spaces. 1659 For an example of using 3pcc to set up media forking, see [Media 1660 forking].
The session descriptions for forking are quite complex. 1661 Controllers should verify that endpoints can handle forked-media, by 1662 using some type of Require header token. 1664 Two ways to set up this media relationship using peer-to-peer call 1665 control have been proposed: 1666 - the anchor receives a REFER with requires forked-media (implicit) 1667 - the anchor receives an INVITE with Fork-with header (explicit) 1669 Features enabled: 1670 - barge-in 1671 - voice portal services 1672 - whisper 1673 - hotword detection 1674 - sending DTMF somewhere else 1676 The above notation does not fully describe the media topology. Below 1677 are the four possible media topologies by which C might want to join 1678 the A-B dialog. For some of the above listed features there is a 1679 requirement to be able to specify any of these media topologies as 1680 part of joining. In addition it is also a requirement that it be 1681 possible to change the media topology after the initial setup (e.g. 1682 in a reINVITE). An example of this is a silently monitored 1683 conversation which is modified to be a full-fledged conference to 1684 allow a call center supervisor to converse with the customer. 1686 The media topology can be separated into two perspectives: the 1687 topology for the send and receive media streams for C. For each of 1688 these streams C needs the ability to specify either point to point 1689 or mixed media. This works out to the matrix where the "send" 1690 column indicates what happens with the media from C at B. The 1691 "receive" column indicates what C wants to receive (mix or only B's 1692 media). In the greater than 3 party case theoretically this could be 1693 generalized to specify the set for the mix, however, from a 1694 pragmatic perspective the authors feel it is sufficient to constrain 1695 the description of the sets to all or nothing for now (i.e. point to 1696 point or mix of all).
1698 Send Receive 1699 1 Pt2pt mix 1700 2 mix mix 1701 3 Pt2pt Pt2pt 1702 4 mix Pt2pt 1704 For the following examples: 1705 A is the customer 1706 B is the agent 1707 C is the supervisor 1710 => and <= indicate the direction of media flow 1712 1. Send: point to point, Receive: mix 1713 Example application: silent monitoring or coaching 1714 A <= B (point to point, only B hears C) 1715 A => B 1716 (A+B) => C (C gets mix of A + B) 1717 B <= C 1719 2. Send: mix, Receive: mix 1720 Example application: Normal Conference 1721 A <= (B+C) (mix, A gets mix of B+C) 1722 A => B 1723 (A+B) => C (C gets mix of A + B) 1724 B <= C 1726 3. Send: point to point, Receive: point to point 1727 Example application: Whisper/Sidebar 1728 A <= B (point to point, only B hears C) 1729 A => B 1730 B => C (point to point, C hears only B) 1731 B <= C 1733 4. Send: mix, Receive: point to point 1734 Example application: Recorded Conversation 1735 C is the Voice Recorder 1736 A <= B (point to point, only B hears C) 1737 A => B 1738 (A+B) => C (C gets mix of A + B) 1739 B <= C 1741 6 Putting it all together 1743 These example features should require an amply robust set of 1744 services to demonstrate a useful set of primitives. A summary of 1745 these features is listed below. Implementations of features marked with an 1746 asterisk (*) are described briefly in Section 6.1.
1748 Example Features: 1749 Call Hold [Offer/Answer] for SIP 1750 Call Waiting Local Implementation 1751 Blind Transfer [cc-transfer] 1752 Attended Transfer [cc-transfer] 1753 Consultative transfer [cc-transfer] 1754 Conference Call [conf-models] 1755 Call Park *[examples] 1756 Call Pickup *[examples] 1757 Music on Hold *[examples] 1758 Call Monitoring *Insert 1759 Barge-in *Insert or Far-Fork 1760 Hotline Local Implementation 1761 Autoanswer Local URI convention 1762 Speed dial Local Implementation 1765 Intercom *Speed dial + autoanswer 1766 Speakerphone paging *Speed dial + autoanswer 1767 Call Return Proxy feature 1768 Inbound Call Screening Proxy or Local implementation 1769 Outbound Call Screening Proxy feature 1770 Call Forwarding Proxy or Local implementation 1771 Message Waiting [msg-waiting] 1772 Do Not Disturb [presence] 1773 Distinctive ring *Proxy or Local implementation 1774 Automatic Callback 2 person presence-based conference 1775 Find-Me Proxy service based on presence 1776 Whispered call waiting Local implementation 1777 Voice message screening * 1778 Presence-based Conferencing *call when presence = available 1779 IM Conference Alerts subscribe to conference status 1780 Single Line Extension * 1781 Click-to-dial * 1782 Pre-paid calling * 1783 Voice Portal * 1785 6.1 Feature Solutions 1787 The following sections illustrate how some of the primitives can be 1788 put together to build some powerful and interesting features. 1790 6.1.1 Call Park 1792 Call park requires the ability to: put a dialog some place, 1793 advertise it to users in a pickup group and to uniquely identify it 1794 in a way that can be communicated (including human voice). The 1795 dialog can be held locally on the UA parking the dialog or 1796 alternatively transferred to the park service for the pickup group. 1797 The parked dialog then needs to be labeled (e.g.
orbit 12) in a way 1798 that can be communicated to the party that is to pick up the call. 1799 The UAs in the pickup group discover the parked dialog(s) via 1800 [call-leg] from the park service. If the dialog is parked locally 1801 the park service merely aggregates the parked call states from the 1802 set of UAs in the pickup group. 1804 6.1.2 Call Pickup 1806 There are two different features which are called call pickup. The 1807 first is the pickup of a parked dialog. The UA from which the 1808 dialog is to be picked up subscribes to the call state [call-leg] of 1809 the park service or the UA which has locally parked the dialog. 1810 Dialogs which are parked should be labeled with an identifier. The 1811 labels are used by the UA to allow the user to indicate which dialog 1812 is to be picked up. The UA picking up the call invokes the URL in 1813 the call state which is labeled as replace-remote. 1815 The other call pickup feature involves picking up an early dialog 1816 (typically ringing). This feature uses some of the same primitives 1817 as the pickup of a parked call. The call state of the ringing UA 1820 is advertised using [call-leg]. The UA which is to pick up the 1821 early dialog subscribes either directly to the ringing UA or to a 1822 service aggregating the states for UAs in the pickup group. The 1823 call state identifies early dialogs. The UA uses the call state(s) 1824 to help the user choose which early dialog is to be picked up. 1825 The UA then invokes the URL in the call state labeled as replace- 1826 remote. 1828 6.1.3 Music on Hold 1830 Music on hold can be implemented a number of ways. One way is to 1831 transfer the held call to a holding service. When the UA wishes to 1832 take the call off hold it basically performs a take on the call from 1833 the holding service.
This involves subscribing to call state on the 1834 holding service and then invoking the URL in the call state labeled 1835 as replace-remote. 1837 Alternatively music on hold can be performed as a local mixing 1838 operation. The UA holding the call can mix in the music from the 1839 music service via RTP (i.e. an additional dialog) or RTSP or other 1840 streaming media source. This approach is simpler (i.e. the held 1841 dialog does not move so there is less chance of losing it) from a 1842 protocol perspective, however it does use more LAN bandwidth and 1843 resources on the UA. 1845 6.1.4 Call Monitoring 1847 Call monitoring is a [join] operation. The monitoring UA sends a 1848 Join to the dialog it wants to listen to. It is able to discover 1849 the dialog via the call state [call-leg] on the monitored UA. The 1850 monitoring UA sends SDP in the INVITE which indicates receive only 1851 media [offer/answer]. In addition the monitoring UA should indicate 1852 that it wants to receive a mix (see the media topologies in Section 5.3.8). 1853 As the UA is monitoring only it does not matter whether 1854 the UA indicates it wishes the send stream be mixed or point to point. 1856 6.1.5 Barge-in 1858 Barge-in works the same as call monitoring except that it must 1859 indicate that the send media stream is to be mixed so that all of the 1860 other parties can hear the stream from the UA barging in. 1862 6.1.6 Intercom 1864 The UA initiates a dialog using INVITE in the ordinary way [bis]. 1865 The calling UA then signals the paged UA to answer the call. The 1866 calling UA may discover the URL to answer the call via the call 1867 state [call-leg] of the called UA. The called UA accepts the INVITE 1868 with a 200 OK and automatically enables the speakerphone. 1870 Alternatively this can be a local decision for the UA to answer 1871 based upon called party identification.
1875 6.1.7 Speakerphone paging 1877 Speakerphone paging can be implemented using either multicast or 1878 through a simple multipoint mixer. In the multicast solution the 1879 paging UA sends a multicast INVITE [bis] with send only media in the 1880 [SDP] (see also [offer/answer]). The automatic answer and enabling 1881 of the speakerphone is a locally configured decision on the paged 1882 UAs. The paging UA sends RTP via the multicast address indicated in 1883 the SDP. 1885 The multipoint solution is accomplished by sending an INVITE to the 1886 multipoint mixer. The mixer is configured to automatically answer 1887 the dialog. The paging UA then sends [REFER] requests for each of 1888 the UAs that are to become paging speakers (The UA is likely to send 1889 out a single REFER which is parallel forked by the proxy server). 1890 The UAs performing as paging speakers are configured to 1891 automatically answer based upon caller identification (e.g. To 1892 field, URI or Referred-To headers). 1894 6.1.8 Distinctive ring 1896 The target UA either makes a local decision based on information in 1897 an incoming INVITE (To, From, Contact, Request-URI) or trusts an 1898 Alert-Info header provided by the caller or inserted by a trusted 1899 proxy. In the latter case, the UA fetches the content described in 1900 the URI (typically via http) and renders it to the user. 1902 6.1.9 Voice message screening 1904 At first, this is the same as call monitoring. In this case the 1905 voicemail service is one of the UAs. The UA screening the message 1906 monitors the call on the voicemail service, and also subscribes to 1907 call-leg information. If the user screening their messages decides 1908 to answer, they perform a Take from the voicemail system (for 1909 example, send an INVITE with Replaces to the UA leaving the message). 1911 6.1.10 Single Line Extension 1913 Incoming calls ring all the extensions through basic parallel 1914 forking [bis].
Each extension subscribes to call-leg events from 1915 each other extension. While one user has an active call, any other 1916 UA extension can insert itself into that conversation (it already 1917 knows the call-leg information) in the same way as barge-in. 1919 6.1.11 Click-to-dial 1921 The application or server which hosts the click-to-dial application 1922 captures the URL to be dialed and can set up the call using 3pcc or 1923 can send a [REFER] request to the UA which is to dial the address. 1924 As users sometimes change their mind or wish to give up listening to a 1925 ringing or voicemail answered phone, this application illustrates 1926 the need to also have the ability to remotely hang up a call. 1930 6.1.12 Pre-paid calling 1932 For prepaid calling, the user's media always passes through a device 1933 which is trusted by the pre-paid provider. This may be the other 1934 endpoint (for example a PSTN gateway). In either case, an 1935 intermediary proxy or B2BUA can periodically verify the amount of 1936 time available on the pre-paid account, and use the session-timer 1937 extension to cause the trusted endpoint (gateway) or intermediary 1938 (media relay) to send a reINVITE before that time runs out. During 1939 the reINVITE, the SIP intermediary can reverify the account and 1940 insert another session-timer header. 1942 Note that while most pre-paid systems on the PSTN use an IVR to 1943 collect the account number and destination, this isn't strictly 1944 necessary for a SIP-originated prepaid call. SIP requests and SIP 1945 URIs are sufficiently expressive to convey the final destination, 1946 the provider of the prepaid service, the location from which the 1947 user is calling, and the prepaid account they want to use. If a 1948 pre-paid IVR is used, the mechanism described below (Voice Portals) 1949 can be combined as well.
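The periodic account check in pre-paid calling can be sketched as a small calculation performed by the intermediary each time it sees an INVITE or reINVITE. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the default cap and floor (1800 and 90 seconds) are illustrative values, not requirements taken from this document.

```python
def next_session_interval(balance_seconds, max_interval=1800, min_interval=90):
    """Choose the session-timer interval (in seconds) to insert.

    Returns None when the remaining balance is too small to allow another
    interval, in which case the intermediary would end the call instead.
    """
    if balance_seconds < min_interval:
        return None
    # Never promise more time than the account holds, and cap the interval
    # so that the account is reverified at regular points during long calls.
    return min(balance_seconds, max_interval)
```

Each reINVITE triggered by the timer gives the intermediary a fresh opportunity to recompute the interval from the current balance.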
1951 6.1.13 Voice Portal 1953 A voice portal is essentially a complex collection of voice dialogs 1954 used to access interesting content. One of the most desirable call 1955 control features of a Voice Portal is the ability to start a new 1956 outgoing call from within the context of the Portal (to make a 1957 restaurant reservation, or return a voicemail message for example). 1958 Once the new call is over, the user should be able to return to the 1959 Portal by pressing a special key, using some DTMF sequence (e.g. a 1960 very long pound or hash tone), or by speaking a hotword (e.g. "Main 1961 Menu"). 1963 In order to accomplish this, the Voice Portal starts with the 1964 following media relationship: 1966 { User , Voice Portal } 1968 The user then asks to make an outgoing call. The Voice Portal asks 1969 the User to perform a Far-Fork. In other words the Voice Portal 1970 wants the following media relationship: 1972 { Target , User } & { User , Voice Portal } 1974 The Voice Portal is now just listening for a hotword or the 1975 appropriate DTMF. As soon as the user indicates they are done, the 1976 Voice Portal Takes the call from the old Target, and we are back to 1977 the original media relationship. 1979 This feature can also be used by the account number and phone number 1980 collection menu in a pre-paid calling service. A user can press a 1981 DTMF sequence which presents them with the a 1984 7 Security Considerations 1986 Call Control primitives provide a powerful set of features that can 1987 be dangerous in the hands of an attacker. To complicate matters, 1988 call control primitives are likely to be automatically authorized 1989 without direct human oversight. 
1991 The class of attacks which are possible using these tools includes 1992 the ability to eavesdrop on calls, disconnect calls, redirect calls, 1993 render irritating content (including ringing) at a user agent, cause 1994 an action that has billing consequences, subvert billing (theft-of- 1995 service), and obtain private information. Call control extensions 1996 must take extra care to describe how these attacks will be 1997 prevented. 1999 We can also make some general observations about authorization and 2000 trust with respect to call control. The security model depends 2001 heavily on the signaling model chosen (see section 2002 4.2). 2004 Let us first examine the security model used in the 3pcc approach. 2005 All signaling goes through the controller, which is a trusted 2006 entity. Traditional SIP authentication and hop-by-hop encryption 2007 and message integrity work fine in this environment, but end-to-end 2008 encryption and message integrity may not be possible. 2010 When using the peer-to-peer approach, call control actions and 2011 primitives can be legitimately initiated by a) an existing 2012 participant in the conversation space, b) a former participant in 2013 the conversation space, or c) an entity trusted by one of the 2014 participants. For example, a participant always initiates a 2015 transfer; a retrieve from Park (a take) is initiated on behalf of a 2016 former participant; and a barge-in (insert or far-fork) is initiated 2017 by a trusted entity (an operator for example). 2019 Authenticating requests by an existing participant or a trusted 2020 entity can be done with baseline SIP mechanisms. In the case of 2021 features initiated by a former participant, these should be 2022 protected against replay attacks by using a unique name or 2023 identifier per invocation. 
The Replaces header exhibits this 2024 behavior as a by-product of its operation (once a Replaces operation 2025 is successful, the call-leg being Replaced no longer exists). For 2026 other requests, a "one-time" Request-URI may be provided to the 2027 feature invoker. 2029 To authorize call control primitives that trigger special behavior 2030 (such as an INVITE with Replaces, Join, or Fork semantics), the 2031 receiving user agent may have trouble finding appropriate 2032 credentials with which to challenge or authorize the request, as the 2033 sender may be completely unknown to the receiver, except through the 2034 introduction of a third party. These credentials need to be passed 2035 transitively in some way or fetched in an event body, for example. 2037 8 References 2039 [SIP] M. Handley, E. Schooler, and H. Schulzrinne, "SIP: Session 2040 Initiation Protocol", RFC 2543, Internet Engineering Task Force, 2041 March 1999. 2043 [RFC2026] S. Bradner, "The Internet Standards Process -- Revision 3", 2044 RFC 2026 (BCP), IETF, October 1996. 2046 [RFC2119] S. Bradner, "Key words for use in RFCs to indicate 2047 requirement levels", Request for Comments (Best Current 2048 Practice) 2119, Internet Engineering Task Force, Mar. 1997. 2050 [REFER] R. Sparks, "The Refer Method", Internet Draft , IETF, October 30, 2001, Work in progress. 2053 [3pcc] J. Rosenberg, J. Peterson, H. Schulzrinne, G. Camarillo, 2054 "Third Party Call Control in SIP", Internet Draft , IETF, March 2001, Work in progress. 2057 [transfer] R. Sparks, "SIP Call Control - Transfer", Internet Draft 2058 , IETF, Feb. 2001, Work in 2059 progress. 2061 [Replaces] B. Biggs, R. Dean, R. Mahy, "The SIP Replaces Header", 2062 Internet Draft , IETF, Nov. 2001, 2063 Work in progress. 2065 [conf-models] J. Rosenberg, H. Schulzrinne, "Models for Multi Party 2066 Conferencing in SIP", Internet Draft , IETF, Nov. 2000, Work in progress. 2069 [service examples] A. Johnston, R. Sparks, C. Cunningham, S. 2070 Donovan, K. 
Summers, "SIP Service Examples", Internet Draft , IETF, June 2002, Work in 2072 progress. 2074 [Join] R. Mahy, D. Petrie, "The SIP Join and Fork Headers", Internet 2075 Draft , IETF, November 2076 2001, Work in progress. 2078 [RTP] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, 2079 "RTP: A Transport Protocol for Real-Time Applications", Request for 2080 Comments (Standards Track) 1889, IETF, January 1996. 2082 [SDP] M. Handley, V. Jacobson, "SDP: Session 2083 Description Protocol", Request for Comments (Standards Track) 2327, 2084 Internet Engineering Task Force, April 1998. 2086 [events] A. Roach, "SIP-Specific Event Notification", Internet Draft 2087 , IETF, February 2002, Work in 2088 progress. 2092 [offer/answer] J. Rosenberg, H. Schulzrinne, "An Offer/Answer Model 2093 with SDP", Internet Draft , IETF, February 21, 2002, Work in progress. 2096 [caller prefs] J. Rosenberg, "SIP Caller Preferences and Callee 2097 Capabilities", Internet Draft , 2098 IETF, November 21, 2001, Work in progress. 2100 [msg waiting] R. Mahy, I. Slain, "Message Waiting in SIP", Internet 2101 Draft , IETF, July 2001, Work 2102 in progress. 2104 [Presence] Rosenberg et al., "SIP Extensions for Presence", Internet 2105 Draft , IETF, November 21, 2001, 2106 Work in progress. 2108 [visited] D. Oran, H. Schulzrinne, "The Visited Header", Internet 2109 Draft <>, IETF, date, Work in progress. 2111 [app components] , "",Internet Draft <>, IETF, date, Work in 2112 progress. 2114 [ms-uri] J. Van Dyke, E. Burger, "SIP URI Conventions for Media 2115 Servers", Internet Draft , IETF, 2116 November 21, 2001, Work in progress. 2118 [call-pkg] J. Rosenberg, H. Schulzrinne, "SIP Event Packages for 2119 Call Leg and Conference State", Internet Draft , IETF, July 13, 2001, Work in progress. 2122 [enum] , "",Internet Draft <>, IETF, date, Work in progress. 2124 [http] R. 
Fielding et al, "Hypertext Transfer Protocol -- 2125 HTTP/1.1", Request for Comments (Standards Track) 2616, Internet 2126 Engineering Task Force, June 1999 2128 [rtsp] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming 2129 Protocol (RTSP)", Request for Comments (Standards Track) 2326, 2130 Internet Engineering Task Force, April 1998 2132 [mrcp] S. Shanmugham, P. Monaco, B. Eberman, "MRCP: Media Resource 2133 Control Protocol", Internet Draft , 2134 IETF, November 20, 2001, Work in progress. 2136 [VoiceXML] S. McGlashan et al, "Voice Extensible Markup Language 2137 (VoiceXML) Version 2.0", W3C Working Draft, 23 October 2001, Work in 2138 progress. 2140 [H.323] 2142 [tel URL] 2144 [caller-prefs] 2147 [session timer] 2149 [service context] 2151 [avt tones] 2153 [GSM] 2155 [MPEG2] 2157 [G.711] 2159 [H.261] 2161 [H.450] 2163 [JTAPI] 2165 [CSTA] 2167 [mrcp-sip] , "",Internet Draft , 2168 IETF, date, Work in progress. 2170 [distributed full mesh conf] 2172 [Media forking] M. Shankar, "SIP Forked Media", Internet Draft 2173 , IETF, Feb. 2001. Work in 2174 progress. 2176 [PHONECTL] R. Dean, Belkind, B. Biggs, "PHONECTL: A Protocol for 2177 Remote Phone Control", Internet Draft , 2178 IETF, Jan. 2001. Work in progress. 2180 9 To Do 2182 - Add diagrams to section 4.3.1, 4.3.2, and 4.3.3 2184 - Fix references 2186 - Define some semantics for authorization rules. For example one 2187 could define a dictionary of primitives and/or perhaps define sets 2188 or classes of these primitives, then configure who is allowed to use 2189 them 2191 10 Acknowledgments 2193 Thanks to all who attended the SIP interim meeting in February 2001 2194 for their support of the ideas behind this document. 
2196 11 Author's Addresses 2198 Rohan Mahy 2201 Cisco Systems 2202 170 West Tasman Dr, MS: SJC-21/3/3 2203 Phone: +1 408 526 8570 2204 Email: rohan@cisco.com 2206 Ben Campbell 2207 dynamicsoft 2208 5100 Tennyson Parkway 2209 Suite 1200 2210 Plano, Texas 75024 2211 Email: bcampbell@dynamicsoft.com 2213 Alan Johnston 2214 WorldCom 2215 100 S. 4th Street 2216 St. Louis, Missouri 63104 2217 Email: alan.johnston@wcom.com 2219 Daniel G. Petrie 2220 Pingtel Corp. 2221 400 W. Cummings Park 2222 Suite 2200 2223 Woburn, MA 01801 2224 Phone: +1 781 938 5306 2225 Email: dpetrie@pingtel.com 2227 Jonathan Rosenberg 2228 dynamicsoft 2229 72 Eagle Rock Avenue 2230 First Floor 2231 East Hanover, NJ 07936 2232 Email: jdrosen@dynamicsoft.com 2234 Robert J. Sparks 2235 dynamicsoft 2236 5100 Tennyson Parkway 2237 Suite 1200 2238 Plano, TX 75024 2239 Email: rsparks@dynamicsoft.com 2241 Full Copyright Statement 2243 "Copyright (C) The Internet Society (date). All Rights Reserved. 2244 This document and translations of it may be copied and furnished to 2245 others, and derivative works that comment on or otherwise explain it 2246 or assist in its implementation may be prepared, copied, published 2247 and distributed, in whole or in part, without restriction of any 2248 kind, provided that the above copyright notice and this paragraph 2249 are included on all such copies and derivative works. However, this 2250 document itself may not be modified in any way, such as by removing 2251 the copyright notice or references to the Internet Society or other 2252 Internet organizations, except as needed for the purpose of 2253 developing Internet standards in which case the procedures for 2256 copyrights defined in the Internet Standards process must be 2257 followed, or as required to translate it into languages other than 2258 English. 
2260 The limited permissions granted above are perpetual and will not be 2261 revoked by the Internet Society or its successors or assigns. 2262 This document and the information contained herein is provided on an 2263 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 2264 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 2265 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 2266 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 2267 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.