SIPPING                                                     J. Rosenberg
Internet-Draft                                               dynamicsoft
Expires: August 16, 2004                               February 16, 2004

   A Framework for Application Interaction in the Session Initiation
                             Protocol (SIP)
            draft-ietf-sipping-app-interaction-framework-01

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 16, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

   This document describes a framework for the interaction between users and Session Initiation Protocol (SIP) based applications. By interacting with applications, users can guide the way in which they operate. The focus of this framework is stimulus signaling, which allows a user agent to interact with an application without knowledge of the semantics of that application. Stimulus signaling can occur to a user interface running locally with the client, or to a remote user interface, through media streams. Stimulus signaling encompasses a wide range of mechanisms, ranging from clicking on hyperlinks, to pressing buttons, to traditional Dual Tone Multi Frequency (DTMF) input. In all cases, stimulus signaling is supported through the use of markup languages, which play a key role in this framework.

Table of Contents

   1.  Introduction
   2.  Definitions
   3.  A Model for Application Interaction
       3.1  Functional vs. Stimulus
       3.2  Real-Time vs. Non-Real Time
       3.3  Client-Local vs. Client-Remote
       3.4  Presentation Capable vs. Presentation Free
       3.5  Interaction Scenarios on Telephones
            3.5.1  Client Remote
            3.5.2  Client Local
            3.5.3  Flip-Flop
   4.  Framework Overview
   5.  Application Behavior
       5.1  Client Local Interfaces
            5.1.1  Discovering Capabilities
            5.1.2  Pushing an Initial Interface Component
            5.1.3  Updating an Interface Component
            5.1.4  Terminating an Interface Component
       5.2  Client Remote Interfaces
            5.2.1  Originating and Terminating Applications
            5.2.2  Intermediary Applications
   6.  User Agent Behavior
       6.1  Advertising Capabilities
       6.2  Receiving User Interface Components
       6.3  Mapping User Input to User Interface Components
       6.4  Receiving Updates to User Interface Components
       6.5  Terminating a User Interface Component
   7.  Inter-Application Feature Interaction
       7.1  Client Local UI
       7.2  Client-Remote UI
   8.  Intra Application Feature Interaction
   9.  Example Call Flow
   10. Security Considerations
   11. Contributors
       Normative References
       Informative References
       Author's Address
       Intellectual Property and Copyright Statements

1. Introduction

   The Session Initiation Protocol (SIP) [1] provides the ability for users to initiate, manage, and terminate communications sessions. Frequently, these sessions will involve a SIP application. A SIP application is defined as a program running on a SIP-based element (such as a proxy or user agent) that provides some value-added function to a user or system administrator. Examples of SIP applications include pre-paid calling card calls, conferencing, and presence-based [10] call routing.

   In order for most applications to properly function, they need input from the user to guide their operation. As an example, a pre-paid calling card application requires the user to input their calling card number, their PIN code, and the destination number they wish to reach. The process by which a user provides input to an application is called "application interaction".

   Application interaction can be either functional or stimulus. Functional interaction requires the user agent to understand the semantics of the application, whereas stimulus interaction does not. Stimulus signaling allows for applications to be built without requiring modifications to the client. Stimulus interaction is the subject of this framework. The framework provides a model for how users interact with applications through user interfaces, and how user interfaces and applications can be distributed throughout a network. This model is then used to describe how applications can instantiate and manage user interfaces.

2. Definitions

   SIP Application: A SIP application is defined as a program running on a SIP-based element (such as a proxy or user agent) that provides some value-added function to a user or system administrator. Examples of SIP applications include pre-paid calling card calls, conferencing, and presence-based [10] call routing.

   Application Interaction: The process by which a user provides input to an application.

   Real-Time Application Interaction: Application interaction that takes place while an application instance is executing. For example, when a user enters their PIN number into a pre-paid calling card application, this is real-time application interaction.

   Non-Real Time Application Interaction: Application interaction that takes place asynchronously with the execution of the application. Generally, non-real time application interaction is accomplished through provisioning.

   Functional Application Interaction: Application interaction is functional when the user device has an understanding of the semantics of the interaction with the application.

   Stimulus Application Interaction: Application interaction is considered to be stimulus when the user device has no understanding of the semantics of the interaction with the application.

   User Interface (UI): The user interface provides the user with context in order to make decisions about what they want. The user enters information into the user interface. The user interface interprets the information, and passes it to the application.

   User Interface Component: A piece of user interface which operates independently of other pieces of the user interface. For example, a user might have two separate web interfaces to a pre-paid calling card application - one for hanging up and making another call, and another for entering the username and PIN.
   User Device: The software or hardware system that the user directly interacts with in order to communicate with the application. An example of a user device is a telephone. Another example is a PC with a web browser.

   User Input: The "raw" information passed from a user to a user interface. Examples of user input include a spoken word or a click on a hyperlink.

   Client-Local User Interface: A user interface which is co-resident with the user device.

   Client-Remote User Interface: A user interface which executes remotely from the user device. In this case, a standardized interface is needed between the user device and the user interface. Typically, this is done through media sessions - audio, video, or application sharing.

   Media Interaction: A means of separating a user and a user interface by connecting them with media streams.

   Interactive Voice Response (IVR): An IVR is a type of user interface that allows users to speak commands to the application, and hear responses to those commands prompting for more information.

   Prompt-and-Collect: The basic primitive of an IVR user interface. The user is presented with a voice option, and the user speaks their choice.

   Barge-In: In an IVR user interface, a user is prompted to enter some information. With some prompts, the user may enter the requested information before the prompt completes. In that case, the prompt ceases. The act of entering the information before completion of the prompt is referred to as barge-in.

   Focus: A user interface component has focus when user input is fed to it, as opposed to any other user interface components. This is not to be confused with the term focus within the SIP conferencing framework, which refers to the center user agent in a conference [12].
   Focus Determination: The process by which the user device determines which user interface component will receive the user input.

   Focusless User Interface: A user interface which has no ability to perform focus determination. An example of a focusless user interface is a keypad on a telephone.

   Presentation Capable UI: A user interface which can prompt the user for input, collect results, and then prompt the user with new information based on those results.

   Presentation Free UI: A user interface which cannot prompt the user with information.

   Feature Interaction: A class of problems which result when multiple applications or application components are trying to provide services to a user at the same time.

   Inter-Application Feature Interaction: Feature interactions that occur between applications.

   DTMF: Dual-Tone Multi-Frequency. DTMF refers to a class of tones generated by circuit-switched telephony devices when the user presses a key on the keypad. As a result, DTMF and keypad input are often used synonymously, when in fact one of them (DTMF) is merely a means of conveying the other (the keypad input) to a client-remote user interface (the switch, for example).

   Application Instance: A single execution path of a SIP application.

   Originating Application: A SIP application which acts as a UAC, calling the user.

   Terminating Application: A SIP application which acts as a UAS, answering a call generated by a user. IVR applications are terminating applications.

   Intermediary Application: A SIP application which is neither the caller nor the callee, but rather a third party involved in a call.

3. A Model for Application Interaction

   +---+            +---+            +---+             +---+
   |   |            |   |            |   |             |   |
   |   |            | U |            | U |             | A |
   |   |   Input    | s |   Input    | s |   Results   | p |
   |   | ---------> | e | ---------> | e | ----------> | p |
   | U |            | r |            | r |             | l |
   | s |            |   |            |   |             | i |
   | e |            | D |            | I |             | c |
   | r |   Output   | e |   Output   | f |   Update    | a |
   |   | <--------- | v | <--------- | a | <.......... | t |
   |   |            | i |            | c |             | i |
   |   |            | c |            | e |             | o |
   |   |            | e |            |   |             | n |
   |   |            |   |            |   |             |   |
   +---+            +---+            +---+             +---+

              Figure 1: Model for Real-Time Interactions

   Figure 1 presents a general model for how users interact with applications. Generally, users interact with a user interface through a user device. A user device can be a telephone, or it can be a PC with a web browser. Its role is to pass the user input from the user to the user interface. The user interface provides the user with context in order to make decisions about what they want. The user enters information into the user interface. The user interface interprets the information, and passes it to the application. The application may be able to modify the user interface based on this information. Whether or not this is possible depends on the type of user interface.

   User interfaces are fundamentally about rendering and interpretation. Rendering refers to the way in which the user is provided context. This can be through hyperlinks, images, sounds, videos, text, and so on. Interpretation refers to the way in which the user interface takes the "raw" data provided by the user, and returns the result to the application in a meaningful format, abstracted from the particulars of the user interface. As an example, consider a pre-paid calling card application. The user interface worries about details such as what prompt the user is provided, whether the voice is male or female, and so on.
   It is concerned with recognizing the speech that the user provides, in order to obtain the desired information. In this case, the desired information is the calling card number, the PIN code, and the destination number. The application needs that data, and it doesn't matter to the application whether it was collected using a male prompt or a female one.

   User interfaces generally have real-time requirements towards the user. That is, when a user interacts with the user interface, the user interface needs to react quickly, and that change needs to be propagated to the user right away. However, the interface between the user interface and the application need not be that fast. Faster is better, but the user interface itself can frequently compensate for long latencies there. In the case of a pre-paid calling card application, when the user is prompted to enter their PIN, the prompt should generally stop immediately once the first digit of the PIN is entered. This is referred to as barge-in. After the user interface collects the rest of the PIN, it can tell the user to "please wait while processing". The PIN can then be gradually transmitted to the application. In this example, the user interface has compensated for a slow UI-to-application interface by asking the user to wait.

   The separation between user interface and application is absolutely fundamental to the entire framework provided in this document. Its importance cannot be overstated.

   With this basic model, we can begin to taxonomize the types of systems that can be built.

3.1 Functional vs. Stimulus

   The first way to taxonomize the system is to consider the interface between the UI and the application. There are two fundamentally different models for this interface.
   In a functional interface, the user interface has detailed knowledge about the application, and is, in fact, specific to the application. The interface between the two components is through a functional protocol, capable of representing the semantics which can be exposed through the user interface. Because the user interface has knowledge of the application, it can be optimally designed for that application. As a result, functional user interfaces are almost always the most user friendly, the fastest, and the most responsive. However, in order to allow interoperability between user devices and applications, the details of the functional protocols need to be specified in standards. This slows down innovation and limits the scope of applications that can be built.

   An alternative is a stimulus interface. In a stimulus interface, the user interface is generic, totally ignorant of the details of the application. Indeed, the application may pass instructions to the user interface describing how it should operate. The user interface translates user input into "stimulus", which is data understood only by the application, and not by the user interface. Because they are generic, and because they require communications with the application in order to change the way in which they render information to the user, stimulus user interfaces are usually slower, less user friendly, and less responsive than a functional counterpart. However, they allow for substantial innovation in applications, since no standardization activity is needed to build a new application, as long as it can interact with the user within the confines of the user interface mechanism. The web is an example of a stimulus user interface to applications.

   In SIP systems, functional interfaces are provided by extending the SIP protocol to provide the needed functionality.
   For example, the SIP caller preferences specification [13] provides a functional interface that allows a user to request applications to route the call to specific types of user agents. Functional interfaces are important, but are not the subject of this framework. The primary goal of this framework is to address the role of stimulus interfaces to SIP applications.

3.2 Real-Time vs. Non-Real Time

   Application interaction systems can also be real-time or non-real-time. Non-real-time interaction allows the user to enter information about application operation asynchronously with its invocation. Frequently, this is done through provisioning systems. As an example, a user can set up the forwarding number for a call-forward on no-answer application using a web page. Real-time interaction requires the user to interact with the application at the time of its invocation.

3.3 Client-Local vs. Client-Remote

   Another axis in the taxonomization is whether the user interface is co-resident with the user device (which we refer to as a client-local user interface), or the user interface runs in a host separated from the client (which we refer to as a client-remote user interface). In a client-remote user interface, there exists some kind of protocol between the client device and the UI that allows the client to interact with the user interface over a network.

   The most important way to separate the UI and the client device is through media interaction. In media interaction, the interface between the user and the user interface is through media - audio, video, messaging, and so on. This is the classic mode of operation for VoiceXML [3], where the user interface (also referred to as the voice browser) runs on a platform in the network. Users communicate with the voice browser through the telephone network (or using a SIP session).
   The voice browser interacts with the application using HTTP to convey the information collected from the user.

   In the case of a client-local user interface, the user interface runs co-located with the user device. The interface between them is through the software that interprets the user's input and passes it to the user interface. The classic example of this is the web. In the web, the user interface is a web browser, and the interface is defined by the HTML document that it's rendering. The user interacts directly with the user interface running in the browser. The results of that user interface are sent to the application (running on the web server) using HTTP.

   It is important to note that whether the user interface is local or remote (in the case of media interaction) is not a property of the modality of the interface, but rather a property of the system. As an example, it is possible for a web-based user interface to be provided as a client-remote user interface. In such a scenario, video and application sharing media sessions can be used between the user and the user interface. The user interface, still guided by HTML, now runs "in the network", remote from the client. Similarly, a VoiceXML document can be interpreted locally by a client device, with no media streams at all. Indeed, the VoiceXML document can be rendered using text, rather than media, with no impact on the interface between the user interface and the application.

   It is also important to note that systems can be hybrid. In a hybrid user interface, some aspects of it (usually those associated with a particular modality) run locally, and others run remotely.

3.4 Presentation Capable vs. Presentation Free

   A user interface can be capable of presenting information to the user (a presentation capable UI), or it can be capable only of collecting user input (a presentation free UI). These are very different types of user interfaces. A presentation capable UI can provide the user with feedback after every input, providing the context for collecting the next input. As a result, presentation capable user interfaces require an update to the information provided to the user after each input. The web is a classic example of this. After every input (i.e., a click), the browser provides the input to the application and fetches the next page to render. In a presentation free user interface, this is not the case. Since the user is not provided with feedback, these user interfaces tend to merely collect information as it is entered, and pass it to the application.

   Another difference is that a presentation-free user interface cannot support the concept of a focus. As a result, if multiple applications wish to gather input from the user, there is no way for the user to select which application the input is destined for. The input provided to applications through presentation-free user interfaces is, as a result, more of a broadcast or notification operation.

3.5 Interaction Scenarios on Telephones

   This same model can apply to a telephone. In a traditional telephone, the user interface consists of a 12-key keypad, a speaker, and a microphone. Indeed, from here forward, the term "telephone" is used to represent any device that meets, at a minimum, the characteristics described in the previous sentence. Circuit-switched telephony applications are almost universally client-remote user interfaces. In the Public Switched Telephone Network (PSTN), there is usually a circuit interface between the user and the user interface.
   The user input from the keypad is conveyed using Dual-Tone Multi-Frequency (DTMF), and the microphone input as Pulse Code Modulated (PCM) encoded voice.

   In an IP-based system, there is more variability in how the system can be instantiated. Both client-remote and client-local user interfaces to a telephone can be provided.

   In this framework, a PSTN gateway can be considered a "user proxy". It is a proxy for the user because it can provide, to a user interface on an IP network, input taken from a user on a circuit-switched telephone. The gateway may be able to run a client-local user interface, just as an IP telephone might.

3.5.1 Client Remote

   The most obvious instantiation is the "classic" circuit-switched telephony model. In that model, the user interface runs remotely from the client. The interface between the user and the user interface is through media, set up by SIP and carried over the Real Time Transport Protocol (RTP) [14]. The microphone input can be carried using any suitable voice encoding algorithm. The keypad input can be conveyed in one of two ways. The first is to convert the keypad input to DTMF, and then convey that DTMF using a suitable encoding algorithm for it (such as PCMU). An alternative, and generally the preferred approach, is to transmit the keypad input using RFC 2833 [15], which provides an encoding mechanism for carrying keypad input within RTP.

   In this classic model, the user interface would run on a server in the IP network. It would perform speech recognition and DTMF recognition to derive the user intent, feed them through the user interface, and provide the result to an application.

3.5.2 Client Local

   An alternative model is for the entire user interface to reside on the telephone.
   The user interface can be a VoiceXML browser, running speech recognition on the microphone input, and feeding the keypad input directly into the script. As discussed above, the VoiceXML script could be rendered using text instead of voice, if the telephone had a textual display.

3.5.3 Flip-Flop

   A middle-ground approach is to flip back and forth between a client-local and client-remote user interface. Many voice applications are of the type which listen to the media stream and wait for some specific trigger that kicks off a more complex user interaction. The long pound in a pre-paid calling card application is one example. Another example is a conference recording application, where the user can press a key at some point in the call to begin recording. When the key is pressed, the user hears a whisper to inform them that recording has started.

   The ideal way to support such an application is to install a client-local user interface component that waits for the trigger to kick off the real interaction. Once the trigger is received, the application connects the user to a client-remote user interface that can play announcements, collect more information, and so on.

   The benefit of flip-flopping between a client-local and client-remote user interface is cost. The client-local user interface will eliminate the need to send media streams into the network just to wait for the user to press the pound key on the keypad.

   The Keypad Markup Language (KPML) was designed to support exactly this kind of need [6]. It models the keypad on a phone, and allows an application to be informed when any sequence of keys has been pressed. However, KPML has no presentation component. Since user interfaces generally require a response to user input, the presentation will need to be done using a client-remote user interface that gets instantiated as a result of the trigger.
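   The flip-flop pattern can be illustrated with a short Python sketch. It is not KPML: the regular expression only stands in for a KPML digit pattern, the notify callback stands in for the NOTIFY that carries the match to the application, and the class names and the "*9" recording trigger are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class KeypadTrigger:
    """Hypothetical client-local, presentation-free UI component.

    Watches raw keypad input and reports a match to the application.
    The regex is a stand-in for a KPML-style digit pattern, not KPML syntax.
    """
    pattern: str
    notify: Callable[[str], None]  # stands in for a NOTIFY to the application
    buffer: str = ""

    def key_pressed(self, key: str) -> None:
        self.buffer += key
        match = re.search(self.pattern, self.buffer)
        if match:
            self.notify(match.group(0))  # report only the matching digits
            self.buffer = ""             # start collecting afresh


@dataclass
class RecordingApp:
    """Hypothetical conference-recording application that flips from a
    client-local trigger to a client-remote UI."""
    mode: str = "client-local"
    log: List[str] = field(default_factory=list)

    def on_trigger(self, digits: str) -> None:
        # Trigger received: connect the user's media to a client-remote UI
        # (e.g. an IVR) that can play the "recording started" whisper.
        self.mode = "client-remote"
        self.log.append(f"trigger {digits}: connect media to IVR")


app = RecordingApp()
trigger = KeypadTrigger(pattern=r"\*9", notify=app.on_trigger)
for key in "12*9":
    trigger.key_pressed(key)
print(app.mode, app.log)
```

   Until the trigger matches, no media needs to flow to the network; only the call to on_trigger switches the application to a client-remote user interface, which is where the cost saving described above comes from.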
   It is tempting to use a hybrid model, where a prompt-and-collect application is implemented by using a client-remote user interface that plays the prompts, and a client-local user interface, described by KPML, that collects digits. However, this only complicates the application. First, the keypad input will be sent to both the media stream and the KPML user interface. This requires the application to sort out which user inputs are duplicates, a process that is very complicated. Second, the primary benefit of KPML is to avoid having a media stream towards a user interface. However, there is already a media stream for the prompting, so there is no real savings.

4. Framework Overview

   In this framework, we use the term "SIP application" to refer to a broad set of functionality. A SIP application is a program running on a SIP-based element (such as a proxy or user agent) that provides some value-added function to a user or system administrator. SIP applications can execute on behalf of a caller, a called party, or a multitude of users at once.

   Each application has a number of instances that are executing at any given time. An instance represents a single execution path for an application. Each instance has a well-defined lifecycle. It is established as a result of some event. That event can be a SIP event, such as the reception of a SIP INVITE request, or it can be a non-SIP event, such as a web form post or even a timer. Application instances also have a specific end time. Some instances have a lifetime that is coupled with a SIP transaction or dialog. For example, a proxy application might begin when an INVITE arrives, and terminate when the call is answered. Other applications have a lifetime that spans multiple dialogs or transactions. For example, a conferencing application instance may exist so long as there are any dialogs connected to it.
When the last dialog terminates, the application instance terminates. Other applications have a lifetime that is completely decoupled from SIP events.

It is fundamental to the framework described here that multiple application instances may interact with a user during a single SIP transaction or dialog. Each instance may be for the same application, or for different applications. Each of the applications may be completely independent, in that they may be owned by different providers, and may not be aware of each other's existence. Similarly, there may be application instances interacting with the caller, and instances interacting with the callee, both within the same transaction or dialog.

The first step in the interaction with the user is to instantiate one or more user interface components for the application instance. A user interface component is a single piece of the user interface that is defined by a logical flow that is not synchronously coupled with any other component. In other words, each component runs more or less independently.

A user interface component can be instantiated in one of the user agents in a dialog (for a client-local user interface), or within a network element (for a client-remote user interface). If a client-local user interface is to be used, the application needs to determine whether or not the user agent is capable of supporting a client-local user interface, and in what format. In this framework, all client-local user interface components are described by a markup language. A markup language describes a logical flow of presentation of information to the user, collection of information from the user, and transmission of that information to an application. Examples of markup languages include HTML, WML, VoiceXML, and the Keypad Markup Language (KPML) [6].
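The "in what format" question raised above can be sketched as a simple negotiation: the application walks through the markups it is able to generate, in its own order of preference, and picks the first one the user agent lists in its Accept header field, falling back to a client-remote user interface when there is no overlap. This Python sketch covers only the format check; the header-based capability rules are given in Section 5.1.1. The function name and the preference-ordering policy are assumptions made for illustration.

```python
from __future__ import annotations


def choose_ui_placement(ua_accept: list[str],
                        app_markups: list[str]) -> tuple[str, str | None]:
    """Return ("client-local", media_type) or ("client-remote", None).

    ua_accept: media types from the UA's Accept header field, possibly
    with parameters such as ";q=0.8" (ignored here).
    app_markups: media types the application can generate, most
    preferred first (a hypothetical application policy).
    """
    # Strip parameters and compare media types case-insensitively.
    accepted = {value.split(";")[0].strip().lower() for value in ua_accept}
    for markup in app_markups:
        if markup.lower() in accepted:
            return ("client-local", markup)
    # No shared markup: instantiate the UI in the network instead.
    return ("client-remote", None)
```

For instance, a UA accepting "application/sdp" and "text/html;q=0.8" paired with an application preferring "application/voicexml+xml" and then "text/html" yields a client-local HTML component.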
Unlike an application instance, whose lifetime is very flexible, a user interface component has a fixed lifetime. A user interface component is always associated with a dialog. The user interface component can be created at any point after the dialog (or early dialog) is created. However, the user interface component terminates when the dialog terminates. The user interface component can be terminated earlier by the user agent, and possibly by the application, but its lifetime never exceeds that of its associated dialog.

There are two ways to create a client-local user interface component. For interface components that are presentation capable, the application sends a REFER [5] request to the user agent. The Refer-To header field contains an HTTP URI that points to the markup for the user interface. For interface components that are presentation free (such as those defined by KPML), the application sends a SUBSCRIBE request to the user agent. The body of the SUBSCRIBE request contains a filter, which, in this case, is the markup that defines when information is to be sent to the application in a NOTIFY.

If a user interface component is to be instantiated in the network, there is no need to determine the capabilities of the device on which the user interface is instantiated. Presumably, it is on a device on which the application knows a UI can be created. However, the application does need to connect the user device to the user interface. This will require manipulation of media streams in order to establish that connection.

The interface between the user interface component and the application depends on the type of user interface. For presentation capable user interfaces, such as those described by HTML and VoiceXML, HTTP form POST operations are used. For presentation free user interfaces, a SIP NOTIFY is used.
The differing needs and capabilities of these two user interfaces, as described in Section 3.4, are what drive the different choices for the interactions. Since presentation capable user interfaces require an update to the presentation every time user data is entered, they are a good match for HTTP. Since presentation free user interfaces merely transmit user input to the application, a NOTIFY is more appropriate.

Indeed, for presentation free user interfaces, there are two different modalities of operation. The first is called "one shot". In one-shot mode, the markup waits for a user to enter some information, and when they do, reports this event to the application. The application then does something, and the markup is no longer used. In the other modality, called "monitor", the markup stays permanently resident, and reports information back to an application until termination of the associated dialog.

5. Application Behavior

The behavior of an application within this framework depends on whether it seeks to use a client-local or client-remote user interface.

5.1 Client Local Interfaces

One key component of this framework is support for client-local user interfaces.

5.1.1 Discovering Capabilities

A client-local user interface can only be instantiated on a user agent if the user agent supports that type of user interface component. Support for client-local user interface components is declared by both the UAC and the UAS in the Accept, Allow, Contact and Allow-Events header fields of their dialog-initiating requests and responses. If the Allow header field indicates support for the SIP SUBSCRIBE method, the Allow-Events header field indicates support for the kpml package [6], and the Contact header field indicates that its URI is a GRUU [9], the UA can instantiate presentation free user interface components.
In this case, the application MAY push presentation free user interface components according to the rules of Section 5.1.2. The specific markup languages that can be supported are indicated in the Accept header field.

If the Allow header field indicates support for the SIP REFER method, and the Contact header field contains UA capabilities [4] that indicate support for the HTTP URI scheme, the UA supports presentation capable user interface components. In this case, the application MAY push presentation capable user interface components to the client according to the rules of Section 5.1.2. The specific markups that are supported are indicated in the Accept header field.

5.1.2 Pushing an Initial Interface Component

Generally, we anticipate that interface components will need to be created at various points in a SIP session: they may need to be pushed during session setup, or after the session is established. In either case, a user interface component is always associated with a specific dialog.

An application MUST NOT attempt to push a user interface component to a user agent until it has determined that the user agent has the necessary capabilities and a dialog has been created. In the case of a UAC, this means that an application MUST NOT push a user interface component for an INVITE-initiated dialog until the application has seen a 200 OK followed by an ACK. For SUBSCRIBE-initiated dialogs, it MUST NOT push a user interface component until the application has seen a 200 OK to the NOTIFY request. For a user interface component on a UAS, the application MUST NOT push a user interface component for an INVITE-initiated dialog until it has seen a 200 OK from the UAS. For a SUBSCRIBE-initiated dialog, it MUST NOT push a user interface component until it has seen a NOTIFY request from the notifier.
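The push preconditions of Sections 5.1.1 and 5.1.2 can be combined into a single check. The Python sketch below is illustrative only: it assumes the relevant header fields have already been parsed into lists, uses a hypothetical contact_is_gruu flag (how an application recognizes a GRUU is not covered here), and labels observed messages with hypothetical strings such as "200 OK" and "ACK". Because the UAC case requires a 200 OK followed by an ACK, the observed messages are checked as an ordered subsequence rather than as a set.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PeerCapabilities:
    """Header field values seen in the UA's dialog-initiating message."""
    allow: list[str]            # methods from Allow
    allow_events: list[str]     # packages from Allow-Events
    contact_schemes: list[str]  # values of the Contact "schemes" parameter
    contact_is_gruu: bool       # hypothetical: Contact URI known to be a GRUU


# Messages the application must have seen, in order, before pushing a
# component to the UA acting in the given role (Section 5.1.2).
READY_AFTER = {
    ("INVITE", "UAC"): ["200 OK", "ACK"],
    ("SUBSCRIBE", "UAC"): ["200 OK to NOTIFY"],
    ("INVITE", "UAS"): ["200 OK"],
    ("SUBSCRIBE", "UAS"): ["NOTIFY"],
}


def _in_order(needed: list[str], seen: list[str]) -> bool:
    # True if `needed` occurs in `seen` as an ordered subsequence;
    # any() consumes the shared iterator up to each match.
    remaining = iter(seen)
    return all(any(msg == s for s in remaining) for msg in needed)


def can_push(kind: str, peer: PeerCapabilities, method: str, role: str,
             seen: list[str]) -> bool:
    if not peer.contact_is_gruu:
        return False  # the REFER or SUBSCRIBE must target the UA's GRUU
    if kind == "presentation-free":
        capable = "SUBSCRIBE" in peer.allow and "kpml" in peer.allow_events
    elif kind == "presentation-capable":
        capable = "REFER" in peer.allow and "http" in peer.contact_schemes
    else:
        raise ValueError(f"unknown component kind: {kind}")
    return capable and _in_order(READY_AFTER[(method, role)], seen)
```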
To create a presentation capable UI component on the UA, the application sends a REFER request to the UA. This REFER MUST be sent to the Globally Routable UA URI (GRUU) [9] advertised by that UA in the Contact header field of the dialog-initiating request or response sent by that UA. Note that this REFER request creates a separate dialog between the application and the UA. The Refer-To header field of the REFER request MUST contain an HTTP URI that references the markup document to be fetched.

   OPEN ISSUE: The REFER needs to provide a context to the UA, and in particular, identify the specific dialog that this component is associated with. There is no obvious candidate for this when REFER is used. The former proposal, of using a grid, cannot work because of forking.

To create a presentation free user interface component, the application sends a SUBSCRIBE request to the UA. The SUBSCRIBE MUST be sent to the GRUU advertised by the UA. This SUBSCRIBE request creates a separate dialog. The SUBSCRIBE request MUST use the KPML [6] event package. The Event header field MUST contain parameters which identify the particular dialog that the interface component is being instantiated against. The body of the SUBSCRIBE request contains the markup document that defines the conditions under which the application wishes to be notified of user input.

In both cases, the REFER or SUBSCRIBE request SHOULD include a display name in the From header field which identifies the name of the application. For example, a prepaid calling card might include a From header field which looks like:

   From: "Prepaid Calling Card"

To authenticate themselves, it is RECOMMENDED that applications use the SIP identity mechanism [7] in the REFER or SUBSCRIBE requests they generate.
This mechanism has the benefit that the signature is over an authenticated identity body [8], which includes the From header field. As such, the client can obtain cryptographic assurances about the service provider (the domain in the From header field) along with the name of the application.

5.1.3 Updating an Interface Component

Once a user interface component has been created on a client, it can be updated. The means for updating it depends on the type of UI component.

Presentation capable UI components are updated using techniques already in place for those markups. In particular, user input will cause an HTTP POST operation to push the user input to the application. The result of the POST operation is a new markup that the UI is supposed to use. This allows the UI to be updated in response to user action. Some markups, such as HTML, provide the ability to force a refresh after a certain period of time, so that the UI can be updated without user input. Those mechanisms can be used here as well. However, there is no support for an asynchronous push of an updated UI component from the application to the user agent. A new REFER request to the same GRUU would create a new UI component rather than updating any components already in place.

For presentation free UI, the story is different. The application MAY update the filter at any time by generating a SUBSCRIBE refresh with the new filter. The UA will immediately begin using this new filter.

5.1.4 Terminating an Interface Component

User interface components have a well-defined lifetime. They are created when the component is first pushed to the client. User interface components are always associated with the SIP dialog against which they were instantiated. As such, their lifetime is bound by the lifetime of that dialog. When the dialog ends, so does the interface component.
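The lifetime and update rules above can be captured in a small data model. In this Python sketch (all names hypothetical), components are registered against the dialog they were instantiated for; terminating that dialog deactivates every component bound to it, and only presentation free components accept a new filter, mirroring a SUBSCRIBE refresh. The filter strings used with it are placeholders rather than real KPML documents.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UIComponent:
    kind: str                          # "presentation-capable" or "presentation-free"
    filter_doc: Optional[str] = None   # KPML filter (presentation-free only)
    active: bool = True


@dataclass
class Dialog:
    dialog_id: str
    components: list[UIComponent] = field(default_factory=list)
    terminated: bool = False

    def push(self, component: UIComponent) -> UIComponent:
        # A component can only be bound to a dialog that still exists.
        if self.terminated:
            raise RuntimeError("dialog has terminated")
        self.components.append(component)
        return component

    def refresh_filter(self, component: UIComponent, new_filter: str) -> None:
        # Models a SUBSCRIBE refresh carrying a new filter. There is no
        # asynchronous update path for presentation-capable components.
        if component.kind != "presentation-free":
            raise ValueError("only presentation-free components take a new filter")
        component.filter_doc = new_filter

    def terminate(self) -> None:
        # A component never outlives its dialog.
        self.terminated = True
        for component in self.components:
            component.active = False
```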
However, there are some cases where the application would like to terminate the user interface component before its natural termination point. For presentation capable user interfaces, this is not possible. For presentation free user interfaces, the application MAY terminate the component by sending a SUBSCRIBE with Expires equal to zero. This terminates the subscription, which removes the UI component.

A client can remove a UI component at any time. For presentation capable UI, this is analogous to the user dismissing the web form window. There is no mechanism provided for reporting this kind of event to the application. The application MUST be prepared to never receive input from the user, and to time out accordingly. For presentation free user interfaces, the UA can explicitly terminate the subscription. This will result in the generation of a NOTIFY with a Subscription-State header field equal to "terminated".

5.2 Client Remote Interfaces

As an alternative to, or in conjunction with, client-local user interfaces, an application can make use of client-remote user interfaces. These user interfaces can execute co-resident with the application itself (in which case no standardized interfaces between the UI and the application need to be used), or they can run separately. This framework assumes that the user interface runs on a host that has a sufficient trust relationship with the application. As such, the means for instantiating the user interface is not considered here.

The primary issue is to connect the user device to the remote user interface. Doing so requires the manipulation of media streams between the client and the user interface. Such manipulation can only be done by user agents. There are two types of user agent applications within this framework: originating/terminating applications, and intermediary applications.
5.2.1 Originating and Terminating Applications

Originating and terminating applications are applications which are themselves the originator or the final recipient of a SIP invitation. They are "pure" user agent applications, not back-to-back user agents. The classic example of such an application is an interactive voice response (IVR) application, which is typically a terminating application. It is a terminating application because the user explicitly calls it; i.e., it is the actual called party. An example of an originating application is a wakeup call application, which calls a user at a specified time in order to wake them up.

Because originating and terminating applications are a natural termination point of the dialog, manipulation of the media session by the application is trivial. Traditional SIP techniques for adding and removing media streams, modifying codecs, and changing the address of the recipient of the media streams can be applied. Similarly, the application can directly authenticate itself to the user through S/MIME, since it is the peer UA in the dialog.

5.2.2 Intermediary Applications

Intermediary applications are both more common and more complex than originating/terminating applications. Intermediary applications are applications that are neither the actual caller nor the called party. Rather, they represent a "third party" that wishes to interact with the user. The classic example is the ubiquitous pre-paid calling card application.

In order for the intermediary application to add a client-remote user interface, it needs to manipulate the media streams of the user agent to terminate on that user interface. This also introduces a fundamental feature interaction issue.
Since the intermediary application is not an actual participant in the call, how does the user interact with the intermediary application, and its actual peer in the dialog, at the same time? This is discussed in more detail in Section 7.

6. User Agent Behavior

6.1 Advertising Capabilities

In order to participate in applications that make use of stimulus interfaces, a user agent needs to advertise its interaction capabilities.

If a user agent supports presentation capable user interfaces, it MUST support the REFER method. It MUST include, in all dialog-initiating requests and responses, an Allow header field that includes the REFER method. Furthermore, the UA MUST support the SIP user agent capabilities specification [4]. The UA MUST be capable of being REFER'd to an HTTP URI. It MUST include, in the Contact header field of its dialog-initiating requests and responses, a "schemes" Contact header field parameter that includes the http URI scheme. The UA MUST include, in all dialog-initiating requests and responses, an Accept header field listing all of the markups supported by the UA. It is RECOMMENDED that all user agents that support presentation capable user interfaces support HTML.

If a user agent supports presentation free user interfaces, it MUST support the SUBSCRIBE [2] method. It MUST support the KPML [6] event package. It MUST include, in all dialog-initiating requests and responses, an Allow header field that includes the SUBSCRIBE method. It MUST include, in all dialog-initiating requests and responses, an Allow-Events header field that lists the KPML event package. The UA MUST include, in all dialog-initiating requests and responses, an Accept header field listing the event filters it supports. At a minimum, a UA MUST support the "application/kpml+xml" MIME type.
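The advertisement rules above amount to a deterministic set of header field values. The following Python sketch builds them for a UA and is illustrative only. The base method list (INVITE, ACK, CANCEL, BYE, OPTIONS) and the application/sdp entry in Accept are assumptions, the "Supported: gruu" value reflects the GRUU requirement stated in this section, and the Contact parameter uses the same schemes="http,sip" form as the example in Section 9. The function name is hypothetical.

```python
from __future__ import annotations


def capability_headers(gruu: str, presentation_capable: bool,
                       presentation_free: bool,
                       markups: list[str]) -> dict[str, str]:
    """Header fields a UA would place in every dialog-initiating
    request or response to advertise its UI capabilities."""
    allow = ["INVITE", "ACK", "CANCEL", "BYE", "OPTIONS"]  # assumed base set
    accept = ["application/sdp"] + list(markups)
    headers: dict[str, str] = {"Supported": "gruu"}
    contact = f"<{gruu}>"  # the Contact URI is the UA's GRUU
    if presentation_capable:
        allow.append("REFER")
        contact += ';schemes="http,sip"'  # UA can be REFER'd to http URIs
    if presentation_free:
        allow.append("SUBSCRIBE")
        headers["Allow-Events"] = "kpml"
        if "application/kpml+xml" not in accept:
            accept.append("application/kpml+xml")  # mandatory filter type
    headers["Allow"] = ", ".join(allow)
    headers["Accept"] = ", ".join(accept)
    headers["Contact"] = contact
    return headers
```

For a UA supporting both kinds of component and HTML, this yields an Allow header field ending in "REFER, SUBSCRIBE" and an Accept header field listing text/html and application/kpml+xml.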
For either presentation free or presentation capable user interfaces, the user agent MUST support the GRUU [9] specification. The Contact header field in all dialog-initiating requests and responses MUST contain a GRUU. The UA MUST include a Supported header field which contains the gruu option tag.

Because these header fields are examined by proxies which may be executing applications, a UA that wishes to support client-local user interfaces should not encrypt them.

6.2 Receiving User Interface Components

Once the UA has created a dialog (in either the early or confirmed state), it MUST be prepared to receive a SUBSCRIBE or REFER request against its GRUU. If the UA receives such a request prior to the establishment of a dialog, the UA MUST reject the request.

A user agent SHOULD attempt to authenticate the sender of the request. The sender will generally be an application, and therefore the user agent is unlikely to ever have a shared secret with it, making digest authentication useless. However, the REFER or SUBSCRIBE request should have a SIP authenticated identity body [8] that conveys the identity of the application [7]. If such a body is not present, and no alternative means of identification (such as P-Asserted-Identity [11]) is present, the user agent MAY reject the request with a 403 response.

Next, the user agent authorizes the application. An application is authorized to instantiate a user interface component if the application was resident within an element on the path of the dialog-initiating request. An application proves to the user agent that it was on the path by presenting the dialog identifiers in the SUBSCRIBE or REFER request. In the case of SUBSCRIBE, those identifiers are present in the Event header field [6].
[[EDITORS NOTE: Fill in here once we know how this is done for REFER.]]

Because the dialog identifiers serve as a tool for authorization, a user agent compliant with this framework MUST use dialog identifiers that are cryptographically random, with at least 128 bits of randomness. It is recommended that this randomness be split between the Call-ID and From header field tag in the case of a UAC.

Furthermore, to ensure that only applications resident in on-path elements can instantiate a user interface component, a user agent compliant with this specification SHOULD use the sips URI scheme for all dialogs it initiates. This will guarantee secure links between all of the elements on the signaling path.

If an application does not present a valid dialog identifier in its REFER or SUBSCRIBE request, the user agent MUST reject the request with a 403 response. A user agent MAY apply any other policies in addition to (but not instead of) the ones specified here in order to authorize the creation of the user interface component. One such mechanism would be to prompt the user, informing them of the identity of the application. If an authorization policy requires user interaction, the user agent SHOULD respond to the SUBSCRIBE or REFER request with a 202. In the case of SUBSCRIBE, if authorization is not granted, the user agent SHOULD generate a NOTIFY to terminate the subscription. In the case of REFER, the user agent MUST NOT act upon the URI in the Refer-To header field until user authorization is obtained.

If a REFER request to an HTTP URI is authorized, the UA dereferences the URI and fetches the content to be rendered to the user. This instantiates a presentation capable user interface component. If a SUBSCRIBE is authorized, a presentation free user interface component is instantiated.
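A Python sketch of the UA-side checks in this section follows. It is illustrative, not normative: the 64/64 split of the 128 random bits between the Call-ID and the From tag is one possible reading of the recommendation, the DialogId type, the function names and the string outcomes are hypothetical, and extracting the identifiers from the Event header field (or from a REFER) is not shown. The checks run in the order described above: authentication, then on-path authorization, then any additional local policy such as prompting the user.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass
from typing import Optional


def new_uac_dialog_ids(host: str) -> tuple[str, str]:
    """Return (Call-ID, From tag) carrying 128 random bits in total."""
    call_id = f"{secrets.token_hex(8)}@{host}"  # 8 bytes = 64 random bits
    from_tag = secrets.token_hex(8)             # another 64 random bits
    return call_id, from_tag


@dataclass(frozen=True)
class DialogId:
    call_id: str
    from_tag: str
    to_tag: str


def authorize(presented: Optional[DialogId], known: set[DialogId],
              has_identity: bool, prompt_user: bool,
              reject_unidentified: bool = False) -> str:
    """Decide what to do with an incoming REFER or SUBSCRIBE."""
    # Authentication: without an identity body (or an alternative such as
    # P-Asserted-Identity) the UA MAY reject; here that is a policy flag.
    if not has_identity and reject_unidentified:
        return "reject-403"
    # Authorization: only an on-path element can know the dialog identifiers.
    if presented is None or presented not in known:
        return "reject-403"
    # Additional local policy may only add restrictions, e.g. asking the user.
    if prompt_user:
        return "accept-202-pending-user"
    return "accept"
```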
6.3 Mapping User Input to User Interface Components

Once the user interface components are instantiated, the user agent must direct user input to the appropriate component. In the case of presentation capable user interfaces, this process is known as focus selection. It is done by means that are specific to the user interface on the device. In the case of a PC, for example, the window manager would allow the user to select the user interface component to which their input is directed.

For presentation free user interfaces, the situation is more complicated. In some cases, the device may support a mechanism that allows the user to select a "line", and thus the associated dialog. Any user input on the keypad while this line is selected is fed to the user interface components associated with that dialog.

   TODO: Need to consider the case where the user interface is co-resident with the UAC, but the user device is separate from the UAC and communicates with it through some other protocol, and the user interface and application are semi-trusted. The classic case is when the UAC is a PSTN gateway.

6.4 Receiving Updates to User Interface Components

For presentation capable user interfaces, updates to the user interface occur in ways specific to that user interface component. In the case of HTML, for example, the document can tell the client to fetch a new document periodically. However, this framework does not provide any additional machinery to asynchronously push a new user interface component to the client.

For presentation free user interfaces, an application can push an update to a component by sending a SUBSCRIBE refresh with a new filter. The user agent will process these according to the rules of the event package.

6.5 Terminating a User Interface Component

Termination of a presentation capable user interface component is a trivial procedure.
The user agent merely dismisses the window (or equivalent). The fact that the component is dismissed is not communicated to the application. As such, it is purely a local matter.

In the case of a presentation free user interface, if the user wishes to cease interacting with the application, the user agent SHOULD generate a NOTIFY request with a Subscription-State equal to "terminated" and a reason of "rejected". This tells the application that the component has been removed, and that it should not attempt to re-subscribe.

7. Inter-Application Feature Interaction

The inter-application feature interaction problem is inherent to stimulus signaling. Whenever there are multiple applications, there are multiple user interfaces. When the user provides an input, to which user interface is the input destined? That question is the essence of the inter-application feature interaction problem.

Inter-application feature interaction is not an easy problem to resolve. Here, we consider the issues for client-local and client-remote user interface components separately.

7.1 Client Local UI

When the user interface itself resides locally on the client device, the feature interaction problem is actually much simpler. The end device knows explicitly about each application, and therefore can present the user with each one separately. When the user provides input, the client device can determine to which user interface the input is destined. The user interface to which input is destined is referred to as the application in focus, and the means by which the focused application is selected is called focus determination.

Generally speaking, focus determination is purely a local operation. In the PC universe, focus determination is provided by window managers.
Individual applications do not know about focus; each merely receives the user input that has been targeted to it when it is in focus. This basic concept applies to SIP-based applications as well.

Focus determination will frequently be trivial, depending on the user interface type. Consider a user who makes a call from a PC. The call passes through a pre-paid calling card application and a call recording application. Both of these wish to interact with the user. Both push an HTML-based user interface to the user. On the PC, each user interface would appear as a separate window. The user interacts with the call recording application by selecting its window, and with the pre-paid calling card application by selecting its window. Focus determination is literally provided by the PC window manager. It is clear to which application the user input is targeted.

As another example, consider the same two applications, but on a "smart phone" that has a set of buttons, and next to each button, an LCD display that can provide the user with an option. This user interface can be represented using the Wireless Markup Language (WML).

The phone would allocate some number of buttons to each application. The prepaid calling card would get one button for its "hangup" command, and the recording application would get one for its "start/stop" command. The user can easily determine which application to interact with by pressing the appropriate button. Pressing a button determines focus and provides user input, both at the same time.

Unfortunately, not all devices will have these advanced displays. A PSTN gateway, or a basic IP telephone, may only have a 12-key keypad. The user interfaces for these devices are provided through the Keypad Markup Language (KPML).
Considering once again the feature interaction case above, the pre-paid calling card application and the call recording application would both pass a KPML document to the device. When the user presses a button on the keypad, to which document does the input apply? The user interface does not allow the user to select. A user interface where the user cannot provide focus is called a focusless user interface. This is quite a hard problem to solve. This framework does not make any explicit normative recommendation, but concludes that the best option is to send the input to both user interfaces unless the markup in one interface has indicated that it should be suppressed from the others. This is a sensible choice by analogy: it is exactly what the existing circuit-switched telephone network does. It is an explicit non-goal to provide a better mechanism for feature interaction resolution than the PSTN on devices which have the same user interface as they do on the PSTN. Devices with better displays, such as PCs or screen phones, can benefit from the capabilities of this framework, allowing the user to determine which application they are interacting with.

Indeed, when a user provides input on a focusless device, the input must be passed to all client-local user interfaces AND all client-remote user interfaces, unless the markup tells the UI to suppress the media. In the case of KPML, key events are passed to remote user interfaces by encoding them in RFC 2833 [15]. Of course, since a client cannot determine whether or not a media stream terminates in a remote user interface, these key events are passed in all audio media streams unless the "Q" digit is used to suppress them.

7.2 Client-Remote UI

When the user interfaces run remotely, the determination of focus can be much harder.
There are many architectures that can be deployed to handle the interaction. None is ideal, and all are beyond the scope of this specification.

8. Intra-Application Feature Interaction

An application can instantiate multiple user interface components. For example, a single application can instantiate two separate HTML components and one WML component. Furthermore, an application can instantiate both client-local and client-remote user interfaces.

The feature interaction issues between these components within the same application are less severe. If an application has multiple client user interface components, their interaction is resolved identically to the inter-application case: through focus determination. However, the problems in focusless user interfaces (such as a keypad) generally won't exist, since the application can generate user interfaces which do not overlap in their usage of an input.

The real issue is that the optimal user experience frequently requires some kind of coupling between the differing user interface components. This is a classic problem in multi-modal user interfaces, such as those described by Speech Application Language Tags (SALT). As an example, consider a user interface where a user can either press a labeled button to make a selection, or listen to a prompt and speak the desired selection. Ideally, when the user presses the button, the prompt should cease immediately, since both of them were targeted at collecting the same information in parallel. Such interactions are best handled by markups which natively support them, such as SALT, and thus require no explicit support from this framework.

9. Example Call Flow

This section shows the operation of a call recording application.
This application allows a user to record the media in their call by clicking on a button in a web form. The application uses a presentation capable user interface component that is pushed to the caller.

     A                  Recording App                  B
     |(1) INVITE              |                        |
     |----------------------->|                        |
     |                        |(2) INVITE              |
     |                        |----------------------->|
     |                        |(3) 200 OK              |
     |                        |<-----------------------|
     |(4) 200 OK              |                        |
     |<-----------------------|                        |
     |(5) ACK                 |                        |
     |----------------------->|                        |
     |                        |(6) ACK                 |
     |                        |----------------------->|
     |(7) REFER               |                        |
     |<-----------------------|                        |
     |(8) 200 OK              |                        |
     |----------------------->|                        |
     |(9) NOTIFY              |                        |
     |----------------------->|                        |
     |(10) 200 OK             |                        |
     |<-----------------------|                        |
     |(11) HTTP GET           |                        |
     |----------------------->|                        |
     |(12) 200 OK             |                        |
     |<-----------------------|                        |
     |(13) HTTP POST          |                        |
     |----------------------->|                        |
     |(14) 200 OK             |                        |
     |<-----------------------|                        |

                             Figure 3

First, the caller, A, sends an INVITE to set up a call (message 1). Since the caller supports the framework and can handle presentation capable user interface components, it includes a Supported header field indicating that GRUU is understood, an Allow header field indicating that REFER is understood, and a Contact header field that includes the "schemes" header field parameter.

   INVITE sip:B@example.com SIP/2.0
   Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
   From: Caller ;tag=kkaz-
   To: Callee
   Call-ID: faif9a@host.example.com
   CSeq: 1 INVITE
   Supported: gruu
   Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
   Contact: ;schemes="http,sip"
   Content-Length: ...
      Content-Type: application/sdp

      --SDP not shown--

   The proxy, which acts as the recording server, forwards the INVITE
   to the called party (message 2):

      INVITE sip:B@pc.example.com SIP/2.0
      Record-Route:
      Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK97sh
      Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
      From: Caller ;tag=kkaz-
      To: Callee
      Call-ID: faif9a@host.example.com
      CSeq: 1 INVITE
      Supported: gruu
      Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
      Contact: ;schemes="http,sip"
      Content-Length: ...
      Content-Type: application/sdp

      --SDP not shown--

   B accepts the call with a 200 OK (message 3). Since B does not
   support the framework, the framework-related header fields
   (Supported, Allow, and the "schemes" Contact parameter) are not
   present.

      SIP/2.0 200 OK
      Record-Route:
      Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK97sh
      Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
      From: Caller ;tag=kkaz-
      To: Callee ;tag=7777
      Call-ID: faif9a@host.example.com
      CSeq: 1 INVITE
      Contact:
      Content-Length: ...
      Content-Type: application/sdp

      --SDP not shown--

   This 200 OK is passed back to the caller (message 4):

      SIP/2.0 200 OK
      Record-Route:
      Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
      From: Caller ;tag=kkaz-
      To: Callee ;tag=7777
      Call-ID: faif9a@host.example.com
      CSeq: 1 INVITE
      Contact:
      Content-Length: ...
      Content-Type: application/sdp

      --SDP not shown--

   The caller generates an ACK (message 5):

      ACK sip:B@pc.example.com SIP/2.0
      Route:
      Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz9
      From: Caller ;tag=kkaz-
      To: Callee ;tag=7777
      Call-ID: faif9a@host.example.com
      CSeq: 1 ACK

   The ACK is forwarded to the called party (message 6):
      ACK sip:B@pc.example.com SIP/2.0
      Via: SIP/2.0/UDP app.example.com;branch=z9hG4bKh7s
      Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz9
      From: Caller ;tag=kkaz-
      To: Callee ;tag=7777
      Call-ID: faif9a@host.example.com
      CSeq: 1 ACK

   The application now decides to push a user interface component to
   user A, so it sends A a REFER request (message 7) whose Refer-To
   header field carries the HTTP URI of the component:

      REFER sip:bad998asd8asd0000a@example.com SIP/2.0
      Refer-To: http://app.example.com/script.pl
      Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK9zh6
      From: Recorder Application ;tag=jhgf
      To: Caller
      Call-ID: 66676776767@app.example.com
      CSeq: 1 REFER
      Event: refer
      Contact:

   The REFER is answered by a 200 OK (message 8):

      SIP/2.0 200 OK
      Refer-To: http://app.example.com/script.pl
      Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK9zh6
      From: Recorder Application ;tag=jhgf
      To: Caller ;tag=pqoew
      Call-ID: 66676776767@app.example.com
      Supported: gruu
      Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
      Contact: ;schemes="http,sip"
      CSeq: 1 REFER

   User A then sends a NOTIFY (message 9):

      NOTIFY sip:app.example.com SIP/2.0
      Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9320394238995
      To: Recorder Application ;tag=jhgf
      From: Caller ;tag=pqoew
      Call-ID: 66676776767@app.example.com
      CSeq: 1 NOTIFY
      Max-Forwards: 70
      Event: refer;id=93809824
      Subscription-State: active;expires=3600
      Contact: ;schemes="http,sip"
      Content-Type: message/sipfrag;version=2.0
      Content-Length: 20

      SIP/2.0 100 Trying

   The recording server responds with a 200 OK (message 10):

      SIP/2.0 200 OK
      Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9320394238995
      To: Recorder Application ;tag=jhgf
      From: Caller ;tag=pqoew
      Call-ID: 66676776767@app.example.com
      CSeq: 1 NOTIFY

   The caller, A, authorizes the application.
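   The example does not show how A performs this authorization.
   Section 10 suggests treating an application as authorized if it was
   legitimately part of the call setup path, enforced by initiating the
   call with a sips URI. The Python sketch below illustrates one such
   policy check. The Dialog record, the authorize_refer() function and
   its parameters are hypothetical, not part of this framework. The
   sketch assumes the sender's host has already been authenticated (for
   example, from its TLS certificate) and that A remembers the hosts in
   each call's Record-Route set and the schemes it advertised in its
   Contact "schemes" parameter.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Dialog:
    """Hypothetical record a UA keeps for each call it takes part in."""
    call_id: str
    route_hosts: set[str]        # hosts taken from the call's Record-Route set
    established_over_sips: bool  # True if the call was initiated with a sips URI


# Schemes this UA advertised in its Contact "schemes" parameter.
ADVERTISED_SCHEMES = {"http", "sip"}


def authorize_refer(dialogs: list[Dialog], sender_host: str, refer_to: str,
                    require_sips: bool = True) -> bool:
    """Hypothetical policy: accept a REFER that pushes a user interface
    component only if the Refer-To URI uses a scheme this UA advertised,
    and the (already authenticated) sender was on the signaling path of
    one of the UA's calls, optionally requiring that call to use sips."""
    scheme = urlsplit(refer_to).scheme.lower()
    if scheme not in ADVERTISED_SCHEMES:
        return False
    sender = sender_host.lower()
    for dialog in dialogs:
        if require_sips and not dialog.established_over_sips:
            continue  # under this policy, a non-sips call path is not trusted
        if sender in {host.lower() for host in dialog.route_hosts}:
            return True
    return False


if __name__ == "__main__":
    # Values loosely modeled on the call flow above. That call uses plain
    # sip URIs, so the first check relaxes the sips requirement.
    call = Dialog("faif9a@host.example.com", {"app.example.com"}, False)
    script = "http://app.example.com/script.pl"
    print(authorize_refer([call], "app.example.com", script,
                          require_sips=False))  # True
    print(authorize_refer([call], "app.example.com", script))  # False: not sips
    print(authorize_refer([call], "other.example.net", script,
                          require_sips=False))  # False: not on the path
```

   Whether sips is required, and how the sender is authenticated, are
   policy choices this sketch leaves open. Because the REFER in this
   example arrives in a separate dialog (its Call-ID differs from the
   call's), the check matches the sender against the route sets of all
   of A's calls rather than against the REFER's own dialog.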
   A then acts on the Refer-To URI, fetching the script from
   app.example.com with an HTTP GET (message 11). The response (message
   12) contains a web application with a link that the user can click
   to enable recording. When the user clicks the link, the results are
   posted to the server (message 13), and an updated display is
   returned (message 14).

10. Security Considerations

   This framework raises a number of security considerations, because
   it allows applications in the network to instantiate user interface
   components on a client device. Such instantiations need to come from
   authenticated applications, and those applications also need to be
   authorized to place a user interface on the client. Of the two,
   authorization is the stronger requirement: it matters less to know
   the name of the provider of the application than to know that the
   provider is authorized to instantiate components.

   Generally, an application should be considered authorized if it was
   legitimately part of the call setup path. With this definition,
   authorization can be enforced by using the sips URI scheme when the
   call is initiated, since a sips URI requires that each hop of the
   request, up to the target domain, be secured with TLS.

11. Contributors

   This document was produced as a result of discussions amongst the
   application interaction design team. All members of this team
   contributed significantly to the ideas embodied in this document. The
   members of this team were:

      Eric Burger
      Cullen Jennings
      Robert Fairlie-Cuninghame

Normative References

   [1]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
        Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
        Session Initiation Protocol", RFC 3261, June 2002.

   [2]  Roach, A., "Session Initiation Protocol (SIP)-Specific Event
        Notification", RFC 3265, June 2002.

   [3]  McGlashan, S., Lucas, B., Porter, B., Rehor, K., Burnett, D.,
        Carter, J., Ferrans, J. and A.
        Hunt, "Voice Extensible Markup Language (VoiceXML) Version
        2.0", W3C CR CR-voicexml20-20030220, February 2003.

   [4]  Rosenberg, J., "Indicating User Agent Capabilities in the
        Session Initiation Protocol (SIP)",
        draft-ietf-sip-callee-caps-03 (work in progress), January 2004.

   [5]  Sparks, R., "The Session Initiation Protocol (SIP) Refer
        Method", RFC 3515, April 2003.

   [6]  Burger, E., "Keypad Stimulus Protocol (KPML)",
        draft-ietf-sipping-kpml-02 (work in progress), February 2004.

   [7]  Peterson, J., "Enhancements for Authenticated Identity
        Management in the Session Initiation Protocol (SIP)",
        draft-ietf-sip-identity-01 (work in progress), March 2003.

   [8]  Peterson, J., "SIP Authenticated Identity Body (AIB) Format",
        draft-ietf-sip-authid-body-02 (work in progress), July 2003.

   [9]  Rosenberg, J., "Obtaining and Using Globally Routable User Agent
        (UA) URIs (GRUU) in the Session Initiation Protocol (SIP)",
        draft-ietf-sip-gruu-00 (work in progress), January 2004.

Informative References

   [10] Day, M., Rosenberg, J. and H. Sugano, "A Model for Presence and
        Instant Messaging", RFC 2778, February 2000.

   [11] Jennings, C., Peterson, J. and M. Watson, "Private Extensions
        to the Session Initiation Protocol (SIP) for Asserted Identity
        within Trusted Networks", RFC 3325, November 2002.

   [12] Rosenberg, J., "A Framework for Conferencing with the Session
        Initiation Protocol",
        draft-ietf-sipping-conferencing-framework-01 (work in
        progress), October 2003.

   [13] Rosenberg, J., Schulzrinne, H. and P. Kyzivat, "Caller
        Preferences for the Session Initiation Protocol (SIP)",
        draft-ietf-sip-callerprefs-10 (work in progress), October 2003.

   [14] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", RFC
        3550, July 2003.

   [15] Schulzrinne, H. and S.
        Petrack, "RTP Payload for DTMF Digits, Telephony Tones and
        Telephony Signals", RFC 2833, May 2000.

Author's Address

   Jonathan Rosenberg
   dynamicsoft
   600 Lanidex Plaza
   Parsippany, NJ 07054
   US

   Phone: +1 973 952-5000
   EMail: jdrosen@dynamicsoft.com
   URI:   http://www.jdrosen.net

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (C) The Internet Society (2004). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.