Internet Draft                                Robert Fairlie-Cuninghame
draft-fairlie-sipping-netapp-session-00.txt  Nuera Communications, Inc.
June 6, 2002
Expires: November, 2002

                 Network Application Session Framework

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This particular draft is intended to be discussed in the SIPPING
   Working Group. Discussion of it therefore belongs on that list. The
   charter for the SIPPING working group may be found at
   http://www.ietf.org/html.charters/sipping-charter.html

Abstract

   This document defines a framework for using Session Initiation
   Protocol sessions to create user interface components for network
   application interaction. These Network Application Sessions provide
   a number of services to the application server and user agent,
   including application identification, co-operation, association,
   and authentication. In addition to these services, the framework
   allows differing modes of interaction to be managed in a
   homogeneous manner with similar expectations for security,
   establishment, update, closure and the handling of certain user
   interface operations.

R. Fairlie-Cuninghame                                          [Page 1]

Internet Draft   Network Application Session Framework    June 6, 2002

1. Motivation

   Today most network-based applications use only a single mode of
   user interaction, for instance, web/HTML-based applications,
   VoiceXML systems or media-based automated menu systems. There is a
   need for a framework to allow applications to use multiple modes of
   user interaction in a seamless manner.

   An example of a multi-modal interactive application is one that
   allows the user to make the same selection by clicking on an
   HTML-based menu, pressing a device key (e.g., on a keypad) or
   speaking the command.

   One difficulty of multi-modal interaction is synchronizing and
   updating all interaction modes as a result of user actions in only
   one mode. For instance, this may require "pushing" a new HTML menu
   down to a device after an audio selection has been made.

   A framework is needed to ensure that different modes of interaction
   will:

   - be identifiable as belonging to the same application instance,
   - function together in a seamless and predictable manner,
   - provide a basic set of functional expectations for all modes, and
   - provide a minimum set of security expectations for all modes.

   The basis of this framework is that each user interface component
   should be thought of as a session whose content can be established,
   updated and terminated. Specifically, each user interface component
   is negotiated as content within the context of a SIP dialog [2].
   Many advantages of this approach stem from leveraging SIP's
   considerable session management capabilities.

   Advantages of the framework:

   - Differing modes of interaction can be managed in a homogeneous
     manner. This should greatly simplify user agent interface
     management and representation.
   - The framework encompasses the existing usage of SIP dialogs for
     media-based session negotiation. Audio and video media-based user
     interface components are now just one of a number of possible
     network application interaction modes.
   - The identification of user interface components is now consistent
     across all interaction modes.
   - The functional expectation of some basic user interface component
     operations is now consistent across all interaction modes (for
     instance, establishing an association between various user
     interface components).
   - The time-bounded nature of SIP sessions ensures that interaction
     resources are terminated when either party terminates the Network
     Application Session.
   - SIP's security and authentication framework can be leveraged to
     provide a consistent level of security for the establishment of
     all modes of interaction.
   - SIP call control primitives can be used to build relationships
     between various user interface components in a content-agnostic
     manner.

2. Terminology

   In addition to the terms defined in the Network Application
   Interaction Requirements document [3], the following terms are
   defined:

   Network Application Session: A SIP dialog that is established
      between a User Agent and an Application Server (also acting as a
      User Agent). Specifically, a Network Application Session is a
      SIP dialog where the Application Server has identified the
      application instance.

   Application Instance: The application instance uniquely identifies
      the instantiation of the application or distributed application.

   Distributed Application: A set of co-operating Application Servers
      or Network Application Sessions working together to provide a
      cohesive set of services to a User Agent. All participating
      Network Application Sessions must share the same application
      instance. Distributed Application user interface components
      should be rendered to the user as though they were negotiated in
      a single Network Application Session.

   User Interface Component: With respect to this framework, the term
      User Interface Component (as defined in [3]) refers to content
      established within a Network Application Session.

   Network Application Session Content: A general term to describe a
      mode of interaction established through a Network Application
      Session. The content can refer to any type of user interaction,
      for instance, audio or video media streams, HTML, VoiceXML,
      device-specific user interface actions, etc.

3. Framework Objectives

   This section lists the services that this framework aims to provide
   to Application Servers and User Agents. It is important to note
   that all objectives are achieved purely through informational
   headers that do not change the protocol behavior of either entity.

3.1. Session Creation, Update and Closure

   In this framework, an Application Server can always control the
   update and closure of a created component, as each user interface
   component is associated with a SIP dialog. Additionally, it is
   always possible for a User Agent to indicate its desire to close a
   particular Network Application Session (without necessarily ending
   all Distributed Application sessions).

   This approach also allows a user agent to ensure that all
   application interaction resources are terminated after the
   application server or (more importantly) the user agent has
   requested to terminate the session.

3.2. Identification

   This framework allows the User Agent to identify the type of
   application and application instance associated with each Network
   Application Session.

   In addition to the new headers, the framework allows a consistent
   level of authentication/identification of the Application Server
   and User Agent across all user interface components by leveraging
   the existing authentication schemes in SIP.

3.3. Co-operation

   This framework allows Distributed Application Servers to co-operate
   and create a number of Network Application Sessions that refer to
   the same application instance and share the same virtual user
   interface.

   The functional effect of this objective is to allow user interface
   components carried within different Network Application Sessions to
   be rendered as if all user interface components were carried within
   a single Network Application Session.

   OPEN ISSUE: Is the complication introduced by co-operation (i.e.,
   that a single application can use multiple Network Application
   Sessions) justifiable? Should each network applet be restricted to
   a single SIP dialog?

3.4. Association

   By definition, Network Application Sessions are only permitted
   between an Application Server and a User Agent; however, there are
   many instances where an Application may wish to have access to user
   input and actions associated with a SIP session between the user
   and a third party (which could be another user or application). An
   example is a voice recorder or pre-paid calling card server
   associating a Network Application Session with another end-to-end
   SIP session. It is the responsibility of the User Agent to
   authorize such associations.

   The functional effect of an association is that any user interface
   component content negotiated in the Network Application Session
   (e.g., media-, presentation- or input-based interaction) SHOULD
   collect user input as though it was negotiated in the associated
   SIP dialog/conversation space. This does not imply that a User
   Agent should misrepresent the separate (but linked) identity of the
   Application Server user interface components.

   OPEN ISSUES: What (if any) restrictions should be placed on the
   number of dialogs that a single Distributed Application can
   associate with?

   SUGGESTION: Restrict each (Distributed) Application to only
   associating with one other dialog. This simplifies the otherwise
   potentially complicated relationship management logic required by
   the User Agent.

3.5. Interaction Preferences

   The framework should allow the User Agent to indicate the user's
   preference for different modes of interaction, e.g., voice, video,
   point-and-click based, etc. This should not merely be a replacement
   of session content codec negotiation but rather indicate a higher
   level of user-centric application interaction preference.

   This objective will be addressed in a future revision.

   DISCUSSION: Is an ordered wildcard MIME type list appropriate? For
   instance:

      App-Preference: audio/*;q=0.7, application/DML;q=0.7,
                      application/HTML;q=0.3, video/*;q=0.1

   This would indicate that the user most wants to use both audio and
   DTMF for application interaction, followed by HTML-based
   interaction and then lastly streaming video. This is really a
   misuse of the MIME type tree but is it adequate for solving the
   intended problem?

   What other information is useful? For instance, the maximum number
   of video, markup or audio components supported in a single
   application instance?

4. New SIP Headers

   In order to accomplish the above objectives, new informational
   headers have been defined.

4.1. P-App-Id Header

   The P-App-Id header identifies the application type and application
   instance to the User Agent. The presence of the P-App-Id header in
   an INVITE request or response indicates that the dialog is a
   Network Application Session as described by this framework. The
   header may only be placed in INVITE requests and 1xx or 2xx
   responses by Application Servers.

   app-id-header         = "P-App-Id" ":" [display-name]
                           "<" app-id-info ">"
   app-id-info           = "name" "=" name-attribute
                           ";" "pvdr" "=" provider-attribute
                           ";" "id" "=" global-id-attribute
                           *( ";" "contact" "=" contact-attribute )
                           *( ";" token "="
                              future-extension-attribute )
   name-attribute        = application-name
                           [ ":" application-version ]
                           "@" author-domain-name
   application-name      = token
   application-version   = 1*DIGIT "."
                           token
   author-domain-name    = FQDN
   provider-attribute    = [ provider-instance "@" ]
                           provider-domain-name
   provider-instance     = token
   provider-domain-name  = FQDN
   global-id-attribute   = UUID | token "@" FQDN
   contact-attribute     = URI

   The name-attribute MUST contain a globally scoped identifier (with
   respect to the application) to identify the application's name,
   version and author.

   The provider-attribute MUST provide a globally scoped identifier
   (with respect to the provider) to identify the organization hosting
   the application.

   The optional contact-attribute is a URI that provides a
   user-relevant reference for contacting the provider organization
   with respect to the application, for instance, an http: or mailto:
   URL.

   The global-id-attribute is a globally unique identifier for the
   application instance. In the case of a Distributed Application
   (where multiple Network Application Sessions refer to the same
   logical application instance), the global-id-attribute must match
   for all Network Application Sessions.

   Once the P-App-Id header has been added to an INVITE request or
   response by the Application Server, the header MUST be added
   UNCHANGED to all subsequent INVITE requests or responses. P-App-Id
   headers MUST NOT be added to, modified in or removed from re-INVITE
   requests or responses. Only one P-App-Id header is allowed per SIP
   message.

   P-App-Id headers added in response to a new User Agent initiated
   INVITE request MUST always use a new global-id-attribute value,
   thereby forming a new application instance. This restriction means
   that a User Agent does not have to shuffle or merge (possibly)
   pre-existing user interface constructs into another (distributed)
   application.

   Header Example:

   P-App-Id: "Acme Voice Recorder v1.22b" <
      name=voicerec:1.22b@coolsoftware.com;
      pvdr=services@joes-isp.com;
      id=2893472834@srv12.joes-isp.com;
      contact=http://www.joes-isp.com/voicerec/about.html >

5. Session Association And Co-operation

   Session Co-operation and Association within this framework are
   achieved by recognizing that the two operations are in fact
   equivalent to forming a conversation space through a (Local) Join
   operation [4].

   Co-operation is achieved by performing a Join operation between
   Network Application Sessions within the same application instance;
   association is achieved by a Join operation between a Network
   Application Session and a SIP dialog which is either not a Network
   Application Session or does not belong to the same application
   instance.

5.1. Co-operation

   Co-operating Network Application Sessions refer to sessions that
   belong to the same Distributed Application instance. The
   application instance is identified by the global "id" attribute in
   the P-App-Id header.

   On a User Agent, session co-operation is achieved by implicitly
   performing a Join operation on all Network Application Sessions
   possessing the same application "id" attribute value (application
   instance). This has the result that, with respect to user input and
   presentation, all user interface components are treated as though
   they were negotiated in a single Network Application Session.

5.2. Association

   Association is achieved by explicitly performing a Join operation
   between any Network Application Session and a target SIP dialog.
   This is achieved by:

   - The Application Server placing a Join header [5] in an INVITE
     request (or re-INVITE request) of the Network Application Session
     dialog.
   - The third party placing a Join header in an INVITE request (or
     re-INVITE request) of the target SIP dialog (this assumes
     co-ordination between the Application Server and the third
     party).
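
   As an illustration only, the co-operation and association rules
   above can be sketched in Python. Every name in the sketch
   (parse_app_id, ConversationSpaces, the dialog labels) is
   hypothetical and not defined by this framework. The labels stand in
   for whatever dialog identifiers a real Join header would carry, and
   the authorization that the User Agent must perform before accepting
   a Join is deliberately not modelled.

```python
import re
from typing import Dict, Optional, Set


def parse_app_id(value: str) -> Dict[str, str]:
    """Return the ';'-separated attributes found inside the <...> part.

    Simplification (assumption of this sketch): a contact URI that
    itself contains ';' and repeated "contact" attributes are not handled.
    """
    match = re.search(r"<(.*)>", value, re.DOTALL)
    if match is None:
        raise ValueError("P-App-Id value has no <...> section")
    attributes: Dict[str, str] = {}
    for part in match.group(1).split(";"):
        key, _, val = part.strip().partition("=")
        if key:
            attributes[key.strip()] = val.strip()
    return attributes


class ConversationSpaces:
    """Hypothetical User Agent bookkeeping for co-operation/association."""

    def __init__(self) -> None:
        # dialog label -> application instance id (None for a plain dialog)
        self.instance_of: Dict[str, Optional[str]] = {}
        # explicit Join operations, kept as unordered pairs of dialog labels
        self.joined: Set[frozenset] = set()

    def add_dialog(self, dialog: str, p_app_id: Optional[str] = None,
                   join: Optional[str] = None) -> None:
        """Record a dialog; `join` is the dialog named by its Join header."""
        instance = parse_app_id(p_app_id)["id"] if p_app_id else None
        self.instance_of[dialog] = instance
        if join is not None:
            self.joined.add(frozenset((dialog, join)))

    def cooperating(self, dialog: str) -> Set[str]:
        """Sessions implicitly joined because they share the same "id"."""
        instance = self.instance_of[dialog]
        if instance is None:
            return {dialog}
        return {d for d, i in self.instance_of.items() if i == instance}

    def associated(self, dialog: str) -> Set[str]:
        """Dialogs explicitly joined to any session of the same instance."""
        group = self.cooperating(dialog)
        linked: Set[str] = set()
        for pair in self.joined:
            if pair & group:
                linked |= pair - group
        return linked


# Usage: two sessions of one application instance, one of which carries
# a Join header naming an ordinary end-to-end dialog.
recorder = ('"Acme Voice Recorder" <name=voicerec:1.22b@coolsoftware.com; '
            'pvdr=services@joes-isp.com; id=2893472834@srv12.joes-isp.com>')
spaces = ConversationSpaces()
spaces.add_dialog("call-1")
spaces.add_dialog("nas-1", p_app_id=recorder, join="call-1")
spaces.add_dialog("nas-2", p_app_id=recorder)
print(sorted(spaces.cooperating("nas-2")), sorted(spaces.associated("nas-2")))
```

   In this sketch the second session never names the end-to-end dialog
   itself; it inherits the association because it shares the "id"
   attribute with a session that does.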

   Association with any Network Application Session dialog in a
   Distributed Application is equivalent to a Join with all dialogs,
   as all the Network Application Sessions are also implicitly joined.

   The User Agent MUST authorize the execution of the Join operation
   either by consulting local policy or querying the user.

   OPEN ISSUE: How do you securely ensure that an Application Server
   has authorized a third party to associate itself with an
   application instance? Perhaps a Join-Requested-By header is needed?

   The association is maintained for as long as the Join header is
   placed in re-INVITE requests. The absence of a previously present
   Join header removes the association.

   OPEN ISSUE: Or should an [as yet undefined] explicit Split
   operation be required?

6. Application Design Considerations

7. Framework Usage

7.1. User Agent Behavior

   1. How does a User Agent react when receiving an INVITE request
      with a P-App-Id header?

      This framework does not change protocol behavior; it is only
      aimed at influencing how a User Agent renders and identifies the
      session content to the user.

      On receiving a new Network Application Session INVITE request,
      the User Agent should check other pre-established sessions for a
      matching P-App-Id header. If a match exists then the new session
      is added to the existing set of sessions for this distributed
      application. In this case, the User Agent SHOULD render all
      component content as though it is contained within a single
      Network Application Session.

      The User Agent MUST authenticate and/or authorize Application
      Servers before establishing the Network Application Session.
      Likewise the User Agent MUST authorize any co-operation between
      Network Application Sessions.

      If the INVITE request contains a Join header, then the User
      Agent MUST authorize the association between the Application
      Server and the dialog using local policy, cryptographic means or
      user consultation.
      If the association is rejected then the User Agent MUST send a
      (TBD) error response to the INVITE request.

   2. How does a User Agent react when receiving a P-App-Id header in
      an INVITE response (for a generated INVITE request)?

      If the response establishes an early dialog (i.e., a provisional
      response with a To tag present) or the response is a final
      response, then the User Agent can use the P-App-Id header
      information for application identification; otherwise the header
      can be ignored.

      For simplicity, Application Servers MUST NOT merge (through
      co-operation) user-initiated SIP dialogs into pre-existing
      application instances.

      The INVITE response MAY however include a Join header to
      associate the dialog with another pre-existing SIP dialog. The
      same authorization steps are taken as for a Join header in a
      request. If the association is rejected the User Agent MUST
      terminate the dialog by sending a BYE request with a (TBD)
      Reason code.

   3. How does a User Agent react when receiving a re-INVITE request
      or response for a Network Application Session?

      The Application Server MAY change the Join header in a re-INVITE
      request or response. The User Agent MUST update the active
      association. The handling is almost identical to the acceptance
      and authorization of the initial request or response.

   4. How does a User Agent determine which Network Application
      Sessions belong to the same (distributed) application set?

      The sessions use the same global-id-attribute value in the
      P-App-Id header.

   5. How does a User Agent determine how Network Application Sessions
      should be displayed on the user interface?

      The exact nature of the dialog association and co-operation will
      depend on the User Agent's user interface design and
      limitations. A few examples:

      Multi-line SIP phones often assign user interface components
      (i.e., conversation spaces) to separate virtual "lines" or
      calls.
      One possible approach (where screen space is perhaps limited to
      displaying only one item) is that when each line or call is
      selected the user interface displays the primary session content
      and a selectable list of associated application instances
      presenting visual output.

      A desktop PC client may assign user interface components to an
      entire virtual desktop or a "parent" session window. In this
      case, the client could activate the desktop or window when it is
      in focus. When activated, the primary session content is
      displayed, possibly along with all associated application
      instances presenting visual output.

      A simple POTS handset (connected to a SIP-PSTN gateway) is
      limited to a single conversation space [assuming no clever call
      parking mechanisms on the SIP-PSTN gateway]. In this case, in
      order for the User Agent (i.e., the SIP-PSTN gateway) to accept
      a new incoming Network Application Session or end-to-end dialog,
      the dialog must be co-operating or associated with the existing
      conversation space.

   6. How does a User Agent react to the termination of a SIP dialog
      that is associated with a Network Application Session?

      The associated Network Application Session dialog is not
      terminated; the association is simply internally removed.

7.2. Application Server Behavior

8. Network Application Session Content Examples

   A Network Application Session can contain the information to
   establish one or more user interface components.

   Although not necessary, it is RECOMMENDED that user interface
   components be established through Session Description Protocol [6]
   profiles. This approach leverages the well-defined SIP offer-answer
   SDP negotiation model [7]. This also allows a single Network
   Application Session to describe multiple user interface components.
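
   As a rough illustration of one session describing several
   components, the following Python sketch assembles an SDP body with
   one media description per component, using only the RTP/AVP profile
   that already exists. The function name, the address 192.0.2.10, the
   ports and the session name are assumptions made for the example;
   payload types 0 and 31 are the static PCMU and H.261 assignments of
   the RTP/AVP profile.

```python
from typing import Iterable, List, Tuple


def build_sdp_offer(address: str,
                    components: Iterable[Tuple[str, int, List[int]]]) -> str:
    """Build a minimal SDP body with one m= line per component.

    Each component is (media type, port, RTP/AVP payload types).
    Hypothetical helper; only mandatory session-level fields are emitted.
    """
    lines = [
        "v=0",                               # protocol version
        f"o=- 0 0 IN IP4 {address}",         # origin; "-" means no user id
        "s=Network Application Session",     # session name (illustrative)
        f"c=IN IP4 {address}",               # connection data
        "t=0 0",                             # unbounded session time
    ]
    for media, port, payload_types in components:
        formats = " ".join(str(pt) for pt in payload_types)
        lines.append(f"m={media} {port} RTP/AVP {formats}")
    return "\r\n".join(lines) + "\r\n"


# Usage: an audio and a video component described in a single offer.
offer = build_sdp_offer("192.0.2.10",
                        [("audio", 49170, [0]), ("video", 51372, [31])])
print(offer)
```

   A presentation- or input-based component would need a profile that
   does not exist yet, so the sketch stays with media-based components.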

   The Network Application Interaction Requirements draft [3] divides
   user interface components into three categories of interaction:

   o  Media-based interaction (such as audio and video streams): The
      use of SDP for establishing this type of user interface
      component is well established. For instance, the establishment
      of audio and video media sessions is defined by the RTP/AVP SDP
      profile [8].

   o  Presentation-based interaction (such as HTML or VoiceXML):
      Although no SDP profiles exist for the establishment of markup
      content sessions, it would not be difficult to write an SDP
      profile for HTTP [9] to exchange HTTP references. Session update
      allows one endpoint to "push" a new HTTP reference to the remote
      endpoint, and session closure would terminate associated user
      interface resources.

   o  Input-based interaction (such as key-based stimulus collection):
      Once again no SDP profiles are currently available, but [10]
      proposes DML (Digit Markup Language) to handle the collection of
      DTMF digits; DML content could also be negotiated using the HTTP
      SDP profile suggested above.

9. Usage Examples

9.1. Pre-Paid Calling Card (B2BUA)

     Caller                   AS                 PSTN-Gateway
   +--------+             +---------+             +--------+
   |  UAC   |<=== SIP ===>| UAS/UAC |<--- SIP --->|  UAS   |
   +--------+  [Network   +---------+             +--------+
       ||     Application |   App   |                  |
       ||      Session]   | Control |                  |
       ||                 +---------+                  |
       | ---- RTP/HTTP ---|Media Svr|                  |
       |                  +---------+                  |
       |                                               |
       +-------------------- RTP ----------------------+

             Figure 1. B2BUA Application Server Example

   In this example the caller sends a request to the Application
   Server to initiate a SIP->PSTN call. As the Application Server is a
   B2BUA, the AS does not need to use the Join header: all user
   interface components can be established in the Network Application
   Session(s) for this application instance.
   The Application Server can create user interface components to
   collect DTMF digits, play announcements and open status/command
   windows.

9.2. Pre-Paid Calling Card (Proxy)

     Caller                   AS                    Callee
   +--------+             +---------+             +--------+
   |  UAC   |<--- SIP --->|  Proxy  |<--- SIP --->|  UAS   |
   +--------+<=           +---------+             +--------+
       ||      "= SIP ===>|   App   |                  |
       ||   [Network App  | Control |                  |
       ||     Sessions]   |         |                  |
       ||                 +---------+                  |
       |+-- RTP/HTTP -----|Media Svr|                  |
       |                  +---------+                  |
       |                                               |
       +-------------------- SRTP ---------------------+

             Figure 2. Proxy Application Server Example

   In this example the Caller initiates an INVITE request to the
   Callee where the Application Server is only a proxy on the
   end-to-end call. This could be initiated directly by the user or
   "launched" by the Application Server through a separate REFER
   request [11].

   The advantage of this approach is that the Caller can negotiate
   content directly with the Callee whilst still allowing the services
   provided by the Application Server. This is especially useful when
   the Caller and Callee have an end-to-end security/trust
   relationship but the Application Server only has a security/trust
   relationship with the Caller.

   Assume in the above example that the end-to-end SRTP session is
   established using encrypted S/MIME SDP bodies which are unreadable
   by the Application Server.

   Once the end-to-end SIP dialog (or early dialog) is established,
   the Application Server will send a (re-)INVITE request to
   establish/renegotiate the Network Application Session. The INVITE
   request will need to include a Join header to create the
   association with the end-to-end dialog.

   In this example, the trust relationship between the Application
   Server and the Caller permits the Caller's User Agent to authorize
   the association.

   The combination of the end-to-end dialog and the Network
   Application Session dialog now functions in a similar manner to the
   previous example.

9.3. Web-based Application Launcher

     Caller                   AS1                   Callee
   +--------+             +---------+             +--------+
   |  UAC   |<--- SIP --->|  Proxy  |<--- SIP --->|  UAS   |
   +--------+<=           +---------+             +--------+
      ||| ^    "= SIP ===>|   App   |                  |
      ||| "[Net App Sess.]| Control |                  |
      ||| "               |         |                  |
      ||+-"---- HTTP -----|         |                  |
      ||  "               +---------+                  |
      ||  "                   AS2                      |
      ||  "               +---------+                  |
      ||  "==============>|   App   |                  |
      ||   [Net App Sess.]| Control |                  |
      ||                  +---------+                  |
      |+------ RTP -------|Media Svr|                  |
      |                   +---------+                  |
      |                                                |
      +--------------------- SRTP ---------------------+

             Figure 3. Multiple Application Servers Example

   In this example the first Application Server (AS1) is configured as
   the Caller's default outbound proxy, thus the Application Server
   will receive the INVITE request when the Caller attempts to
   communicate with the Callee. Acting simply as a SIP proxy, the
   Application Server forwards the request (after record-routing).

   After forwarding the initial INVITE request, the Application Server
   also attempts to create a Network Application Session with the
   Caller's User Agent to present the user with an HTML "launchpad" of
   call-related applications, for instance, a voice recorder, a call
   timer, a text-to-speech service, etc. This Network Application
   Session INVITE request will have a Join header to associate the
   launchpad display with the corresponding SIP conversation space.

   In this example an independent server hosts each application. When
   the user clicks on an application in the launchpad, the first
   Application Server instructs one of the other servers

   - which application to launch,
   - how to contact the User Agent and
   - which dialog to associate with.

   Each application creates its own Network Application Session by
   sending a SIP INVITE with a P-App-Id and Join header.

10. Security Considerations

   The chief security consideration created by this proposal arises
   from allowing a Network Application Session to associate itself
   with another SIP dialog (or vice versa). However, these issues
   relate to the larger issue of authenticating distributed call
   control operations (in this case the Join operation).

   In the absence of any local policy, the user MUST be consulted
   before performing these operations. The authenticated identity of
   the requestor will be required to make an intelligent decision.

   User Agents MUST perform some form of authentication and
   authorization of the Application Server creating the Network
   Application Session and ensure that the transport of any User
   Interface Component content arising from the association is also
   sufficiently secured (as per local policy).

   Application Servers MUST perform some form of authentication of the
   User Agent.

11. Author's Address

   Robert Fairlie-Cuninghame
   Nuera Communications, Inc.
   50 Victoria Rd
   Farnborough, Hants GU14-7PG
   United Kingdom
   Phone: +44-1252-548200
   Email: rfairlie@nuera.com

12. References

   1  S. Bradner, "The Internet Standards Process -- Revision 3",
      BCP 9, RFC2026, October 1996.

   2  M. Handley et al., "SIP: Session Initiation Protocol", RFC2543,
      March 1999.

   3  B. Culpepper, R. Fairlie-Cuninghame, "Network Application
      Interaction Requirements", Internet-Draft, March 2002, Work in
      progress.

   4  R. Mahy, "A Call Control Model for SIP", Internet-Draft,
      November 2002, Work in progress.

   5  R. Mahy, D. Petrie, "The SIP Join and Fork Headers",
      Internet-Draft, November 2002, Work in progress.

   6  M. Handley, V. Jacobson, "Session Description Protocol",
      RFC2327, April 1998.

   7  J. Rosenberg, H. Schulzrinne, "An Offer/Answer Model with SDP",
      Internet-Draft, February 2002, Work in progress.

   8  H. Schulzrinne, "RTP Profile for Audio and Video Conferences
      with Minimal Control", RFC1890, January 1996.

   9  R. Fielding et al., "Hypertext Transfer Protocol -- HTTP/1.1",
      RFC2068, January 1997.

   10 J. Rosenberg, "A Framework for Stimulus Signaling in SIP Using
      Markup", Internet-Draft, April 2002, Work in progress.

   11 R. Sparks, "The Refer Method", Internet-Draft, May 2002, Work in
      progress.