idnits 2.17.1 

draft-ietf-sipping-app-interaction-framework-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 657: '...   application MAY push presentation f...'
     RFC 2119 keyword, line 666: '... the application MAY push presentation...'
     RFC 2119 keyword, line 679: '...  An application MUST NOT attempt to p...'
     RFC 2119 keyword, line 682: '...t an application MUST NOT push a user ...'
     RFC 2119 keyword, line 685: '...   MUST NOT push a user interface comp...'
     (39 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 16, 2004) is 7374 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 3265 (ref. '2') (Obsoleted by RFC 6665)

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  == Outdated reference: A later version (-08) exists of
     draft-ietf-sipping-kpml-02

  == Outdated reference: A later version (-06) exists of
     draft-ietf-sip-identity-01

  == Outdated reference: A later version (-03) exists of
     draft-ietf-sip-authid-body-02

  == Outdated reference: A later version (-15) exists of
     draft-ietf-sip-gruu-00

  == Outdated reference: A later version (-05) exists of
     draft-ietf-sipping-conferencing-framework-01

  -- Obsolete informational reference (is this intentional?): RFC 2833 (ref.
     '15') (Obsoleted by RFC 4733, RFC 4734)


     Summary: 4 errors (**), 0 flaws (~~), 7 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	SIPPING                                                     J. Rosenberg
2	Internet-Draft                                               dynamicsoft
3	Expires: August 16, 2004                               February 16, 2004

5	   A Framework for Application Interaction in the Session Initiation
6	                             Protocol (SIP)
7	            draft-ietf-sipping-app-interaction-framework-01

9	Status of this Memo

11	   This document is an Internet-Draft and is in full conformance with
12	   all provisions of Section 10 of RFC2026.

14	   Internet-Drafts are working documents of the Internet Engineering
15	   Task Force (IETF), its areas, and its working groups. Note that other
16	   groups may also distribute working documents as Internet-Drafts.

18	   Internet-Drafts are draft documents valid for a maximum of six months
19	   and may be updated, replaced, or obsoleted by other documents at any
20	   time. It is inappropriate to use Internet-Drafts as reference
21	   material or to cite them other than as "work in progress."

23	   The list of current Internet-Drafts can be accessed at
24	   http://www.ietf.org/ietf/1id-abstracts.txt.

26	   The list of Internet-Draft Shadow Directories can be accessed at
27	   http://www.ietf.org/shadow.html.

29	   This Internet-Draft will expire on August 16, 2004.

31	Copyright Notice

33	   Copyright (C) The Internet Society (2004). All Rights Reserved.

35	Abstract

37	   This document describes a framework for the interaction between users
38	   and Session Initiation Protocol (SIP) based applications. By
39	   interacting with applications, users can guide the way in which they
40	   operate. The focus of this framework is stimulus signaling, which
41	   allows a user agent to interact with an application without knowledge
42	   of the semantics of that application. Stimulus signaling can occur to
43	   a user interface running locally with the client, or to a remote user
44	   interface, through media streams. Stimulus signaling encompasses a
45	   wide range of mechanisms, ranging from clicking on hyperlinks, to
46	   pressing buttons, to traditional Dual Tone Multi Frequency (DTMF)
47	   input. In all cases, stimulus signaling is supported through the use
48	   of markup languages, which play a key role in this framework.

50	Table of Contents

52	   1.    Introduction
53	   2.    Definitions
54	   3.    A Model for Application Interaction
55	   3.1   Functional vs. Stimulus
56	   3.2   Real-Time vs. Non-Real Time
57	   3.3   Client-Local vs. Client-Remote
58	   3.4   Presentation Capable vs. Presentation Free
59	   3.5   Interaction Scenarios on Telephones
60	   3.5.1 Client Remote
61	   3.5.2 Client Local
62	   3.5.3 Flip-Flop
63	   4.    Framework Overview
64	   5.    Application Behavior
65	   5.1   Client Local Interfaces
66	   5.1.1 Discovering Capabilities
67	   5.1.2 Pushing an Initial Interface Component
68	   5.1.3 Updating an Interface Component
69	   5.1.4 Terminating an Interface Component
70	   5.2   Client Remote Interfaces
71	   5.2.1 Originating and Terminating Applications
72	   5.2.2 Intermediary Applications
73	   6.    User Agent Behavior
74	   6.1   Advertising Capabilities
75	   6.2   Receiving User Interface Components
76	   6.3   Mapping User Input to User Interface Components
77	   6.4   Receiving Updates to User Interface Components
78	   6.5   Terminating a User Interface Component
79	   7.    Inter-Application Feature Interaction
80	   7.1   Client Local UI
81	   7.2   Client-Remote UI
82	   8.    Intra Application Feature Interaction
83	   9.    Example Call Flow
84	   10.   Security Considerations
85	   11.   Contributors
86	         Normative References
87	         Informative References
88	         Author's Address
89	         Intellectual Property and Copyright Statements

91	1. Introduction

93	   The Session Initiation Protocol (SIP) [1] provides the ability for
94	   users to initiate, manage, and terminate communications sessions.
95	   Frequently, these sessions will involve a SIP application. A SIP
96	   application is defined as a program running on a SIP-based element
97	   (such as a proxy or user agent) that provides some value-added
98	   function to a user or system administrator. Examples of SIP
99	   applications include pre-paid calling card calls, conferencing, and
100	   presence-based [10] call routing.

102	   In order for most applications to properly function, they need input
103	   from the user to guide their operation. As an example, a pre-paid
104	   calling card application requires the user to input their calling
105	   card number, their PIN code, and the destination number they wish to
106	   reach. The process by which a user provides input to an application
107	   is called "application interaction".

109	   Application interaction can be either functional or stimulus.
110	   Functional interaction requires the user agent to understand the
111	   semantics of the application, whereas stimulus interaction does not.
112	   Stimulus signaling allows for applications to be built without
113	   requiring modifications to the client. Stimulus interaction is the
114	   subject of this framework. The framework provides a model for how
115	   users interact with applications through user interfaces, and how
116	   user interfaces and applications can be distributed throughout a
117	   network. This model is then used to describe how applications can
118	   instantiate and manage user interfaces.

120	2. Definitions

122	   SIP Application: A SIP application is defined as a program running on
123	      a SIP-based element (such as a proxy or user agent) that provides
124	      some value-added function to a user or system administrator.
125	      Examples of SIP applications include pre-paid calling card calls,
126	      conferencing, and presence-based [10] call routing.

128	   Application Interaction: The process by which a user provides input
129	      to an application.

131	   Real-Time Application Interaction: Application interaction that takes
132	      place while an application instance is executing. For example,
133	      when a user enters their PIN number into a pre-paid calling card
134	      application, this is real-time application interaction.

136	   Non-Real Time Application Interaction: Application interaction that
137	      takes place asynchronously with the execution of the application.
138	      Generally, non-real time application interaction is accomplished
139	      through provisioning.

141	   Functional Application Interaction: Application interaction is
142	      functional when the user device has an understanding of the
143	      semantics of the interaction with the application.

145	   Stimulus Application Interaction: Application interaction is
146	      considered to be stimulus when the user device has no
147	      understanding of the semantics of the interaction with the
148	      application.

150	   User Interface (UI): The user interface provides the user with
151	      context in order to make decisions about what they want. The user
152	      enters information into the user interface. The user interface
153	      interprets the information, and passes it to the application.

155	   User Interface Component: A piece of user interface which operates
156	      independently of other pieces of the user interface. For example,
157	      a user might have two separate web interfaces to a pre-paid
158	      calling card application - one for hanging up and making another
159	      call, and another for entering the username and PIN.

161	   User Device: The software or hardware system that the user directly
162	      interacts with in order to communicate with the application. An
163	      example of a user device is a telephone. Another example is a PC
164	      with a web browser.

166	   User Input: The "raw" information passed from a user to a user
167	      interface. Examples of user input include a spoken word or a click
168	      on a hyperlink.

170	   Client-Local User Interface: A user interface which is co-resident
171	      with the user device.

173	   Client Remote User Interface: A user interface which executes
174	      remotely from the user device. In this case, a standardized
175	      interface is needed between the user device and the user
176	      interface. Typically, this is done through media sessions - audio,
177	      video, or application sharing.

179	   Media Interaction: A means of separating a user and a user interface
180	      by connecting them with media streams.

182	   Interactive Voice Response (IVR): An IVR is a type of user interface
183	      that allows users to speak commands to the application, and hear
184	      responses to those commands prompting for more information.

186	   Prompt-and-Collect: The basic primitive of an IVR user interface. The
187	      user is presented with a voice option, and the user speaks their
188	      choice.

190	   Barge-In: In an IVR user interface, a user is prompted to enter some
191	      information. With some prompts, the user may enter the requested
192	      information before the prompt completes. In that case, the prompt
193	      ceases. The act of entering the information before completion of
194	      the prompt is referred to as barge-in.

196	   Focus: A user interface component has focus when user input is
197	      fed to it, as opposed to any other user interface
198	      components. This is not to be confused with the term focus within
199	      the SIP conferencing framework, which refers to the center user
200	      agent in a conference [12].

202	   Focus Determination: The process by which the user device determines
203	      which user interface component will receive the user input.

205	   Focusless User Interface: A user interface which has no ability to
206	      perform focus determination. An example of a focusless user
207	      interface is a keypad on a telephone.

209	   Presentation Capable UI: A user interface which can prompt the user
210	      for input, collect results, and then prompt the user with new
211	      information based on those results.

213	   Presentation Free UI: A user interface which cannot prompt the user
214	      with information.

216	   Feature Interaction: A class of problems which result when multiple
217	      applications or application components are trying to provide
218	      services to a user at the same time.

220	   Inter-Application Feature Interaction: Feature interactions that
221	      occur between applications.

223	   DTMF: Dual-Tone Multi-Frequency. DTMF refers to a class of tones
224	      generated by circuit switched telephony devices when the user
225	      presses a key on the keypad. As a result, DTMF and keypad input
226	      are often used synonymously, when in fact one of them (DTMF) is
227	      merely a means of conveying the other (the keypad input) to a
228	      client-remote user interface (the switch, for example).

230	   Application Instance: A single execution path of a SIP application.

232	   Originating Application: A SIP application which acts as a UAC,
233	      calling the user.

235	   Terminating Application: A SIP application which acts as a UAS,
236	      answering a call generated by a user. IVR applications are
237	      terminating applications.

239	   Intermediary Application: A SIP application which is neither the
240	      caller nor callee, but rather, a third party involved in a call.

242	3. A Model for Application Interaction

244	         +---+            +---+            +---+             +---+
245	         |   |            |   |            |   |             |   |
246	         |   |            | U |            | U |             | A |
247	         |   |   Input    | s |   Input    | s |   Results   | p |
248	         |   | ---------> | e | ---------> | e | ----------> | p |
249	         | U |            | r |            | r |             | l |
250	         | s |            |   |            |   |             | i |
251	         | e |            | D |            | I |             | c |
252	         | r |   Output   | e |   Output   | f |   Update    | a |
253	         |   | <--------- | v | <--------- | a | <.......... | t |
254	         |   |            | i |            | c |             | i |
255	         |   |            | c |            | e |             | o |
256	         |   |            | e |            |   |             | n |
257	         |   |            |   |            |   |             |   |
258	         +---+            +---+            +---+             +---+

260	               Figure 1: Model for Real-Time Interactions

262	   Figure 1 presents a general model for how users interact with
263	   applications. Generally, users interact with a user interface through
264	   a user device. A user device can be a telephone, or it can be a PC
265	   with a web browser. Its role is to pass the user input from the user,
266	   to the user interface. The user interface provides the user with
267	   context in order to make decisions about what they want. The user
268	   enters information into the user interface. The user interface
269	   interprets the information, and passes it to the application. The
270	   application may be able to modify the user interface based on this
271	   information. Whether or not this is possible depends on the type of
272	   user interface.

274	   User interfaces are fundamentally about rendering and interpretation.
275	   Rendering refers to the way in which the user is provided context.
276	   This can be through hyperlinks, images, sounds, videos, text, and so
277	   on. Interpretation refers to the way in which the user interface
278	   takes the "raw" data provided by the user, and returns the result to
279	   the application in a meaningful format, abstracted from the
280	   particulars of the user interface. As an example, consider a pre-paid
281	   calling card application. The user interface worries about details
282	   such as what prompt the user is provided, whether the voice is male
283	   or female, and so on. It is concerned with recognizing the speech
284	   that the user provides, in order to obtain the desired information.
285	   In this case, the desired information is the calling card number, the
286	   PIN code, and the destination number. The application needs that
287	   data, and it doesn't matter to the application whether it was
288	   collected using a male prompt or a female one.
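
   The split between rendering, interpretation, and the application
   can be sketched as follows. This is an illustration only, not part
   of the framework: the names CallingCardResult, UserInterface, and
   application are hypothetical, and the read_input callable stands in
   for whatever speech or keypad recognition the user interface uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CallingCardResult:
    """Abstracted result handed to the application (hypothetical type)."""
    card_number: str
    pin: str
    destination: str


class UserInterface:
    """Renders prompts and interprets raw input; knows nothing about billing."""

    def __init__(self, prompt_voice: str, read_input: Callable[[str], str]):
        self.prompt_voice = prompt_voice  # rendering detail, hidden from the app
        self.read_input = read_input      # stands in for speech/DTMF recognition

    def collect(self) -> CallingCardResult:
        fields = {}
        for name in ("card_number", "pin", "destination"):
            raw = self.read_input(f"[{self.prompt_voice} voice] Enter {name}")
            # Interpretation: keep only the digits found in the raw input.
            fields[name] = "".join(ch for ch in raw if ch.isdigit())
        return CallingCardResult(**fields)


def application(result: CallingCardResult) -> str:
    """The application sees only the abstracted result, never the prompts."""
    return f"dial {result.destination} on card {result.card_number}"
```

   Two UserInterface instances built with different prompt voices but
   fed the same raw input hand the application identical results,
   which is the property the paragraph above describes.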

290	   User interfaces generally have real-time requirements towards the
291	   user. That is, when a user interacts with the user interface, the
292	   user interface needs to react quickly, and that change needs to be
293	   propagated to the user right away. However, the interface between the
294	   user interface and the application need not be that fast. Faster is
295	   better, but the user interface itself can frequently compensate for
296	   long latencies there. In the case of a pre-paid calling card
297	   application, when the user is prompted to enter their PIN, the prompt
298	   should generally stop immediately once the first digit of the PIN is
299	   entered. This is referred to as barge-in. After the user-interface
300	   collects the rest of the PIN, it can tell the user to "please wait
301	   while processing". The PIN can then be gradually transmitted to the
302	   application. In this example, the user interface has compensated for
303	   a slow UI to application interface by asking the user to wait.
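
   A minimal sketch of the barge-in behavior described above, assuming
   the user interface sees a simple stream of events. The event names
   ("tick" for continued prompt playback, a single digit for a key
   press) and the function prompt_and_collect are hypothetical.

```python
from typing import Iterable, List, Tuple


def prompt_and_collect(events: Iterable[str],
                       pin_length: int = 4) -> Tuple[List[str], str]:
    """Process a stream of UI events and return (log, pin).

    Events are either "tick" (the prompt keeps playing) or a single
    digit key press. Both event names are assumptions of this sketch.
    """
    log: List[str] = ["prompt started"]
    prompt_playing = True
    digits: List[str] = []

    for event in events:
        if not event.isdigit():
            continue  # "tick": nothing to do while the prompt plays
        if prompt_playing:
            # Barge-in: the first digit stops the prompt immediately.
            prompt_playing = False
            log.append("prompt stopped (barge-in)")
        digits.append(event)
        if len(digits) == pin_length:
            # Hide a slow UI-to-application interface from the user.
            log.append("please wait while processing")
            break

    return log, "".join(digits)
```

   The real-time reaction (stopping the prompt) happens entirely inside
   the user interface; only the completed PIN needs to cross the slower
   interface to the application.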

305	   The separation between user interface and application is absolutely
306	   fundamental to the entire framework provided in this document. Its
307	   importance cannot be overstated.

309	   With this basic model, we can begin to taxonomize the types of
310	   systems that can be built.

312	3.1 Functional vs. Stimulus

314	   The first way to taxonomize the system is to consider the interface
315	   between the UI and the application. There are two fundamentally
316	   different models for this interface. In a functional interface, the
317	   user interface has detailed knowledge about the application, and is,
318	   in fact, specific to the application. The interface between the two
319	   components is through a functional protocol, capable of representing
320	   the semantics which can be exposed through the user interface.
321	   Because the user interface has knowledge of the application, it can
322	   be optimally designed for that application. As a result, functional
323	   user interfaces are almost always the most user friendly, the
324	   fastest, and the most responsive. However, in order to allow
325	   interoperability between user devices and applications, the details
326	   of the functional protocols need to be specified in standards. This
327	   slows down innovation and limits the scope of applications that can
328	   be built.

330	   An alternative is a stimulus interface. In a stimulus interface, the
331	   user interface is generic, totally ignorant of the details of the
332	   application. Indeed, the application may pass instructions to the
333	   user interface describing how it should operate. The user interface
334	   translates user input into "stimulus" - which are data understood
335	   only by the application, and not by the user interface. Because they
336	   are generic, and because they require communications with the
337	   application in order to change the way in which they render
338	   information to the user, stimulus user interfaces are usually slower,
339	   less user friendly, and less responsive than a functional
340	   counterpart. However, they allow for substantial innovation in
341	   applications, since no standardization activity is needed to build a
342	   new application, as long as it can interact with the user within the
343	   confines of the user interface mechanism. The web is an example of a
344	   stimulus user interface to applications.
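
   The stimulus model can be illustrated with a small sketch. The
   class names GenericUI and CallingCardApp, and the opaque tokens
   such as "op=7f3a", are invented for this example. The point is only
   that the user interface renders what it is given and returns data
   it does not understand.

```python
from typing import Dict, List, Tuple

# Instructions pushed by the application:
# (label shown to the user, opaque stimulus returned on selection).
Instructions = List[Tuple[str, str]]


class GenericUI:
    """A stimulus UI: renders whatever it is told, understands none of it."""

    def __init__(self) -> None:
        self.instructions: Instructions = []

    def load(self, instructions: Instructions) -> List[str]:
        self.instructions = instructions
        # Rendering: number each label so the user can choose one.
        return [f"{i + 1}. {label}"
                for i, (label, _) in enumerate(instructions)]

    def select(self, choice: int) -> str:
        # The stimulus is opaque to the UI; only the application
        # knows what it means.
        return self.instructions[choice - 1][1]


class CallingCardApp:
    """Hypothetical application that defines its own stimulus tokens."""

    def instructions(self) -> Instructions:
        return [("Hang up and place another call", "op=7f3a"),
                ("Check remaining balance", "op=c201")]

    def handle(self, stimulus: str) -> str:
        actions: Dict[str, str] = {"op=7f3a": "new-call",
                                   "op=c201": "balance"}
        return actions[stimulus]
```

   A new application can ship different instructions and tokens
   without any change to GenericUI, which is the innovation benefit
   claimed for stimulus interfaces.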

346	   In SIP systems, functional interfaces are provided by extending the
347	   SIP protocol to provide the needed functionality. For example, the
348	   SIP caller preferences specification [13] provides a functional
349	   interface that allows a user to request applications to route the
350	   call to specific types of user agents. Functional interfaces are
351	   important, but are not the subject of this framework. The primary
352	   goal of this framework is to address the role of stimulus interfaces
353	   to SIP applications.

355	3.2 Real-Time vs. Non-Real Time

357	   Application interaction systems can also be real-time or
358	   non-real-time. Non-real-time interaction allows the user to enter
359	   information about application operation asynchronously with its
360	   invocation. Frequently, this is done through provisioning systems. As
361	   an example, a user can set up the forwarding number for a
362	   call-forward on no-answer application using a web page. Real-time
363	   interaction requires the user to interact with the application at the
364	   time of its invocation.

366	3.3 Client-Local vs. Client-Remote

368	   Another axis in the taxonomization is whether the user interface is
369	   co-resident with the user device (which we refer to as a client-local
370	   user interface), or the user interface runs in a host separated from
371	   the client (which we refer to as a client-remote user interface). In
372	   a client-remote user interface, there exists some kind of protocol
373	   between the client device and the UI that allows the client to
374	   interact with the user interface over a network.

376	   The most important way to separate the UI and the client device is
377	   through media interaction. In media interaction, the interface
378	   between the user and the user interface is through media - audio,
379	   video, messaging, and so on. This is the classic mode of operation
380	   for VoiceXML [3], where the user interface (also referred to as the
381	   voice browser) runs on a platform in the network. Users communicate
382	   with the voice browser through the telephone network (or using a SIP
383	   session). The voice browser interacts with the application using HTTP
384	   to convey the information collected from the user.

386	   In the case of a client-local user interface, the user interface runs
387	   co-located with the user device. The interface between them is
388	   through the software that interprets the user's input and passes it
389	   to the user interface. The classic example of this is the web. In the
390	   web, the user interface is a web browser, and the interface is
391	   defined by the HTML document that it's rendering. The user interacts
392	   directly with the user interface running in the browser. The results
393	   of that user interface are sent to the application (running on the
394	   web server) using HTTP.

396	   It is important to note that whether or not the user interface is
397	   local, or remote (in the case of media interaction), is not a
398	   property of the modality of the interface, but rather a property of
399	   the system. As an example, it is possible for a web-based user
400	   interface to be provided with a client-remote user interface. In such
401	   a scenario, video and application sharing media sessions can be used
402	   between the user and the user interface. The user interface, still
403	   guided by HTML, now runs "in the network", remote from the client.
404	   Similarly, a VoiceXML document can be interpreted locally by a client
405	   device, with no media streams at all. Indeed, the VoiceXML document
406	   can be rendered using text, rather than media, with no impact on the
407	   interface between the user interface and the application.

409	   It is also important to note that systems can be hybrid. In a hybrid
410	   user interface, some aspects of it (usually those associated with a
411	   particular modality) run locally, and others run remotely.

413	3.4 Presentation Capable vs. Presentation Free

415	   A user interface can be capable of presenting information to the user
416	   (a presentation capable UI), or it can be capable only of collecting
417	   user input (a presentation free UI). These are very different types
418	   of user interfaces. A presentation capable UI can provide the user
419	   with feedback after every input, providing the context for collecting
420	   the next input. As a result, presentation capable user interfaces
421	   require an update to the information provided to the user after each
422	   input. The web is a classic example of this. After every input (i.e.,
423	   a click), the browser provides the input to the application and
424	   fetches the next page to render. In a presentation free user
425	   interface, this is not the case. Since the user is not provided with
426	   feedback, these user interfaces tend to merely collect information as
427	   it is entered, and pass it to the application.

429	   Another difference is that a presentation-free user interface cannot
430	   support the concept of a focus. As a result, if multiple applications
431	   wish to gather input from the user, there is no way for the user to
432	   select which application the input is destined for. The input
433	   provided to applications through presentation-free user interfaces is
434	   more of a broadcast or notification operation, as a result.
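
   The difference in input delivery can be sketched as below. The
   UserDevice class and its methods are hypothetical. The choice to
   deliver nothing when a presentation capable device has no focused
   component is an assumption of this sketch, not a rule of the
   framework.

```python
from typing import Callable, Dict, List, Optional

Handler = Callable[[str], None]


class UserDevice:
    """Routes user input to one or more user interface components."""

    def __init__(self, presentation_capable: bool) -> None:
        self.presentation_capable = presentation_capable
        self.components: Dict[str, Handler] = {}
        self.focused: Optional[str] = None

    def add_component(self, name: str, handler: Handler) -> None:
        self.components[name] = handler

    def set_focus(self, name: str) -> None:
        if not self.presentation_capable:
            raise RuntimeError("a presentation-free UI cannot determine focus")
        self.focused = name

    def user_input(self, data: str) -> List[str]:
        """Deliver input; return the names of the components that got it."""
        if not self.presentation_capable:
            targets = list(self.components)  # broadcast to every component
        elif self.focused is None:
            targets = []                     # assumption: no focus, no delivery
        else:
            targets = [self.focused]
        for name in targets:
            self.components[name](data)
        return targets
```

   With two components installed on a presentation-free device, a
   single key press reaches both, so each application has to decide
   for itself whether the input was meant for it.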

436	3.5 Interaction Scenarios on Telephones

438	   This same model can apply to a telephone. In a traditional telephone,
439	   the user interface consists of a 12-key keypad, a speaker, and a
440	   microphone. Indeed, from here forward, the term "telephone" is used
441	   to represent any device that meets, at a minimum, the characteristics
442	   described in the previous sentence. Circuit-switched telephony
443	   applications are almost universally client-remote user interfaces. In
444	   the Public Switched Telephone Network (PSTN), there is usually a
445	   circuit interface between the user and the user interface. The user
446	   input from the keypad is conveyed using Dual-Tone Multi-Frequency
447	   (DTMF), and the microphone input as Pulse Code Modulated (PCM)
448	   encoded voice.

450	   In an IP-based system, there is more variability in how the system
451	   can be instantiated. Both client-remote and client-local user
452	   interfaces to a telephone can be provided.

454	   In this framework, a PSTN gateway can be considered a "user proxy".
455	   It is a proxy for the user because it can provide, to a user
456	   interface on an IP network, input taken from a user on a circuit
457	   switched telephone. The gateway may be able to run a client-local
458	   user interface, just as an IP telephone might.

460	3.5.1 Client Remote

462	   The most obvious instantiation is the "classic" circuit-switched
463	   telephony model. In that model, the user interface runs remotely from
464	   the client. The interface between the user and the user interface is
465	   through media, set up by SIP and carried over the Real Time Transport
466	   Protocol (RTP) [14]. The microphone input can be carried using any
467	   suitable voice encoding algorithm. The keypad input can be conveyed
468	   in one of two ways. The first is to convert the keypad input to DTMF,
469	   and then convey that DTMF using a suitable encoding algorithm for it
470	   (such as PCMU). An alternative, and generally the preferred approach,
471	   is to transmit the keypad input using RFC 2833 [15], which provides
472	   an encoding mechanism for carrying keypad input within RTP.
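
   For illustration, the four-byte payload that RFC 2833 defines for a
   named telephone event (an 8-bit event code, an end bit, a reserved
   bit, a 6-bit volume, and a 16-bit duration in timestamp units) can
   be packed as sketched below. The function name and the default
   volume are assumptions of this sketch, and the surrounding RTP
   header is not shown.

```python
import struct

# Event codes for keypad input as assigned by RFC 2833: digits 0-9 map
# to 0-9, "*" to 10, "#" to 11, and A-D to 12-15.
EVENT_CODES = {str(d): d for d in range(10)}
EVENT_CODES.update({"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})


def encode_telephone_event(key: str, duration: int,
                           end: bool = False, volume: int = 10) -> bytes:
    """Pack one 4-byte telephone-event payload (RTP header not included).

    The default volume of 10 is an arbitrary value chosen for the sketch.
    """
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # First flag bit is E (end of event), second is reserved (0),
    # and the remaining six bits carry the volume.
    flags = (0x80 if end else 0x00) | volume
    # Network byte order: event (1 byte), flags (1 byte), duration (2 bytes).
    return struct.pack("!BBH", EVENT_CODES[key], flags, duration)
```

   Because the key identity travels as an event code rather than as
   audio, the receiving user interface does not need to run tone
   detection on the media stream.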

474	   In this classic model, the user interface would run on a server in
475	   the IP network. It would perform speech recognition and DTMF
476	   recognition to derive the user intent, feed it through the user
477	   interface, and provide the result to an application.

479	3.5.2 Client Local

481	   An alternative model is for the entire user interface to reside on
482	   the telephone. The user interface can be a VoiceXML browser, running
483	   speech recognition on the microphone input, and feeding the keypad
484	   input directly into the script. As discussed above, the VoiceXML
485	   script could be rendered using text instead of voice, if the
486	   telephone had a textual display.

488	3.5.3 Flip-Flop

490	   A middle-ground approach is to flip back and forth between a
491	   client-local and client-remote user interface. Many voice
492	   applications are of the type which listen to the media stream and
493	   wait for some specific trigger that kicks off a more complex user
494	   interaction. The long pound in a pre-paid calling card application is
495	   one example. Another example is a conference recording application,
496	   where the user can press a key at some point in the call to begin
497	   recording. When the key is pressed, the user hears a whisper to
498	   inform them that recording has started.

500	   The ideal way to support such an application is to install a
501	   client-local user interface component that waits for the trigger to
502	   kick off the real interaction. Once the trigger is received, the
503	   application connects the user to a client-remote user interface that
504	   can play announcements, collect more information, and so on.

506	   The benefit of flip-flopping between a client-local and client-remote
507	   user interface is cost. The client-local user interface will
508	   eliminate the need to send media streams into the network just to
509	   wait for the user to press the pound key on the keypad.

511	   The Keypad Markup Language (KPML) was designed to support exactly
512	   this kind of need [6]. It models the keypad on a phone, and allows an
513	   application to be informed when any sequence of keys has been
514	   pressed. However, KPML has no presentation component. Since user
515	   interfaces generally require a response to user input, the
516	   presentation will need to be done using a client-remote user
517	   interface that gets instantiated as a result of the trigger.

519	   It is tempting to use a hybrid model, where a prompt-and-collect
520	   application is implemented by using a client-remote user interface
521	   that plays the prompts, and a client-local user interface, described
522	   by KPML, that collects digits. However, this only complicates the
523	   application. Firstly, the keypad input will be sent to both the media
524	   stream and the KPML user interface. This requires the application to
525	   sort out which user inputs are duplicates, a process that is very
526	   complicated. Secondly, the primary benefit of KPML is to avoid having
527	   a media stream towards a user interface. However, there is already a
528	   media stream for the prompting, so there is no real savings.

530	4. Framework Overview

532	   In this framework, we use the term "SIP application" to refer to a
533	   broad set of functionality. A SIP application is a program running on
534	   a SIP-based element (such as a proxy or user agent) that provides
535	   some value-added function to a user or system administrator. SIP
536	   applications can execute on behalf of a caller, a called party, or a
537	   multitude of users at once.

539	   Each application has a number of instances that are executing at any
540	   given time. An instance represents a single execution path for an
541	   application. Each instance has a well defined lifecycle. It is
542	   established as a result of some event. That event can be a SIP event,
543	   such as the reception of a SIP INVITE request, or it can be a non-SIP
544	   event, such as a web form post or even a timer. Application instances
545	   also have a specific end time. Some instances have a lifetime that is
546	   coupled with a SIP transaction or dialog. For example, a proxy
547	   application might begin when an INVITE arrives, and terminate when
548	   the call is answered. Other applications have a lifetime that spans
549	   multiple dialogs or transactions. For example, a conferencing
550	   application instance may exist so long as there are any dialogs
551	   connected to it. When the last dialog terminates, the application
552	   instance terminates. Other applications have a lifetime that is
553	   completely decoupled from SIP events.

555	   It is fundamental to the framework described here that multiple
556	   application instances may interact with a user during a single SIP
557	   transaction or dialog. Each instance may be for the same application,
558	   or different applications. Each of the applications may be completely
559	   independent, in that they may be owned by different providers, and
560	   may not be aware of each other's existence. Similarly, there may be
561	   application instances interacting with the caller, and instances
562	   interacting with the callee, both within the same transaction or
563	   dialog.

565	   The first step in the interaction with the user is to instantiate one
566	   or more user interface components for the application instance. A
567	   user interface component is a single piece of the user interface that
568	   is defined by a logical flow that is not synchronously coupled with
569	   any other component. In other words, each component runs more or less
570	   independently.

572	   A user interface component can be instantiated in one of the user
573	   agents in a dialog (for a client-local user interface), or within a
574	   network element (for a client-remote user interface). If a
575	   client-local user interface is to be used, the application needs to
576	   determine whether or not the user agent is capable of supporting a
577	   client-local user interface, and in what format. In this framework,
578	   all client-local user interface components are described by a markup
579	   language. A markup language describes a logical flow of presentation
580	   of information to the user, collection of information from the user,
581	   and transmission of that information to an application. Examples of
582	   markup languages include HTML, WML, VoiceXML, and the Keypad Markup
583	   Language (KPML) [6].

585	   Unlike an application instance, which has a very flexible lifetime, a
586	   user interface component has a very fixed lifetime. A user interface
587	   component is always associated with a dialog. The user interface
588	   component can be created at any point after the dialog (or early
589	   dialog) is created. However, the user interface component terminates
590	   when the dialog terminates. The user interface component can be
591	   terminated earlier by the user agent, and possibly by the
592	   application, but its lifetime never exceeds that of its associated
593	   dialog.

595	   There are two ways to create a client local interface component. For
596	   interface components that are presentation capable, the application
597	   sends a REFER [5] request to the user agent. The Refer-To header
598	   field contains an HTTP URI that points to the markup for the user
599	   interface. For interface components that are presentation free (such
600	   as those defined by KPML), the application sends a SUBSCRIBE request
601	   to the user agent. The body of the SUBSCRIBE request contains a
602	   filter, which, in this case, is the markup that defines when
603	   information is to be sent to the application in a NOTIFY.

605	   If a user interface component is to be instantiated in the network,
606	   there is no need to determine the capabilities of the device on which
607	   the user interface is instantiated. Presumably, it is on a device on
608	   which the application knows a UI can be created. However, the
609	   application does need to connect the user device to the user
610	   interface. This will require manipulation of media streams in order
611	   to establish that connection.

613	   The interface between the user interface component and the
614	   application depends on the type of user interface. For presentation
615	   capable user interfaces, such as those described by  HTML and
616	   VoiceXML, HTTP form POST operations are used. For presentation free
617	   user interfaces, a SIP NOTIFY is used. The differing needs and
618	   capabilities of these two user interfaces, as described in Section
619	   3.4, are what drive the different choices for the interactions. Since
620	   presentation capable user interfaces require an update to the
621	   presentation every time user data is entered, they are a good match
622	   for HTTP. Since presentation free user interfaces merely transmit
623	   user input to the application, a NOTIFY is more appropriate.

625	   Indeed, for presentation free user interfaces, there are two
626	   different modalities of operation. The first is called "one shot". In
627	   the one-shot role, the markup waits for a user to enter some
628	   information, and when they do, reports this event to the application.
629	   The application then does something, and the markup is no longer
630	   used. In the other modality, called "monitor", the markup stays
631	   permanently resident, and reports information back to an application
632	   until termination of the associated dialog.
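
   As an illustration only, the two modalities can be sketched in
   Python. The class name, the matches predicate, and the report
   callback (which stands in for sending a NOTIFY) are assumptions made
   for this sketch; none of them is defined by this framework.

```python
class PresentationFreeComponent:
    """Hypothetical model of a presentation free UI component."""

    def __init__(self, mode, matches, report):
        if mode not in ("one-shot", "monitor"):
            raise ValueError("mode must be 'one-shot' or 'monitor'")
        self.mode = mode
        self.matches = matches  # predicate: does this input trigger a report?
        self.report = report    # callback standing in for a NOTIFY
        self.active = True

    def on_input(self, user_input):
        """Report matching input; a one-shot component stops afterwards."""
        if not self.active or not self.matches(user_input):
            return
        self.report(user_input)
        if self.mode == "one-shot":
            self.active = False

    def on_dialog_terminated(self):
        """The component never outlives its associated dialog."""
        self.active = False
```

   A one-shot component built with a predicate that matches only the
   pound key reports the first pound press and then ignores further
   input, while a monitor component keeps reporting until
   on_dialog_terminated() is called.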

634	5. Application Behavior

636	   The behavior of an application within this framework depends on
637	   whether it seeks to use a client-local or client-remote user
638	   interface.

640	5.1 Client Local Interfaces

642	   One key component of this framework is support for client local user
643	   interfaces.

645	5.1.1 Discovering Capabilities

647	   A client local user interface can only be instantiated on a user
648	   agent if the user agent supports that type of user interface
649	   component. Support for client local user interface components is
650	   declared by both the UAC and the UAS in the Accept, Allow, Contact and
651	   Allow-Events header fields of dialog-initiating requests and
652	   responses. If the Allow header field indicates support for the SIP
653	   SUBSCRIBE method, the Allow-Events header field indicates support
654	   for the kpml package [6], and the Contact header field indicates that
655	   its URI is a GRUU [9], it means that the UA can instantiate
656	   presentation free user interface components. In this case, the
657	   application MAY push presentation free user interface components
658	   according to the rules of Section 5.1.2. The specific markup
659	   languages that can be supported are indicated in the Accept header
660	   field.

662	   If the Allow header field indicates support for the SIP REFER method,
663	   and the Contact header field contains UA capabilities [4] that
664	   indicate support for the HTTP URI scheme, it means that the UA
665	   supports presentation capable user interface components. In this
666	   case, the application MAY push presentation capable user interface
667	   components to the client according to the rules of Section 5.1.2. The
668	   specific markups that are supported are indicated in the Accept
669	   header field.
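
   The checks in this section can be sketched as follows. This is a
   minimal Python illustration, not part of the framework: the function
   names, the dictionary-of-header-fields input, and the two arguments
   describing the Contact header field (whether its URI is a GRUU, and
   which URI schemes its UA capabilities advertise) are all assumptions.
   The sketch looks up the event list under the name "Allow-Events".

```python
def parse_list(value):
    """Split a comma-separated header field value into lowercase tokens."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def discover_capabilities(headers, contact_is_gruu, contact_schemes):
    """Return which kinds of client local UI components may be pushed.

    headers: dict mapping header field name to its raw value
             (hypothetical input shape).
    contact_is_gruu: True if the Contact URI is known to be a GRUU.
    contact_schemes: URI schemes advertised through UA capabilities.
    """
    allow = parse_list(headers.get("Allow", ""))
    events = parse_list(headers.get("Allow-Events", ""))
    accept = parse_list(headers.get("Accept", ""))
    schemes = {scheme.lower() for scheme in contact_schemes}

    # SUBSCRIBE + kpml event package + GRUU => presentation free
    presentation_free = (
        "subscribe" in allow and "kpml" in events and contact_is_gruu
    )
    # REFER + http scheme in UA capabilities => presentation capable
    presentation_capable = "refer" in allow and "http" in schemes

    return {
        "presentation_free": presentation_free,
        "presentation_capable": presentation_capable,
        "markups": sorted(accept),  # markup types listed in Accept
    }
```

   An application would run such a check against the dialog-initiating
   request or response of each user agent before attempting to push a
   component to it.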

671	5.1.2 Pushing an Initial Interface Component

673	   Generally, we anticipate that interface components will need to be
674	   created at various different points in a SIP session. Clearly, they
675	   will need to be pushed during session setup, or after the session is
676	   established. A user interface component is always associated with a
677	   specific dialog, however.

679	   An application MUST NOT attempt to push a user interface component to
680	   a user agent until it has determined that the user agent has the
681	   necessary capabilities and a dialog has been created. In the case of
682	   a UAC, this means that an application MUST NOT push a user interface
683	   component for an INVITE initiated dialog until the application has
684	   seen a 200 OK followed by an ACK. For SUBSCRIBE initiated dialogs, it
685	   MUST NOT push a user interface component until the application has
686	   seen a 200 OK to the NOTIFY request. For a user interface component
687	   on a UAS, the application MUST NOT push a user interface component
688	   for an INVITE initiated dialog until it has seen a 200 OK from the
689	   UAS. For a SUBSCRIBE initiated dialog, it MUST NOT push a user
690	   interface component until it has seen a NOTIFY request from the
691	   notifier.
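
   The timing rules above amount to a small lookup table. The Python
   sketch below is one possible way to encode them; the event labels
   ("200-invite", "ack", "200-notify", "notify") and the function name
   are invented for this illustration. The application is assumed to
   record, in order, the messages it has seen for the dialog.

```python
# Messages the application must have seen, in order, before pushing.
# Keys are (role of the target UA, method that initiated the dialog).
REQUIRED_EVENTS = {
    ("uac", "invite"): ["200-invite", "ack"],   # 200 OK followed by ACK
    ("uac", "subscribe"): ["200-notify"],       # 200 OK to the NOTIFY
    ("uas", "invite"): ["200-invite"],          # 200 OK from the UAS
    ("uas", "subscribe"): ["notify"],           # NOTIFY from the notifier
}


def may_push(target_role, dialog_method, seen_events, capabilities_known):
    """Return True once a UI component may be pushed to the target UA."""
    if not capabilities_known:
        return False
    required = REQUIRED_EVENTS[(target_role, dialog_method)]
    # Check that the required events occur in seen_events in this order.
    remaining = iter(seen_events)
    return all(event in remaining for event in required)
```

   For a UAC in an INVITE initiated dialog, may_push() stays False
   after the 200 OK alone and becomes True only once the ACK has also
   been observed.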

693	   To create a presentation capable UI component on the UA, the
694	   application sends a REFER request to the UA. This REFER MUST be sent
695	   to the Globally Routable UA URI (GRUU) [9] advertised by that UA in
696	   the Contact header field of the dialog initiating request or response
697	   sent by that UA.  Note that this REFER request creates a separate
698	   dialog between the application and the UA. The Refer-To header field
699	   of the REFER request MUST contain an HTTP URI that references the
700	   markup document to be fetched.

702	      OPEN ISSUE: The refer needs to provide a context to the UA, and in
703	      particular, identify the specific dialog that this component is
704	      associated with. There is no obvious candidate for this when REFER
705	      is used. The former proposal, of using a grid, cannot work because
706	      of forking.

708	   To create a presentation free user interface component, the
709	   application sends a SUBSCRIBE request to the UA. The SUBSCRIBE MUST
710	   be sent to the GRUU advertised by the UA. This SUBSCRIBE request
711	   creates a separate dialog. The SUBSCRIBE request MUST use the KPML
712	   [6] event package. The Event header field MUST contain parameters
713	   which identify the particular dialog that the interface component is
714	   being instantiated against. The body of the SUBSCRIBE request
715	   contains the markup document that defines the conditions under which
716	   the application wishes to be notified of user input.

718	   In both cases, the REFER or SUBSCRIBE request SHOULD include a
719	   display name in the From header field which identifies the name of
720	   the application. For example, a prepaid calling card might include a
721	   From header field which looks like:

723	   From: "Prepaid Calling Card" <sip:prepaid@example.com>

725	   To authenticate themselves, it is RECOMMENDED that applications use
726	   the SIP identity mechanism [7] in the REFER or SUBSCRIBE requests
727	   they generate. This mechanism has the benefit that the signature is
728	   over an authenticated identity body [8], which includes the From
729	   header field. As such, the client can obtain cryptographic assurances
730	   about the service provider (the domain in the From header field)
731	   along with the name of the application.
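
   As a rough illustration of the REFER described in this section, the
   Python sketch below assembles only the request line and the two
   header fields discussed here. The function name and the URIs used
   with it are hypothetical. A real request also needs To, Via, Call-ID,
   CSeq and other header fields, a From tag, and the identity
   information recommended above, all of which are omitted.

```python
from urllib.parse import urlsplit


def build_refer_lines(gruu, markup_uri, app_name, app_uri):
    """Assemble a partial REFER that pushes a presentation capable UI.

    gruu: the GRUU taken from the UA's Contact header field.
    markup_uri: HTTP URI of the markup document to be fetched.
    app_name / app_uri: display name and URI for the From header field.
    """
    if urlsplit(markup_uri).scheme != "http":
        raise ValueError("Refer-To must carry an HTTP URI")
    return [
        f"REFER {gruu} SIP/2.0",               # sent to the UA's GRUU
        f'From: "{app_name}" <{app_uri}>',     # names the application
        f"Refer-To: <{markup_uri}>",           # where the markup lives
    ]
```

   Joining the returned lines with CRLF gives the start of the request;
   the remaining mandatory header fields would be added by the SIP stack
   in use.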

733	5.1.3 Updating an Interface Component

735	   Once a user interface component has been created on a client, it can
736	   be updated. The means for updating it depends on the type of UI
737	   component.

739	   Presentation capable UI components are updated using techniques
740	   already in place for those markups. In particular, user input will
741	   cause an HTTP POST operation to push the user input to the
742	   application. The result of the POST operation is a new markup that
743	   the UI is supposed to use. This allows the UI to be updated in response
744	   to user action. Some markups, such as HTML, provide the ability to
745	   force a refresh after a certain period of time, so that the UI can be
746	   updated without user input. Those mechanisms can be used here as
747	   well. However, there is no support for an asynchronous push of an
748	   updated UI component from the application to the user agent. A new
749	   REFER request to the same GRUU would create a new UI component rather
750	   than updating any components already in place.

752	   For presentation free UI, the story is different. The application MAY
753	   update the filter at any time by generating a SUBSCRIBE refresh with
754	   the new filter. The UA will immediately begin using this new filter.

756	5.1.4 Terminating an Interface Component

758	   User interface components have a well defined lifetime. They are
759	   created when the component is first pushed to the client. User
760	   interface components are always associated with the SIP dialog on
761	   which they were pushed. As such, their lifetime is bound by the
762	   lifetime of the dialog. When the dialog ends, so does the interface
763	   component.

765	   However, there are some cases where the application would like to
766	   terminate the user interface component before its natural termination
767	   point. For presentation capable user interfaces, this is not
768	   possible. For presentation free user interfaces, the application MAY
769	   terminate the component by sending a SUBSCRIBE with Expires equal to
770	   zero. This terminates the subscription, which removes the UI
771	   component.

773	   A client can remove a UI component at any time. For presentation
774	   capable UI, this is analogous to the user dismissing the web form
775	   window. There is no mechanism provided for reporting this kind of
776	   event to the application. The application MUST be prepared to time
777	   out, and never receive input from a user. For presentation free user
778	   interfaces, the UA can explicitly terminate the subscription. This
779	   will result in the generation of a NOTIFY with a Subscription-State
780	   header field equal to "terminated".

782	5.2 Client Remote Interfaces

784	   As an alternative to, or in conjunction with, client local user
785	   interfaces, an application can make use of client remote user
786	   interfaces. These user interfaces can execute co-resident with the
787	   application itself (in which case no standardized interfaces between
788	   the UI and the application need to be used), or they can run
789	   separately. This framework assumes that the user interface runs on a
790	   host that has a sufficient trust relationship with the application.
791	   As such, the means for instantiating the user interface is not
792	   considered here.

794	   The primary issue is to connect the user device to the remote user
795	   interface. Doing so requires the manipulation of media streams
796	   between the client and the user interface. Such manipulation can only
797	   be done by user agents. There are two types of user agent
798	   applications within this framework - originating/terminating
799	   applications, and intermediary applications.

801	5.2.1 Originating and Terminating Applications

803	   Originating and terminating applications are applications which are
804	   themselves the originator or the final recipient of a SIP invitation.
805	   They are "pure" user agent applications - not back-to-back user
806	   agents. The classic example of such an application is an interactive
807	   voice response (IVR) application, which is typically a terminating
808	   application. It is a terminating application because the user
809	   explicitly calls it; i.e., it is the actual called party. An example
810	   of an originating application is a wakeup call application, which
811	   calls a user at a specified time in order to wake them up.

813	   Because originating and terminating applications are a natural
814	   termination point of the dialog, manipulation of the media session by
815	   the application is trivial. Traditional SIP techniques for adding and
816	   removing media streams, modifying codecs, and changing the address of
817	   the recipient of the media streams, can be applied. Similarly, the
818	   application can directly authenticate itself to the user through
819	   S/MIME, since it is the peer UA in the dialog.

821	5.2.2 Intermediary Applications

823	   Intermediary applications are, at the same time, more common than
824	   originating/terminating applications, and more complex. Intermediary
825	   applications are applications that are neither the actual caller nor
826	   called party. Rather, they represent a "third party" that wishes to
827	   interact with the user. The classic example is the ubiquitous
828	   pre-paid calling card application.

830	   In order for the intermediary application to add a client remote user
831	   interface, it needs to manipulate the media streams of the user agent
832	   to terminate on that user interface. This also introduces a
833	   fundamental feature interaction issue. Since the intermediary
834	   application is not an actual participant in the call, how does the
835	   user interact with the intermediary application, and its actual peer
836	   in the dialog, at the same time? This is discussed in more detail in
837	   Section 7.

839	6. User Agent Behavior

841	6.1 Advertising Capabilities

843	   In order to participate in applications that make use of stimulus
844	   interfaces, a user agent needs to advertise its interaction
845	   capabilities.

847	   If a user agent supports presentation capable user interfaces, it
848	   MUST support the REFER method. It MUST include, in all dialog
849	   initiating requests and responses, an Allow header field that
850	   includes the REFER method. Furthermore, the UA MUST support the SIP
851	   user agent capabilities specification [4]. The UA MUST be capable of
852	   being REFER'd to an HTTP URI. It MUST include, in the Contact header
853	   field of its dialog initiating requests and responses, a "schemes"
854	   Contact header field parameter that includes the http URI scheme. The
855	   UA MUST include, in all dialog initiating requests and responses, an
856	   Accept header field listing all of those markups supported by the UA.
857	   It is RECOMMENDED that all user agents that support presentation
858	   capable user interfaces support HTML.

860	   If a user agent supports presentation free user interfaces, it MUST
861	   support the SUBSCRIBE [2] method. It MUST support the KPML [6] event
862	   package. It MUST include, in all dialog initiating requests and
863	   responses, an Allow header field that includes the SUBSCRIBE method.
864	   It MUST include, in all dialog initiating requests and responses, an
865	   Allow-Events header field that lists the KPML event package. The UA
866	   MUST include, in all dialog initiating requests and responses, an
867	   Accept header field listing those event filters it supports. At a
868	   minimum, a UA MUST support the "application/kpml+xml" MIME type.

870	   For either presentation free or presentation capable user interfaces,
871	   the user agent MUST support the GRUU [9] specification. The Contact
872	   header field in all dialog initiating requests and responses MUST
873	   contain a GRUU. The UA MUST include a Supported header field which
874	   contains the gruu option tag.

876	   Because these headers are examined by proxies which may be executing
877	   applications, a UA that wishes to support client local user
878	   interfaces should not encrypt them.

880	6.2 Receiving User Interface Components

882	   Once the UA has created a dialog (in either the early or confirmed
883	   states), it MUST be prepared to receive a SUBSCRIBE or REFER request
884	   against its GRUU. If the UA receives such a request prior to the
885	   establishment of a dialog, the UA MUST reject the request.

887	   A user agent SHOULD attempt to authenticate the sender of the
888	   request. The sender will generally be an application, and therefore
889	   the user agent is unlikely to ever have a shared secret with it,
890	   making digest authentication useless. However, the REFER or SUBSCRIBE
891	   request should have a SIP authenticated identity body [8] that
892	   conveys the identity of the application [7]. If such a body is not
893	   present, and no alternative means of identification (such as
894	   P-Asserted-ID [11]) is present, the user agent MAY reject the request
895	   with a 403 response.

897	   Next, the user agent authorizes the application. An application is
898	   authorized to instantiate a user interface component if the
899	   application was resident within an element on the path of the dialog
900	   initiating request. An application proves to the user agent that it
901	   was on the path by presenting it with the dialog identifiers in the
902	   SUBSCRIBE or REFER request. In the case of SUBSCRIBE, those
903	   identifiers are present in the Event header field [6]. [[EDITORS
904	   NOTE: Fill in here once we know how this is done for REFER.]]

906	   Because the dialog identifiers serve as a tool for authorization,
907	   a user agent compliant to this framework MUST use dialog identifiers
908	   that are cryptographically random, with at least 128 bits of
909	   randomness. It is recommended that this randomness be split between
910	   the Call-ID and From header field tag in the case of a UAC.
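
   One way a UAC could meet this requirement is sketched below in
   Python, using the standard "secrets" module as the random source.
   The function name and the host argument are assumptions of this
   sketch. It places 64 random bits in the Call-ID and 64 in the From
   tag, for 128 bits in total.

```python
import secrets


def new_dialog_identifiers(host):
    """Return (call_id, from_tag) carrying 128 random bits in total."""
    # token_hex(8) draws 8 random bytes (64 bits), shown as 16 hex digits.
    call_id = f"{secrets.token_hex(8)}@{host}"
    from_tag = secrets.token_hex(8)
    return call_id, from_tag
```

   Splitting the randomness evenly is a choice made for this example;
   the text above only recommends that it be split between the two
   values.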

912	   Furthermore, to ensure that only applications resident in on-path
913	   elements can instantiate a user interface component, a user agent
914	   compliant to this specification SHOULD use the sips URI scheme for
915	   all dialogs it initiates. This will guarantee secure links between
916	   all of the elements on the signaling path.

918	   If an application does not present a valid dialog identifier in its
919	   REFER or SUBSCRIBE request, the user agent MUST reject the request
920	   with a 403 response. A user agent MAY apply any other policies in
921	   addition to (but not instead of) the ones specified here in order to
922	   authorize the creation of the user interface component. One such
923	   mechanism would be to prompt the user, informing them of the identity
924	   of the application. If an authorization policy requires user
925	   interaction, the user agent SHOULD respond to the SUBSCRIBE or REFER
926	   request with a 202. In the case of SUBSCRIBE, if authorization is not
927	   granted, the user agent SHOULD generate a NOTIFY to terminate the
928	   subscription. In the case of REFER, the user agent MUST NOT act upon
929	   the URI in the Refer-To header field until user authorization has
930	   been obtained.
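
   The checks a user agent applies to an incoming SUBSCRIBE or REFER
   can be summarized in a short Python sketch. Everything named here is
   hypothetical: the function, its arguments, and the symbolic return
   values. In particular, this framework does not say which response
   code is used to reject a request that arrives before any dialog
   exists, so the sketch returns the plain string "reject" for that
   case, and "accept" when no further processing is needed.

```python
def authorize(dialogs, presented_id, identified, needs_user_prompt,
              reject_unidentified=True):
    """Return a symbolic outcome for an incoming SUBSCRIBE or REFER.

    dialogs: set of dialog identifiers the UA currently holds.
    presented_id: dialog identifier carried in the request, or None.
    identified: True if the sender's identity could be established.
    needs_user_prompt: True if local policy wants to ask the user.
    reject_unidentified: local policy switch; rejection here is optional.
    """
    if not dialogs:
        return "reject"   # request arrived before any dialog was created
    if not identified and reject_unidentified:
        return "403"      # optional rejection of unidentified senders
    if presented_id not in dialogs:
        return "403"      # mandatory: no valid dialog identifier
    if needs_user_prompt:
        return "202"      # respond now, decide once the user has answered
    return "accept"
```

   When the outcome is "202", the UA still has to finish the decision
   later: by terminating the subscription with a NOTIFY if the user
   declines a SUBSCRIBE, or by leaving the Refer-To URI untouched until
   the user approves a REFER.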

932	   If a REFER request to an HTTP URI was authorized, the UA executes the
933	   URI and fetches the content to be rendered to the user. This
934	   instantiates a presentation capable user interface component. If a
935	   SUBSCRIBE was authorized, a presentation free user interface
936	   component was instantiated.

938	6.3 Mapping User Input to User Interface Components

940	   Once the user interface components are instantiated, the user agent
941	   must direct user input to the appropriate component. In the case of
942	   presentation capable user interfaces, this process is known as focus
943	   selection. It is done by means that are specific to the user
944	   interface on the device. In the case of a PC, for example, the window
945	   manager would allow the user to select the appropriate user interface
946	   component that their input is directed to.

948	   For presentation free user interfaces, the situation is more
949	   complicated. In some cases, the device may support a mechanism that
950	   allows the user to select a "line", and thus the associated dialog.
951	   Any user input on the keypad while this line is selected is fed to
952	   the user interface components associated with that dialog.

954	      TODO: Need to consider the case where the user interface is
955	      co-resident with the UAC, but the user device is separated from
956	      the UAC, and occurs through some other protocol, and the user
957	      interface and application are semi-trusted. Classic case is when
958	      the UAC is a PSTN gateway.

960	6.4 Receiving Updates to User Interface Components

962	   For presentation capable user interfaces, updates to the user
963	   interface occur in ways specific to that user interface component. In
964	   the case of HTML, for example, the document can tell the client to
965	   fetch a new document periodically. However, this framework does not
966	   provide any additional machinery to asynchronously push a new user
967	   interface component to the client.

969	   For presentation free user interfaces, an application can push an
970	   update to a component by sending a SUBSCRIBE refresh with a new
971	   filter. The user agent will process these according to the rules of
972	   the event package.

974	6.5 Terminating a User Interface Component

976	   Termination of a presentation capable user interface component is a
977	   trivial procedure. The user agent merely dismisses the window (or
978	   equivalent). The fact that the component is dismissed is not
979	   communicated to the application. As such, it is purely a local
980	   matter.

982	   In the case of a presentation free user interface, if the user wishes
983	   to cease interacting with the application, the UA SHOULD generate a
984	   NOTIFY request with a Subscription-State equal to "terminated" and a
985	   reason of "rejected". This tells the application that the component
986	   has been removed, and that it should not attempt to re-subscribe.
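
   On the application side, the NOTIFY described above can be
   recognized by inspecting the Subscription-State header field. The
   Python sketch below is illustrative only; the function names are
   invented here, and the parsing handles just the simple
   "state;name=value" form.

```python
def parse_subscription_state(value):
    """Split a Subscription-State value into (state, parameters)."""
    parts = [part.strip() for part in value.split(";")]
    state = parts[0].lower()
    params = {}
    for part in parts[1:]:
        name, _, param_value = part.partition("=")
        params[name.strip().lower()] = param_value.strip().lower()
    return state, params


def component_removed_by_user(value):
    """True if the UA ended the component and re-subscribing is unwanted."""
    state, params = parse_subscription_state(value)
    return state == "terminated" and params.get("reason") == "rejected"
```

   An application that sees component_removed_by_user() return True for
   a received NOTIFY would drop its state for that component rather than
   send a new SUBSCRIBE.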

988	7. Inter-Application Feature Interaction

990	   The inter-application feature interaction problem is inherent to
991	   stimulus signaling. Whenever there are multiple applications, there
992	   are multiple user interfaces. When the user provides an input, to
993	   which user interface is the input destined? That question is the
994	   essence of the inter-application feature interaction problem.

996	   Inter-application feature interaction is not an easy problem to
997	   resolve. For now, we consider separately the issues for client-local
998	   and client-remote user interface components.

1000	7.1 Client Local UI

1002	   When the user interface itself resides locally on the client device,
1003	   the feature interaction problem is actually much simpler. The end
1004	   device knows explicitly about each application, and therefore can
1005	   present the user with each one separately. When the user provides
1006	   input, the client device can determine to which user interface the
1007	   input is destined. The user interface to which input is destined is
1008	   referred to as the application in focus, and the means by which the
1009	   focused application is selected is called focus determination.

1011	   Generally speaking, focus determination is purely a local operation.
1012	   In the PC universe, focus determination is provided by window
1013	   managers. Each application does not know about focus; it merely
1014	   receives the user input that has been targeted to it when it is in
1015	   focus. This basic concept applies to SIP-based applications as well.

1017	   Focus determination will frequently be trivial, depending on the user
1018	   interface type. Consider a user that makes a call from a PC. The call
1019	   passes through a pre-paid calling card application, and a call
1020	   recording application. Both of these wish to interact with the user.
1021	   Both push an HTML-based user interface to the user. On the PC, each
1022	   user interface would appear as a separate window. The user interacts
1023	   with the call recording application by selecting its window, and with
1024	   the pre-paid calling card application by selecting its window. Focus
1025	   determination is literally provided by the PC window manager. It is
1026	   clear to which application the user input is targeted.

1028	   As another example, consider the same two applications, but on a
1029	   "smart phone" that has a set of buttons, and next to each button, an
1030	   LCD display that can provide the user with an option. This user
1031	   interface can be represented using the Wireless Markup Language
1032	   (WML).

1034	   The phone would allocate some number of buttons to each application.
1035	   The prepaid calling card would get one button for its "hangup"
1036	   command, and the recording application would get one for its
1037	   "start/stop" command. The user can easily determine which application
1038	   to interact with by pressing the appropriate button. Pressing a button
1039	   determines focus and provides user input, both at the same time.

1041	   Unfortunately, not all devices will have these advanced displays. A
1042	   PSTN gateway, or a basic IP telephone, may only have a 12-key keypad.
1043	   The user interfaces for these devices are provided through the Keypad
1044	   Markup Language (KPML). Considering once again the feature
1045	   interaction case above, the pre-paid calling card application and the
1046	   call recording application would both pass a KPML document to the
1047	   device. When the user presses a button on the keypad, to which
1048	   document does the input apply? The user interface does not allow the
1049	   user to select. A user interface where the user cannot provide focus
1050	   is called a focusless user interface. This is quite a hard problem to
1051	   solve. This framework does not make any explicit normative
1052	   recommendation, but concludes that the best option is to send the
1053	   input to both user interfaces unless the markup in one interface has
1054	   indicated that it should be suppressed from others. This is a
1055	   sensible choice by analogy - it is exactly what the existing circuit
1056	   switched telephone network will do. It is an explicit non-goal to
1057	   provide a better mechanism for feature interaction resolution than
1058	   the PSTN on devices which have the same user interface as they do on
1059	   the PSTN. Devices with better displays, such as PCs or screen phones,
1060	   can benefit from the capabilities of this framework, allowing the
1061	   user to determine which application they are interacting with.

1063	   Indeed, when a user provides input on a focusless device, the input
1064	   must be passed to all client local user interfaces, and all client
1065	   remote user interfaces, unless the markup tells the UI to suppress
1066	   the media. In the case of KPML, key events are passed to remote user
1067	   interfaces by encoding them in RFC 2833 [15]. Of course, since a
1068	   client cannot determine if a media stream terminates in a remote user
1069	   interface or not, these key events are passed in all audio media
1070	   streams unless the "Q" digit is used to suppress.
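
   The delivery rule for focusless devices might be sketched as
   follows. The class, the "suppress_others" flag, and the handling of
   several suppressing interfaces at once are assumptions made for
   illustration; the framework itself makes no normative
   recommendation here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserInterface:
    """Hypothetical stand-in for a client local or client remote UI."""
    name: str
    suppress_others: bool = False  # assumed to be set by the UI's markup
    received: List[str] = field(default_factory=list)

    def deliver(self, key: str) -> None:
        self.received.append(key)


def dispatch_focusless(key: str, interfaces: List[UserInterface]) -> List[str]:
    """Deliver one key press and return the names of the UIs that got it.

    With no suppression, every interface receives the key. If any
    interface asked for suppression, only those interfaces receive it
    (an assumption; the text does not cover competing suppressors).
    """
    suppressors = [ui for ui in interfaces if ui.suppress_others]
    targets = suppressors if suppressors else interfaces
    for ui in targets:
        ui.deliver(key)
    return [ui.name for ui in targets]
```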

1072	7.2 Client-Remote UI

1074	   When the user interfaces run remotely, the determination of focus can
1075	   be much harder. There are many architectures that can be
1076	   deployed to handle the interaction. None are ideal. However, all are
1077	   beyond the scope of this specification.

1079	8. Intra Application Feature Interaction

1081	   An application can instantiate a multiplicity of user interface
1082	   components. For example, a single application can instantiate two
1083	   separate HTML components and one WML component. Furthermore, an
1084	   application can instantiate both client local and client remote user
1085	   interfaces.

1087	   The feature interaction issues between these components within the
1088	   same application are less severe. If an application has multiple
1089	   client user interface components, their interaction is resolved
1090	   identically to the inter-application case - through focus
1091	   determination. However, the problems in focusless user interfaces
1092	   (such as a keypad) generally won't exist, since the application can
1093	   generate user interfaces which do not overlap in their usage of an
1094	   input.

1096	   The real issue is that the optimal user experience frequently
1097	   requires some kind of coupling between the differing user interface
1098	   components. This is a classic problem in multi-modal user interfaces,
1099	   such as those described by Speech Application Language Tags (SALT).
1100	   As an example, consider a user interface where a user can either
1101	   press a labeled button to make a selection, or listen to a prompt,
1102	   and speak the desired selection. Ideally, when the user presses the
1103	   button, the prompt should cease immediately, since both of them were
1104	   targeted at collecting the same information in parallel. Such
1105	   interactions are best handled by markups which natively support such
1106	   interactions, such as SALT, and thus require no explicit support from
1107	   this framework.

1109	9. Example Call Flow

1111	   This section shows the operation of a call recording application.
1112	   This application allows a user to record the media in their call by
1113	   clicking on a button in a web form. The application uses a
1114	   presentation capable user interface component that is pushed to the
1115	   caller.

1117	             A                  Recording App                  B
1118	             |(1) INVITE              |                        |
1119	             |----------------------->|                        |
1120	             |                        |(2) INVITE              |
1121	             |                        |----------------------->|
1122	             |                        |(3) 200 OK              |
1123	             |                        |<-----------------------|
1124	             |(4) 200 OK              |                        |
1125	             |<-----------------------|                        |
1126	             |(5) ACK                 |                        |
1127	             |----------------------->|                        |
1128	             |                        |(6) ACK                 |
1129	             |                        |----------------------->|
1130	             |(7) REFER               |                        |
1131	             |<-----------------------|                        |
1132	             |(8) 200 OK              |                        |
1133	             |----------------------->|                        |
1134	             |(9) NOTIFY              |                        |
1135	             |----------------------->|                        |
1136	             |(10) 200 OK             |                        |
1137	             |<-----------------------|                        |
1138	             |(11) HTTP GET           |                        |
1139	             |----------------------->|                        |
1140	             |(12) 200 OK             |                        |
1141	             |<-----------------------|                        |
1142	             |(13) HTTP POST          |                        |
1143	             |----------------------->|                        |
1144	             |(14) 200 OK             |                        |
1145	             |<-----------------------|                        |

1147	                                Figure 3

1149	   First, the caller, A, sends an INVITE to set up a call (message 1).
1150	   Since the caller supports the framework, and can handle presentation
1151	   capable user interface components, it includes the Supported header
1152	   field indicating the GRUU is understood, Allow indicating that REFER
1153	   is understood, and a Contact header field that includes the "schemes"
1154	   header field parameter.

1156	   INVITE sip:B@example.com SIP/2.0
1157	   Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
1158	   From: Caller <sip:A@example.com>;tag=kkaz-
1159	   To: Callee <sip:B@example.com>
1160	   Call-ID: faif9a@host.example.com
1161	   CSeq: 1 INVITE
1162	   Supported: gruu
1163	   Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
1164	   Contact: <sip:bad998asd8asd0000a@example.com>;schemes="http,sip"
1165	   Content-Length: ...
1166	   Content-Type: application/sdp

1168	   --SDP not shown--
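
   An application that wants to decide whether an HTML-based component
   can be pushed could inspect the "schemes" parameter of the Contact
   header field shown above. The following is a minimal sketch, not a
   complete SIP header parser: it assumes a single contact with the URI
   in angle brackets, and the function names are hypothetical.

```python
import re
from typing import List

# Matches ;schemes="..." among the header field parameters.
_SCHEMES_RE = re.compile(r';\s*schemes\s*=\s*"([^"]*)"', re.IGNORECASE)


def contact_schemes(contact_value: str) -> List[str]:
    """Return the URI schemes listed in a Contact header field value."""
    # Only look at parameters after the closing '>' so that URI
    # parameters inside the angle brackets are ignored.
    params = contact_value.rsplit(">", 1)[-1]
    match = _SCHEMES_RE.search(params)
    if match is None:
        return []
    return [s.strip().lower() for s in match.group(1).split(",") if s.strip()]


def can_push_http_ui(contact_value: str) -> bool:
    """True if the UA advertised that it can dereference http URIs."""
    return "http" in contact_schemes(contact_value)
```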

1170	   The proxy acts as a recording server, and forwards the INVITE to the
1171	   called party (message 2):

1173	   INVITE sip:B@pc.example.com SIP/2.0
1174	   Record-Route: <sip:app.example.com;lr>
1175	   Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK97sh
1176	   Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
1177	   From: Caller <sip:A@example.com>;tag=kkaz-
1178	   To: Callee <sip:B@example.com>
1179	   Call-ID: faif9a@host.example.com
1180	   CSeq: 1 INVITE
1181	   Supported: gruu
1182	   Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
1183	   Contact: <sip:bad998asd8asd0000a@example.com>;schemes="http,sip"
1184	   Content-Length: ...
1185	   Content-Type: application/sdp

1187	   --SDP not shown--

1189	   B accepts the call with a 200 OK (message 3). It does not support the
1190	   framework, and so the various header fields are not present.

1192	   SIP/2.0 200 OK
1193	   Record-Route: <sip:app.example.com;lr>
1194	   Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK97sh
1195	   Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
1196	   From: Caller <sip:A@example.com>;tag=kkaz-
1197	   To: Callee <sip:B@example.com>;tag=7777
1198	   Call-ID: faif9a@host.example.com
1199	   CSeq: 1 INVITE
1200	   Contact: <sip:B@pc.example.com>
1201	   Content-Length: ...
1203	   Content-Type: application/sdp

1205	   --SDP not shown--

1207	   This 200 OK is passed back to the caller (message 4):

1209	   SIP/2.0 200 OK
1210	   Record-Route: <sip:app.example.com;lr>
1211	   Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz8
1212	   From: Caller <sip:A@example.com>;tag=kkaz-
1213	   To: Callee <sip:B@example.com>;tag=7777
1214	   Call-ID: faif9a@host.example.com
1215	   CSeq: 1 INVITE
1216	   Contact: <sip:B@pc.example.com>
1217	   Content-Length: ...
1218	   Content-Type: application/sdp

1220	   --SDP not shown--

1222	   The caller generates an ACK (message 5).

1224	   ACK sip:B@pc.example.com SIP/2.0
1225	   Route: <sip:app.example.com;lr>
1226	   Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz9
1227	   From: Caller <sip:A@example.com>;tag=kkaz-
1228	   To: Callee <sip:B@example.com>;tag=7777
1229	   Call-ID: faif9a@host.example.com
1230	   CSeq: 1 ACK

1232	   The ACK is forwarded to the called party (message 6).

1234	   ACK sip:B@pc.example.com SIP/2.0
1235	   Via: SIP/2.0/UDP app.example.com;branch=z9hG4bKh7s
1236	   Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9zz9
1237	   From: Caller <sip:A@example.com>;tag=kkaz-
1238	   To: Callee <sip:B@example.com>;tag=7777
1239	   Call-ID: faif9a@host.example.com
1240	   CSeq: 1 ACK

1242	   Now, the application decides to push a user interface component to
1243	   user A. So, it sends it a REFER request (message 7):

1245	   REFER sip:bad998asd8asd0000a@example.com SIP/2.0
1246	   Refer-To: http://app.example.com/script.pl
1247	   Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK9zh6
1248	   From: Recorder Application <sip:app.example.com>;tag=jhgf
1249	   To: Caller <sip:A@example.com>
1250	   Call-ID: 66676776767@app.example.com
1251	   CSeq: 1 REFER
1252	   Event: refer
1253	   Contact: <sip:app.example.com>

1255	   The REFER is answered by a 200 OK (message 8).

1257	   SIP/2.0 200 OK
1259	   Via: SIP/2.0/UDP app.example.com;branch=z9hG4bK9zh6
1260	   From: Recorder Application <sip:app.example.com>;tag=jhgf
1261	   To: Caller <sip:A@example.com>;tag=pqoew
1262	   Call-ID: 66676776767@app.example.com
1263	   Supported: gruu
1264	   Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
1265	   Contact: <sip:bad998asd8asd0000a@example.com>;schemes="http,sip"
1266	   CSeq: 1 REFER

1268	   User A sends a NOTIFY (message 9):

1270	   NOTIFY sip:app.example.com SIP/2.0
1271	   Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9320394238995
1272	   To: Recorder Application <sip:app.example.com>;tag=jhgf
1273	   From: Caller <sip:A@example.com>;tag=pqoew
1274	   Call-ID: 66676776767@app.example.com
1275	   CSeq: 1 NOTIFY
1276	   Max-Forwards: 70
1277	   Event: refer;id=93809824
1278	   Subscription-State: active;expires=3600
1279	   Contact: <sip:bad998asd8asd0000a@example.com>;schemes="http,sip"
1280	   Content-Type: message/sipfrag;version=2.0
1281	   Content-Length: 20

1283	   SIP/2.0 100 Trying
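
   The Content-Length of 20 in this NOTIFY can be checked with a short
   calculation. The sketch assumes that the message/sipfrag body is the
   single status line terminated by one CRLF; the helper name is
   hypothetical.

```python
def sipfrag_content_length(status_line: str) -> int:
    """Length in bytes of a one-line sipfrag body ending in CRLF."""
    body = (status_line + "\r\n").encode("utf-8")
    return len(body)


length = sipfrag_content_length("SIP/2.0 100 Trying")
```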

1285	   And the recording server responds with a 200 OK (message 10):

1287	   SIP/2.0 200 OK
1288	   Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK9320394238995
1289	   To: Recorder Application <sip:app.example.com>;tag=jhgf
1290	   From: Caller <sip:A@example.com>;tag=pqoew
1291	   Call-ID: 66676776767@app.example.com
1292	   CSeq: 1 NOTIFY

1294	   The caller, A, authorizes the application. It then acts on the
1295	   Refer-To URI, fetching the script from app.example.com (message 11).
1296	   The response, message 12, contains a web application that the user
1297	   can click on to enable recording. When the user clicks on the link
1298	   (message 13), the results are posted to the server, and an updated
1299	   display is provided (message 14).
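
   Messages 11 and 13 are ordinary HTTP requests. The sketch below only
   constructs them and does not send anything. It assumes that the form
   posts back to the Refer-To URI and uses a form field named "action";
   neither detail is specified by this document.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_fetch_request(refer_to: str) -> Request:
    """Message 11: fetch the user interface named by the Refer-To URI."""
    return Request(refer_to, method="GET")


def build_record_request(refer_to: str, action: str) -> Request:
    """Message 13: post the user's selection (hypothetical form field)."""
    body = urlencode({"action": action}).encode("ascii")
    return Request(
        refer_to,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```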

1301	10. Security Considerations

1303	   There are many security considerations associated with this
1304	   framework. It allows applications in the network to instantiate user
1305	   interface components on a client device. Such instantiations need to
1306	   be from authenticated applications, and also need to be authorized to
1307	   place a UI into the client. Indeed, the stronger requirement is
1308	   authorization. It is not so important to know the name of the
1309	   provider of the application, but rather, that the provider is
1310	   authorized to instantiate components.

1312	   Generally, an application should be considered authorized if it was
1313	   an application that was legitimately part of the call setup path.
1314	   With this definition, authorization can be enforced using the sips
1315	   URI scheme when the call is initiated.

1317	11. Contributors

1319	   This document was produced as a result of discussions amongst the
1320	   application interaction design team. All members of this team
1321	   contributed significantly to the ideas embodied in this document. The
1322	   members of this team were:

1324	   Eric Burger
1325	   Cullen Jennings
1326	   Robert Fairlie-Cuninghame

1328	Normative References

1330	   [1]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
1331	        Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
1332	        Session Initiation Protocol", RFC 3261, June 2002.

1334	   [2]  Roach, A., "Session Initiation Protocol (SIP)-Specific Event
1335	        Notification", RFC 3265, June 2002.

1337	   [3]  McGlashan, S., Lucas, B., Porter, B., Rehor, K., Burnett, D.,
1338	        Carter, J., Ferrans, J. and A. Hunt, "Voice Extensible Markup
1339	        Language (VoiceXML) Version 2.0", W3C CR CR-voicexml20-20030220,
1340	        February 2003.

1342	   [4]  Rosenberg, J., "Indicating User Agent Capabilities in the
1343	        Session Initiation Protocol (SIP)",
1344	        draft-ietf-sip-callee-caps-03 (work in progress), January 2004.

1346	   [5]  Sparks, R., "The Session Initiation Protocol (SIP) Refer
1347	        Method", RFC 3515, April 2003.

1349	   [6]  Burger, E., "Keypad Stimulus Protocol (KPML)",
1350	        draft-ietf-sipping-kpml-02 (work in progress), February 2004.

1352	   [7]  Peterson, J., "Enhancements for Authenticated Identity
1353	        Management in the Session Initiation Protocol (SIP)",
1354	        draft-ietf-sip-identity-01 (work in progress), March 2003.

1356	   [8]  Peterson, J., "SIP Authenticated Identity Body (AIB) Format",
1357	        draft-ietf-sip-authid-body-02 (work in progress), July 2003.

1359	   [9]  Rosenberg, J., "Obtaining and Using Globally Routable User Agent
1360	        (UA) URIs (GRUU) in the Session Initiation Protocol (SIP)",
1361	        draft-ietf-sip-gruu-00 (work in progress), January 2004.

1363	Informative References

1365	   [10]  Day, M., Rosenberg, J. and H. Sugano, "A Model for Presence and
1366	         Instant Messaging", RFC 2778, February 2000.

1368	   [11]  Jennings, C., Peterson, J. and M. Watson, "Private Extensions
1369	         to the Session Initiation Protocol (SIP) for Asserted Identity
1370	         within Trusted Networks", RFC 3325, November 2002.

1372	   [12]  Rosenberg, J., "A Framework for Conferencing with the Session
1373	         Initiation Protocol",
1374	         draft-ietf-sipping-conferencing-framework-01 (work in
1375	         progress), October 2003.

1377	   [13]  Rosenberg, J., Schulzrinne, H. and P. Kyzivat, "Caller
1378	         Preferences for the Session Initiation Protocol (SIP)",
1379	         draft-ietf-sip-callerprefs-10 (work in progress), October 2003.

1381	   [14]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
1382	         "RTP: A Transport Protocol for Real-Time Applications", RFC
1383	         3550, July 2003.

1385	   [15]  Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits,
1386	         Telephony Tones and Telephony Signals", RFC 2833, May 2000.

1388	Author's Address

1390	   Jonathan Rosenberg
1391	   dynamicsoft
1392	   600 Lanidex Plaza
1393	   Parsippany, NJ  07054
1394	   US

1396	   Phone: +1 973 952-5000
1397	   EMail: jdrosen@dynamicsoft.com
1398	   URI:   http://www.jdrosen.net

1400	Intellectual Property Statement

1402	   The IETF takes no position regarding the validity or scope of any
1403	   intellectual property or other rights that might be claimed to
1404	   pertain to the implementation or use of the technology described in
1405	   this document or the extent to which any license under such rights
1406	   might or might not be available; neither does it represent that it
1407	   has made any effort to identify any such rights. Information on the
1408	   IETF's procedures with respect to rights in standards-track and
1409	   standards-related documentation can be found in BCP-11. Copies of
1410	   claims of rights made available for publication and any assurances of
1411	   licenses to be made available, or the result of an attempt made to
1412	   obtain a general license or permission for the use of such
1413	   proprietary rights by implementors or users of this specification can
1414	   be obtained from the IETF Secretariat.

1416	   The IETF invites any interested party to bring to its attention any
1417	   copyrights, patents or patent applications, or other proprietary
1418	   rights which may cover technology that may be required to practice
1419	   this standard. Please address the information to the IETF Executive
1420	   Director.

1422	Full Copyright Statement

1424	   Copyright (C) The Internet Society (2004). All Rights Reserved.

1426	   This document and translations of it may be copied and furnished to
1427	   others, and derivative works that comment on or otherwise explain it
1428	   or assist in its implementation may be prepared, copied, published
1429	   and distributed, in whole or in part, without restriction of any
1430	   kind, provided that the above copyright notice and this paragraph are
1431	   included on all such copies and derivative works. However, this
1432	   document itself may not be modified in any way, such as by removing
1433	   the copyright notice or references to the Internet Society or other
1434	   Internet organizations, except as needed for the purpose of
1435	   developing Internet standards in which case the procedures for
1436	   copyrights defined in the Internet Standards process must be
1437	   followed, or as required to translate it into languages other than
1438	   English.

1440	   The limited permissions granted above are perpetual and will not be
1441	   revoked by the Internet Society or its successors or assignees.

1443	   This document and the information contained herein is provided on an
1444	   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
1445	   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
1446	   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
1447	   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
1448	   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1450	Acknowledgment

1452	   Funding for the RFC Editor function is currently provided by the
1453	   Internet Society.