idnits 2.17.1 

draft-ietf-sipping-app-interaction-framework-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1.a on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 1744.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1721.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1728.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1734.

  ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure
     Acknowledgement. 

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.

  ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate
     instead of verbatim RFC 3978 boilerplate.  After 6 May 2005, submission
     of drafts without verbatim RFC 3978 boilerplate is not accepted.

     The following non-3978 patterns matched text found in the document. 
     That text should be removed or replaced:

        This document is an Internet-Draft and is subject to all provisions of
        Section 3 of RFC 3667.

        By submitting this Internet-Draft, each author represents that any
        applicable patent or other IPR claims of which he or she is aware
        have been or will be disclosed, and any of which he or she
        becomes aware will be disclosed, in accordance with Section 6 of
        BCP 79.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 6 instances of too long lines in the document, the longest one
     being 4 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 795: '...ription).  As such, user agents SHOULD...'
     RFC 2119 keyword, line 850: '... the application MAY push presentation...'
     RFC 2119 keyword, line 860: '... the application MAY push presentation...'
     RFC 2119 keyword, line 880: '...  An application MUST NOT attempt to p...'
     RFC 2119 keyword, line 883: '...t an application MUST NOT push a user ...'
     (49 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 16, 2005) is 7007 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 3265 (ref. '3') (Obsoleted by RFC 6665)

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  == Outdated reference: A later version (-08) exists of
     draft-ietf-sipping-kpml-07

  == Outdated reference: A later version (-15) exists of
     draft-ietf-sip-gruu-02

  == Outdated reference: A later version (-06) exists of
     draft-ietf-sip-identity-03

  == Outdated reference: A later version (-05) exists of
     draft-ietf-sipping-conferencing-framework-03

  == Outdated reference: A later version (-06) exists of
     draft-ietf-sipping-dialog-package-05

  -- Obsolete informational reference (is this intentional?): RFC 2833 (ref.
     '17') (Obsoleted by RFC 4733, RFC 4734)


     Summary: 8 errors (**), 0 flaws (~~), 7 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	SIPPING                                                     J. Rosenberg
2	Internet-Draft                                             Cisco Systems
3	Expires: August 17, 2005                               February 16, 2005

5	   A Framework for Application Interaction in the Session Initiation
6	                             Protocol (SIP)
7	            draft-ietf-sipping-app-interaction-framework-04

9	Status of this Memo

11	   This document is an Internet-Draft and is subject to all provisions
12	   of section 3 of RFC 3667.  By submitting this Internet-Draft, each
13	   author represents that any applicable patent or other IPR claims of
14	   which he or she is aware have been or will be disclosed, and any of
15	   which he or she becomes aware will be disclosed, in accordance with
16	   RFC 3668.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as
21	   Internet-Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on August 17, 2005.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2005).

40	Abstract

42	   This document describes a framework for the interaction between users
43	   and Session Initiation Protocol (SIP) based applications, and defines
44	   a new Refer-To header field parameter and option tag in support of
45	   that framework.  By interacting with applications, users can guide
46	   the way in which they operate.  The focus of this framework is
47	   stimulus signaling, which allows a user agent to interact with an
48	   application without knowledge of the semantics of that application.

50	   Stimulus signaling can occur to a user interface running locally with
51	   the client, or to a remote user interface, through media streams.
52	   Stimulus signaling encompasses a wide range of mechanisms, ranging
53	   from clicking on hyperlinks, to pressing buttons, to traditional Dual
54	   Tone Multi Frequency (DTMF) input.  In all cases, stimulus signaling
55	   is supported through the use of markup languages, which play a key
56	   role in this framework.

58	Table of Contents

60	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
61	   2.  Definitions  . . . . . . . . . . . . . . . . . . . . . . . . .  4
62	   3.  A Model for Application Interaction  . . . . . . . . . . . . .  7
63	     3.1   Functional vs. Stimulus  . . . . . . . . . . . . . . . . .  9
64	     3.2   Real-Time vs. Non-Real Time  . . . . . . . . . . . . . . .  9
65	     3.3   Client-Local vs. Client-Remote . . . . . . . . . . . . . . 10
66	     3.4   Presentation Capable vs. Presentation Free . . . . . . . . 11
67	   4.  Interaction Scenarios on Telephones  . . . . . . . . . . . . . 11
68	     4.1   Client Remote  . . . . . . . . . . . . . . . . . . . . . . 12
69	     4.2   Client Local . . . . . . . . . . . . . . . . . . . . . . . 12
70	     4.3   Flip-Flop  . . . . . . . . . . . . . . . . . . . . . . . . 12
71	   5.  Framework Overview . . . . . . . . . . . . . . . . . . . . . . 13
72	   6.  Deployment Topologies  . . . . . . . . . . . . . . . . . . . . 15
73	     6.1   Third Party Application  . . . . . . . . . . . . . . . . . 16
74	     6.2   Co-Resident Application  . . . . . . . . . . . . . . . . . 16
75	     6.3   Third Party Application and User Device Proxy  . . . . . . 17
76	     6.4   Proxy Application  . . . . . . . . . . . . . . . . . . . . 18
77	   7.  Application Behavior . . . . . . . . . . . . . . . . . . . . . 19
78	     7.1   Client Local Interfaces  . . . . . . . . . . . . . . . . . 19
79	       7.1.1   Discovering Capabilities . . . . . . . . . . . . . . . 19
80	       7.1.2   Pushing an Initial Interface Component . . . . . . . . 20
81	       7.1.3   Updating an Interface Component  . . . . . . . . . . . 22
82	       7.1.4   Terminating an Interface Component . . . . . . . . . . 22
83	     7.2   Client Remote Interfaces . . . . . . . . . . . . . . . . . 23
84	       7.2.1   Originating and Terminating Applications . . . . . . . 23
85	       7.2.2   Intermediary Applications  . . . . . . . . . . . . . . 24
86	   8.  User Agent Behavior  . . . . . . . . . . . . . . . . . . . . . 24
87	     8.1   Advertising Capabilities . . . . . . . . . . . . . . . . . 24
88	     8.2   Receiving User Interface Components  . . . . . . . . . . . 25
89	     8.3   Mapping User Input to User Interface Components  . . . . . 26
90	     8.4   Receiving Updates to User Interface Components . . . . . . 27
91	     8.5   Terminating a User Interface Component . . . . . . . . . . 27
92	   9.  Inter-Application Feature Interaction  . . . . . . . . . . . . 28
93	     9.1   Client Local UI  . . . . . . . . . . . . . . . . . . . . . 28
94	     9.2   Client-Remote UI . . . . . . . . . . . . . . . . . . . . . 29
95	   10.   Intra Application Feature Interaction  . . . . . . . . . . . 30
96	   11.   Example Call Flow  . . . . . . . . . . . . . . . . . . . . . 30
97	   12.   Security Considerations  . . . . . . . . . . . . . . . . . . 35
98	   13.   IANA Considerations  . . . . . . . . . . . . . . . . . . . . 36
99	     13.1  SIP Option Tag . . . . . . . . . . . . . . . . . . . . . . 36
100	     13.2  Header Field Parameter . . . . . . . . . . . . . . . . . . 36
101	   14.   Contributors . . . . . . . . . . . . . . . . . . . . . . . . 36
102	   15.   Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 36
103	   16.   References . . . . . . . . . . . . . . . . . . . . . . . . . 37
104	   16.1  Normative References . . . . . . . . . . . . . . . . . . . . 37
105	   16.2  Informative References . . . . . . . . . . . . . . . . . . . 37
106	       Author's Address . . . . . . . . . . . . . . . . . . . . . . . 38
107	       Intellectual Property and Copyright Statements . . . . . . . . 39

109	1.  Introduction

111	   The Session Initiation Protocol (SIP) [1] provides the ability for
112	   users to initiate, manage, and terminate communications sessions.
113	   Frequently, these sessions will involve a SIP application.  A SIP
114	   application is defined as a program running on a SIP-based element
115	   (such as a proxy or user agent) that provides some value-added
116	   function to a user or system administrator.  Examples of SIP
117	   applications include pre-paid calling card calls, conferencing, and
118	   presence-based [11] call routing.

120	   In order for most applications to properly function, they need input
121	   from the user to guide their operation.  As an example, a pre-paid
122	   calling card application requires the user to input their calling
123	   card number, their PIN code, and the destination number they wish to
124	   reach.  The process by which a user provides input to an application
125	   is called "application interaction".

127	   Application interaction can be either functional or stimulus.
128	   Functional interaction requires the user device to understand the
129	   semantics of the application, whereas stimulus interaction does not.
130	   Stimulus signaling allows for applications to be built without
131	   requiring modifications to the user device.  Stimulus interaction is
132	   the subject of this framework.  The framework provides a model for
133	   how users interact with applications through user interfaces, and how
134	   user interfaces and applications can be distributed throughout a
135	   network.  This model is then used to describe how applications can
136	   instantiate and manage user interfaces.

138	   This document also defines a new SIP Refer-To header field parameter
139	   and a new SIP option tag indicating support for that parameter.

141	2.  Definitions

143	   SIP Application: A SIP application is defined as a program running on
144	      a SIP-based element (such as a proxy or user agent) that provides
145	      some value-added function to a user or system administrator.
146	      Examples of SIP applications include pre-paid calling card calls,
147	      conferencing, and presence-based [11] call routing.

149	   Application Interaction: The process by which a user provides input
150	      to an application.

152	   Real-Time Application Interaction: Application interaction that takes
153	      place while an application instance is executing.  For example,
154	      when a user enters their PIN number into a pre-paid calling card
155	      application, this is real-time application interaction.

157	   Non-Real Time Application Interaction: Application interaction that
158	      takes place asynchronously with the execution of the application.
159	      Generally, non-real time application interaction is accomplished
160	      through provisioning.

162	   Functional Application Interaction: Application interaction is
163	      functional when the user device has an understanding of the
164	      semantics of the interaction with the application.

166	   Stimulus Application Interaction: Application interaction is
167	      considered to be stimulus when the user device has no
168	      understanding of the semantics of the interaction with the
169	      application.

171	   User Interface (UI): The user interface provides the user with
172	      context in order to make decisions about what they want.  The user
173	      interacts with the device, which conveys the user input to the
174	      user interface.  The user interface interprets the information,
175	      and passes it to the application.

177	   User Interface Component: A piece of user interface which operates
178	      independently of other pieces of the user interface.  For example,
179	      a user might have two separate web interfaces to a pre-paid
180	      calling card application - one for hanging up and making another
181	      call, and another for entering the username and PIN.

183	   User Device: The software or hardware system that the user directly
184	      interacts with in order to communicate with the application.  An
185	      example of a user device is a telephone.  Another example is a PC
186	      with a web browser.

188	   User Device Proxy: A software or hardware system that a user
189	      indirectly interacts through in order to communicate with the
190	      application.  This indirection can be through a network.  An
191	      example is a gateway from IP to the Public Switched Telephone
192	      Network (PSTN).  It acts as a user device proxy, acting on behalf of
193	      the user on the circuit network.

195	   User Input: The "raw" information passed from a user to a user
196	      interface.  Examples of user input include a spoken word or a
197	      click on a hyperlink.

199	   Client-Local User Interface: A user interface which is co-resident
200	      with the user device.

202	   Client-Remote User Interface: A user interface which executes
203	      remotely from the user device.  In this case, a standardized
204	      interface is needed between the user device and the user
205	      interface.  Typically, this is done through media sessions -
206	      audio, video, or application sharing.

208	   Markup Language: A markup language describes a logical flow of
209	      presentation of information to the user, collection of information
210	      from the user, and transmission of that information to an
211	      application.

213	   Media Interaction: A means of separating a user and a user interface
214	      by connecting them with media streams.

216	   Interactive Voice Response (IVR): An IVR is a type of user interface
217	      that allows users to speak commands to the application, and hear
218	      responses to those commands prompting for more information.

220	   Prompt-and-Collect: The basic primitive of an IVR user interface.
221	      The user is presented with a voice option, and the user speaks
222	      their choice.

224	   Barge-In: The act of entering information into an IVR user interface
225	      prior to the completion of a prompt requesting that information.

227	   Focus: A user interface component has focus when user input is
228	      provided to it, as opposed to any other user interface
229	      components.  This is not to be confused with the term focus within
230	      the SIP conferencing framework, which refers to the center user
231	      agent in a conference [13].

233	   Focus Determination: The process by which the user device determines
234	      which user interface component will receive the user input.

236	   Focusless Device: A user device which has no ability to perform focus
237	      determination.  An example of a focusless device is a telephone
238	      with a keypad.

240	   Presentation Capable UI: A user interface which can prompt the user
241	      with input, collect results, and then prompt the user with new
242	      information based on those results.

244	   Presentation Free UI: A user interface which cannot prompt the user
245	      with information.

247	   Feature Interaction: A class of problems which result when multiple
248	      applications or application components are trying to provide
249	      services to a user at the same time.

251	   Inter-Application Feature Interaction: Feature interactions that
252	      occur between applications.

254	   DTMF: Dual-Tone Multi-Frequency.  DTMF refers to a class of tones
255	      generated by circuit switched telephony devices when the user
256	      presses a key on the keypad.  As a result, DTMF and keypad input
257	      are often used synonymously, when in fact one of them (DTMF) is
258	      merely a means of conveying the other (the keypad input) to a
259	      client-remote user interface (the switch, for example).

261	   Application Instance: A single execution path of a SIP application.

263	   Originating Application: A SIP application which acts as a UAC,
264	      making a call on behalf of the user.

266	   Terminating Application: A SIP application which acts as a UAS,
267	      answering a call generated by a user.  IVR applications are
268	      terminating applications.

270	   Intermediary Application: A SIP application which is neither the
271	      caller nor callee, but rather, a third party involved in a call.

273	3.  A Model for Application Interaction

275	         +---+            +---+            +---+             +---+
276	         |   |            |   |            |   |             |   |
277	         |   |            | U |            | U |             | A |
278	         |   |   Input    | s |   Input    | s |   Results   | p |
279	         |   | ---------> | e | ---------> | e | ----------> | p |
280	         | U |            | r |            | r |             | l |
281	         | s |            |   |            |   |             | i |
282	         | e |            | D |            | I |             | c |
283	         | r |   Output   | e |   Output   | f |   Update    | a |
284	         |   | <--------- | v | <--------- | a | <.......... | t |
285	         |   |            | i |            | c |             | i |
286	         |   |            | c |            | e |             | o |
287	         |   |            | e |            |   |             | n |
288	         |   |            |   |            |   |             |   |
289	         +---+            +---+            +---+             +---+

291	               Figure 1: Model for Real-Time Interactions

293	   Figure 1 presents a general model for how users interact with
294	   applications.  Generally, users interact with a user interface
295	   through a user device.  A user device can be a telephone, or it can
296	   be a PC with a web browser.  Its role is to pass the user input from
297	   the user, to the user interface.  The user interface provides the
298	   user with context in order to make decisions about what they want.
299	   The user interacts with the device, causing information to be passed
300	   from the device to the user interface.  The user interface interprets
301	   the information, and passes it as a user interface event to the
302	   application.  The application may be able to modify the user
303	   interface based on this event.  Whether or not this is possible
304	   depends on the type of user interface.

306	   User interfaces are fundamentally about rendering and interpretation.
307	   Rendering refers to the way in which the user is provided context.
309	   This can be through hyperlinks, images, sounds, videos, text, and so
310	   on.  Interpretation refers to the way in which the user interface
311	   takes the "raw" data provided by the user, and returns the result to
312	   the application as a meaningful event, abstracted from the
313	   particulars of the user interface.  As an example, consider a
314	   pre-paid calling card application.  The user interface worries about
315	   details such as what prompt the user is provided, whether the voice
316	   is male or female, and so on.  It is concerned with recognizing the
317	   speech that the user provides, in order to obtain the desired
318	   information.  In this case, the desired information is the calling
319	   card number, the PIN code, and the destination number.  The
320	   application needs that data, and it doesn't matter to the application
321	   whether it was collected using a male prompt or a female one.
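   The split between rendering and interpretation described above can be
   sketched in a few lines of Python.  This is only an illustration: the
   names VoiceUI, CallingCardEvent and application are hypothetical and
   do not come from this framework.  The point is that the event handed
   to the application is identical no matter how the prompt was
   rendered.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CallingCardEvent:
    """Abstract result handed to the application (hypothetical name)."""
    card_number: str
    pin: str
    destination: str


class VoiceUI:
    """Hypothetical user interface: owns rendering and interpretation."""

    def __init__(self, voice: str) -> None:
        # Rendering detail (male or female voice); the application never sees it.
        self.voice = voice

    def render(self, field: str) -> str:
        """Rendering: give the user context for the next input."""
        return f"[{self.voice} voice] Please enter your {field}."

    def interpret(self, raw_inputs: List[str]) -> CallingCardEvent:
        """Interpretation: reduce three raw inputs to digits-only fields."""
        card, pin, dest = (
            "".join(ch for ch in raw if ch.isdigit()) for raw in raw_inputs
        )
        return CallingCardEvent(card, pin, dest)


def application(event: CallingCardEvent) -> str:
    """The application only consumes the abstract event."""
    return f"calling {event.destination} on card {event.card_number}"
```

   Two VoiceUI instances with different voices produce equal
   CallingCardEvent values for the same raw input, so the application
   logic is unaffected by the choice of prompt.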

323	   User interfaces generally have real-time requirements towards the
324	   user.  That is, when a user interacts with the user interface, the
325	   user interface needs to react quickly, and that change needs to be
326	   propagated to the user right away.  However, the interface between
327	   the user interface and the application need not be that fast.  Faster
328	   is better, but the user interface itself can frequently compensate
329	   for long latencies there.  In the case of a pre-paid calling card
330	   application, when the user is prompted to enter their PIN, the prompt
331	   should generally stop immediately once the first digit of the PIN is
332	   entered.  This is referred to as barge-in.  After the user-interface
333	   collects the rest of the PIN, it can tell the user to "please wait
334	   while processing".  The PIN can then be gradually transmitted to the
335	   application.  In this example, the user interface has compensated for
336	   a slow UI to application interface by asking the user to wait.
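   The barge-in behavior in the calling card example can be simulated
   with a simple tick-based model.  The sketch below is an assumption
   made for illustration (one prompt word per tick, keypresses keyed by
   tick); the function name prompt_and_collect is hypothetical.  The
   prompt stops at the first digit, the user interface keeps collecting
   locally, and only the complete PIN would be passed on to the
   application.

```python
from typing import Dict, List, Tuple


def prompt_and_collect(
    prompt_words: List[str], keypresses: Dict[int, str], pin_length: int
) -> Tuple[List[str], str]:
    """Simulate one prompt-and-collect step with barge-in.

    prompt_words: words of the prompt, one played per time tick.
    keypresses:   mapping of tick -> digit pressed at that tick.
    Returns the words actually played and the collected digits.
    """
    played: List[str] = []
    collected: List[str] = []
    barged_in = False
    last_tick = max(len(prompt_words) - 1, max(keypresses, default=-1))

    for tick in range(last_tick + 1):
        if tick in keypresses:
            # Barge-in: the first digit silences the rest of the prompt.
            barged_in = True
            collected.append(keypresses[tick])
            if len(collected) == pin_length:
                break
        elif not barged_in and tick < len(prompt_words):
            played.append(prompt_words[tick])

    return played, "".join(collected)
```

   Because the collection loop runs entirely inside the user interface,
   a slow link to the application only delays the final hand-off of the
   PIN, not the reaction to each keypress.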

338	   The separation between user interface and application is absolutely
339	   fundamental to the entire framework provided in this document.  Its
340	   importance cannot be overstated.

342	   With this basic model, we can begin to taxonomize the types of
343	   systems that can be built.

345	3.1  Functional vs. Stimulus

347	   The first way to taxonomize the system is to consider the interface
348	   between the UI and the application.  There are two fundamentally
349	   different models for this interface.  In a functional interface, the
350	   user interface has detailed knowledge about the application, and is,
351	   in fact, specific to the application.  The interface between the two
352	   components is through a functional protocol, capable of representing
353	   the semantics which can be exposed through the user interface.
354	   Because the user interface has knowledge of the application, it can
355	   be optimally designed for that application.  As a result, functional
356	   user interfaces are almost always the most user friendly, the fastest
357	   and the most responsive.  However, in order to allow interoperability
358	   between user devices and applications, the details of the functional
359	   protocols need to be specified in standards.  This slows down
360	   innovation and limits the scope of applications that can be built.

362	   An alternative is a stimulus interface.  In a stimulus interface, the
363	   user interface is generic; totally ignorant of the details of the
364	   application.  Indeed, the application may pass instructions to the
365	   user interface describing how it should operate.  The user interface
366	   translates user input into "stimulus" - which are data understood
367	   only by the application, and not by the user interface.  Because they
368	   are generic, and because they require communications with the
369	   application in order to change the way in which they render
370	   information to the user, stimulus user interfaces are usually slower,
371	   less user friendly, and less responsive than a functional
372	   counterpart.  However, they allow for substantial innovation in
373	   applications, since no standardization activity is needed to build a
374	   new application, as long as it can interact with the user within the
375	   confines of the user interface mechanism.  The web is an example of a
376	   stimulus user interface to applications.
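   As a rough illustration of a stimulus interface, the sketch below
   shows one generic user interface serving two unrelated applications.
   All names (GenericStimulusUI, VoicemailApp, ConferenceApp) and the
   token strings are hypothetical.  The user interface loads whatever
   instructions the application supplies and returns opaque tokens that
   it never interprets; only the application knows what they mean.

```python
from typing import Dict, Optional


class GenericStimulusUI:
    """Generic UI: maps keys to opaque tokens supplied by an application."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def load(self, instructions: Dict[str, str]) -> None:
        """Accept instructions pushed by an application."""
        self._bindings = dict(instructions)

    def press(self, key: str) -> Optional[str]:
        """Translate user input into a stimulus; the token is not inspected."""
        return self._bindings.get(key)


class VoicemailApp:
    instructions = {"1": "vm/play-next", "7": "vm/delete"}

    def handle(self, stimulus: str) -> str:
        actions = {"vm/play-next": "playing next message",
                   "vm/delete": "message deleted"}
        return actions[stimulus]


class ConferenceApp:
    instructions = {"1": "conf/mute", "7": "conf/leave"}

    def handle(self, stimulus: str) -> str:
        actions = {"conf/mute": "muted", "conf/leave": "left the conference"}
        return actions[stimulus]
```

   Adding ConferenceApp required no change to GenericStimulusUI, which
   is the property that lets new applications be built without new
   standardization.  The cost is visible too: the user interface cannot
   react to a key on its own and must consult the application.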

378	   In SIP systems, functional interfaces are provided by extending the
379	   SIP protocol to provide the needed functionality.  For example, the
380	   SIP caller preferences specification [14] provides a functional
381	   interface that allows a user to request applications to route the
382	   call to specific types of user agents.  Functional interfaces are
383	   important, but are not the subject of this framework.  The primary
384	   goal of this framework is to address the role of stimulus interfaces
385	   to SIP applications.

387	3.2  Real-Time vs. Non-Real Time

389	   Application interaction systems can also be real-time or
390	   non-real-time.  Non-real-time interaction allows the user to enter
391	   information about application operation asynchronously with its
392	   invocation.  Frequently, this is done through provisioning systems.
394	   As an example, a user can set up the forwarding number for a
395	   call-forward on no-answer application using a web page.  Real-time
396	   interaction requires the user to interact with the application at the
397	   time of its invocation.
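   The call-forward example can be reduced to a small sketch that
   separates the two kinds of interaction.  The class and method names
   are hypothetical, and the in-memory dictionary merely stands in for
   whatever provisioning system is actually used.

```python
from typing import Dict, Optional


class CallForwardOnNoAnswer:
    """Hypothetical call-forward on no-answer application."""

    def __init__(self) -> None:
        self._forward_to: Dict[str, str] = {}

    def provision(self, user: str, number: str) -> None:
        """Non-real-time interaction: set at any time, e.g. from a web page."""
        self._forward_to[user] = number

    def on_no_answer(self, user: str) -> Optional[str]:
        """Invocation: uses only what was provisioned earlier."""
        return self._forward_to.get(user)
```

   A real-time variant would instead have to ask the user for the
   forwarding number while the call is being processed.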

399	3.3  Client-Local vs. Client-Remote

401	   Another axis in the taxonomization is whether the user interface is
402	   co-resident with the user device (which we refer to as a client-local
403	   user interface), or the user interface runs in a host separated from
404	   the client (which we refer to as a client-remote user interface).  In
405	   a client-remote user interface, there exists some kind of protocol
406	   between the client device and the UI that allows the client to
407	   interact with the user interface over a network.

409	   The most important way to separate the UI and the client device is
410	   through media interaction.  In media interaction, the interface
411	   between the user and the user interface is through media - audio,
412	   video, messaging, and so on.  This is the classic mode of operation
413	   for VoiceXML [4], where the user interface (also referred to as the
414	   voice browser) runs on a platform in the network.  Users communicate
415	   with the voice browser through the telephone network (or using a SIP
416	   session).  The voice browser interacts with the application using
417	   HTTP to convey the information collected from the user.

419	   In the case of a client-local user interface, the user interface runs
420	   co-located with the user device.  The interface between them is
421	   through the software that interprets the user's input and passes it
422	   to the user interface.  The classic example of this is the web.  In
423	   the web, the user interface is a web browser, and the interface is
424	   defined by the HTML document that it's rendering.  The user interacts
425	   directly with the user interface running in the browser.  The results
426	   of that user interface are sent to the application (running on the
427	   web server) using HTTP.

429	   It is important to note that whether or not the user interface is
430	   local or remote (in the case of media interaction) is not a property
431	   of the modality of the interface, but rather a property of the
432	   system.  As an example, it is possible for a web-based user interface
433	   to be provided with a client-remote user interface.  In such a
434	   scenario, video and application sharing media sessions can be used
435	   between the user and the user interface.  The user interface, still
436	   guided by HTML, now runs "in the network", remote from the client.
437	   Similarly, a VoiceXML document can be interpreted locally by a client
438	   device, with no media streams at all.  Indeed, the VoiceXML document
439	   can be rendered using text, rather than media, with no impact on the
440	   interface between the user interface and the application.

442	   It is also important to note that systems can be hybrid.  In a hybrid
443	   user interface, some aspects of it (usually those associated with a
444	   particular modality) run locally, and others run remotely.

446	3.4  Presentation Capable vs. Presentation Free

448	   A user interface can be capable of presenting information to the user
449	   (a presentation capable UI), or it can be capable only of collecting
450	   user input (a presentation free UI).  These are very different types
451	   of user interfaces.  A presentation capable UI can provide the user
452	   with feedback after every input, providing the context for collecting
453	   the next input.  As a result, presentation capable user interfaces
454	   require an update to the information provided to the user after each
455	   input.  The web is a classic example of this.  After every input
456	   (i.e., a click), the browser provides the input to the application
457	   and fetches the next page to render.  In a presentation free user
458	   interface, this is not the case.  Since the user is not provided with
459	   feedback, these user interfaces tend to merely collect information as
460	   it is entered, and pass it to the application.

462	   Another difference is that a presentation-free user interface cannot
463	   support the concept of a focus.  As a result, if multiple
464	   applications wish to gather input from the user, there is no way for
465	   the user to select which application the input is destined for.  The
466	   input provided to applications through presentation-free user
467	   interfaces is more of a broadcast or notification operation, as a
468	   result.

470	4.  Interaction Scenarios on Telephones

472	   In this section, we apply the model of Section 3 to telephones.

474	   In a traditional telephone, the user interface consists of a 12-key
475	   keypad, a speaker, and a microphone.  Indeed, from here forward, the
476	   term "telephone" is used to represent any device that meets, at a
477	   minimum, the characteristics described in the previous sentence.
478	   Circuit-switched telephony applications are almost universally
479	   client-remote user interfaces.  In the Public Switched Telephone
480	   Network (PSTN), there is usually a circuit interface between the user
481	   and the user interface.  The user input from the keypad is conveyed
482	   using Dual-Tone Multi-Frequency (DTMF), and the microphone input as
483	   Pulse Code Modulated (PCM) encoded voice.

485	   In an IP-based system, there is more variability in how the system
486	   can be instantiated.  Both client-remote and client-local user
487	   interfaces to a telephone can be provided.

489	   In this framework, a PSTN gateway can be considered a User Device
490	   Proxy.  It is a proxy for the user because it can provide, to a user
491	   interface on an IP network, input taken from a user on a circuit
492	   switched telephone.  The gateway may be able to run a client-local
493	   user interface, just as an IP telephone might.

495	4.1  Client Remote

497	   The most obvious instantiation is the "classic" circuit-switched
498	   telephony model.  In that model, the user interface runs remotely
499	   from the client.  The interface between the user and the user
500	   interface is through media, set up by SIP and carried over the Real
501	   Time Transport Protocol (RTP) [16].  The microphone input can be
502	   carried using any suitable voice encoding algorithm.  The keypad
503	   input can be conveyed in one of two ways.  The first is to convert
504	   the keypad input to DTMF, and then convey that DTMF using a suitable
505	   encoding algorithm for it (such as PCMU).  An alternative, and
506	   generally the preferred approach, is to transmit the keypad input
507	   using RFC 2833 [17], which provides an encoding mechanism for
508	   carrying keypad input within RTP.

510	   In this classic model, the user interface would run on a server in
511	   the IP network.  It would perform speech recognition and DTMF
512	   recognition to derive the user intent, feed it through the user
513	   interface, and provide the result to an application.

515	4.2  Client Local

517	   An alternative model is for the entire user interface to reside on
518	   the telephone.  The user interface can be a VoiceXML browser, running
519	   speech recognition on the microphone input, and feeding the keypad
520	   input directly into the script.  As discussed above, the VoiceXML
521	   script could be rendered using text instead of voice, if the
522	   telephone had a textual display.

524	   For simpler phones without a display, the user interface can be
525	   described by a Keypad Markup Language request document [7].  As the
526	   user enters digits in the keypad, they are passed to the user
527	   interface, which generates user interface events that can be
528	   transported to the application.

530	4.3  Flip-Flop

532	   A middle-ground approach is to flip back and forth between a
533	   client-local and client-remote user interface.  Many voice
534	   applications are of the type which listen to the media stream and
535	   wait for some specific trigger that kicks off a more complex user
536	   interaction.  The long pound in a pre-paid calling card application
537	   is one example.  Another example is a conference recording
538	   application, where the user can press a key at some point in the call
539	   to begin recording.  When the key is pressed, the user hears a
540	   whisper to inform them that recording has started.

542	   The ideal way to support such an application is to install a
543	   client-local user interface component that waits for the trigger to
544	   kick off the real interaction.  Once the trigger is received, the
545	   application connects the user to a client-remote user interface that
546	   can play announcements, collect more information, and so on.

548	   The benefit of flip-flopping between a client-local and client-remote
549	   user interface is cost.  The client-local user interface will
550	   eliminate the need to send media streams into the network just to
551	   wait for the user to press the pound key on the keypad.

553	   The Keypad Markup Language (KPML) was designed to support exactly
554	   this kind of need [7].  It models the keypad on a phone, and allows
555	   an application to be informed when any sequence of keys has been
556	   pressed.  However, KPML has no presentation component.  Since user
557	   interfaces generally require a response to user input, the
558	   presentation will need to be done using a client-remote user
559	   interface that gets instantiated as a result of the trigger.

561	   It is tempting to use a hybrid model, where a prompt-and-collect
562	   application is implemented by using a client-remote user interface
563	   that plays the prompts, and a client-local user interface, described
564	   by KPML, that collects digits.  However, this only complicates the
565	   application.  Firstly, the keypad input will be sent to both the
566	   media stream and the KPML user interface.  This requires the
567	   application to sort out which user inputs are duplicates, a process
568	   that is very complicated.  Secondly, the primary benefit of KPML is
569	   to avoid having a media stream towards a user interface.  However,
570	   there is already a media stream for the prompting, so there is no
571	   real savings.

573	5.  Framework Overview

575	   In this framework, we use the term "SIP application" to refer to a
576	   broad set of functionality.  A SIP application is a program running
577	   on a SIP-based element (such as a proxy or user agent) that provides
578	   some value-added function to a user or system administrator.  SIP
579	   applications can execute on behalf of a caller, a called party, or a
580	   multitude of users at once.

582	   Each application has a number of instances that are executing at any
583	   given time.  An instance represents a single execution path for an
584	   application.  Each instance has a well defined lifecycle.  It is
585	   established as a result of some event.  That event can be a SIP
586	   event, such as the reception of a SIP INVITE request, or it can be a
587	   non-SIP event, such as a web form post or even a timer.  Application
588	   instances also have a specific end time.  Some instances have a
589	   lifetime that is coupled with a SIP transaction or dialog.  For
590	   example, a proxy application might begin when an INVITE arrives, and
591	   terminate when the call is answered.  Other applications have a
592	   lifetime that spans multiple dialogs or transactions.  For example, a
593	   conferencing application instance may exist so long as there are any
594	   dialogs connected to it.  When the last dialog terminates, the
595	   application instance terminates.  Other applications have a lifetime
596	   that is completely decoupled from SIP events.

598	   It is fundamental to the framework described here that multiple
599	   application instances may interact with a user during a single SIP
600	   transaction or dialog.  Each instance may be for the same
601	   application, or different applications.  Each of the applications may
602	   be completely independent, in that they may be owned by different
603	   providers, and may be unaware of each other's existence.  Similarly,
604	   there may be application instances interacting with the caller, and
605	   instances interacting with the callee, both within the same
606	   transaction or dialog.

608	   The first step in the interaction with the user is to instantiate one
609	   or more user interface components for the application instance.  A
610	   user interface component is a single piece of the user interface that
611	   is defined by a logical flow that is not synchronously coupled with
612	   any other component.  In other words, each component runs more or
613	   less independently.

615	   A user interface component can be instantiated in one of the user
616	   agents in a dialog (for a client-local user interface), or within a
617	   network element (for a client-remote user interface).  If a
618	   client-local user interface is to be used, the application needs to
619	   determine whether or not the user agent is capable of supporting a
620	   client-local user interface, and in what format.  In this framework,
621	   all client-local user interface components are described by a markup
622	   language.  A markup language describes a logical flow of presentation
623	   of information to the user, collection of information from the user,
624	   and transmission of that information to an application.  Examples of
625	   markup languages include HTML, WML, VoiceXML, and the Keypad Markup
626	   Language (KPML) [7].

628	   Unlike an application instance, which has a very flexible lifetime, a
629	   user interface component has a very fixed lifetime.  A user interface
630	   component is always associated with a dialog.  The user interface
631	   component can be created at any point after the dialog (or early
632	   dialog) is created.  However, the user interface component terminates
633	   when the dialog terminates.  The user interface component can be
634	   terminated earlier by the user agent, and possibly by the
635	   application, but its lifetime never exceeds that of its associated
636	   dialog.

638	   There are two ways to create a client local interface component.  For
639	   interface components that are presentation capable, the application
640	   sends a REFER [6] request to the user agent.  The Refer-To header
641	   field contains an HTTP URI that points to the markup for the user
642	   interface.  For interface components that are presentation free (such
643	   as those defined by KPML), the application sends a SUBSCRIBE request
644	   to the user agent.  The body of the SUBSCRIBE request contains a
645	   filter, which, in this case, is the markup that defines when
646	   information is to be sent to the application in a NOTIFY.
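
   As an informal illustration of the paragraph above, the choice
   between the two creation methods can be sketched in Python.  This is
   only a sketch: the function name and the dictionary it returns are
   hypothetical and are not defined by this framework.

```python
from typing import Optional


def creation_request(presentation_capable: bool,
                     markup_uri: Optional[str] = None,
                     filter_markup: Optional[str] = None) -> dict:
    """Summarize the request that creates a client-local UI component.

    The returned dictionary is a hypothetical summary, not a SIP message.
    """
    if presentation_capable:
        if not markup_uri:
            raise ValueError("a presentation capable component needs a URI")
        # REFER: the Refer-To header field points at the markup document.
        return {"method": "REFER", "refer_to": markup_uri, "body": None}
    if not filter_markup:
        raise ValueError("a presentation free component needs a filter")
    # SUBSCRIBE: the body carries the filter markup (for example, KPML).
    return {"method": "SUBSCRIBE", "refer_to": None, "body": filter_markup}
```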

648	   If a user interface component is to be instantiated in the network,
649	   there is no need to determine the capabilities of the device on which
650	   the user interface is instantiated.  Presumably, it is on a device on
651	   which the application knows a UI can be created.  However, the
652	   application does need to connect the user device to the user
653	   interface.  This will require manipulation of media streams in order
654	   to establish that connection.

656	   The interface between the user interface component and the
657	   application depends on the type of user interface.  For presentation
658	   capable user interfaces, such as those described by HTML and
659	   VoiceXML, HTTP form POST operations are used.  For presentation free
660	   user interfaces, a SIP NOTIFY is used.  The differing needs and
661	   capabilities of these two user interfaces, as described in Section
662	   3.4, are what drive the different choices for the interactions.
663	   Since presentation capable user interfaces require an update to the
664	   presentation every time user data is entered, they are a good match
665	   for HTTP.  Since presentation free user interfaces merely transmit
666	   user input to the application, a NOTIFY is more appropriate.

668	   Indeed, for presentation free user interfaces, there are two
669	   different modalities of operation.  The first is called "one shot".
670	   In the one-shot role, the markup waits for a user to enter some
671	   information, and when they do, reports this event to the application.
672	   The application then does something, and the markup is no longer
673	   used.  In the other modality, called "monitor", the markup stays
674	   permanently resident, and reports information back to an application
675	   until termination of the associated dialog.

677	6.  Deployment Topologies

679	   This section presents some of the network topologies in which this
680	   framework can be instantiated.

682	6.1  Third Party Application

684	                    +-------------+
685	                /---| Application |
686	               /    +-------------+
687	              /
688	       SUB/  / REFER/
689	       NOT  /  HTTP
690	           /
691	      +--------+    SIP (INVITE)    +-----+
692	      |   UI   A--------------------X     |
693	      |........|                    | SIP |
694	      |  User  |        RTP         | UA  |
695	      | Device B--------------------Y     |
696	      +--------+                    +-----+

698	                     Figure 2: Third Party Topology

700	   In this topology, the application that is interested in interacting
701	   with the users exists outside of the SIP dialog between the user
702	   agents.  In that case, the application learns about the initiation
703	   and termination of the dialog, along with the dialog identifiers,
704	   through some out of band means.  One such possibility is the dialog
705	   event package [15].  Dialog information is only revealed to trusted
706	   parties, so the application would need to be trusted by one of the
707	   users in order to obtain this information.

709	   At any point during the dialog, the application can instantiate user
710	   interface components on the user device of the caller or callee.  It
711	   can do this either using SUBSCRIBE or REFER, depending on the type of
712	   user interface (presentation capable or presentation free).

714	6.2  Co-Resident Application

716	      +--------+    SIP (INVITE)    +-----+
717	      |  User  A--------------------X SIP |
718	      | Device |        RTP         | UA  |
719	      |........B--------------------Y     |
720	      |        |    SUB/NOT         | App)|
721	      |  UI    A'-------------------X'    |
722	      +--------+    REFER/HTTP      +-----+

724	                     Figure 3: Co-Resident Topology

726	   In this deployment topology, the application is co-resident with one
727	   of the user agents (the one on the right in the picture above).  This
728	   application can install client-local user interface components on the
729	   other user agent, which is acting as the user device.  These
730	   components can be installed using either SUBSCRIBE, for presentation
731	   free user interfaces, or REFER, for presentation capable ones.  This
732	   situation typically arises when the application wishes to install UI
733	   components on a presentation capable user interface.  If the only
734	   user input is via keypad input, the framework is not needed per se,
735	   because the UA/application will receive the input via RFC 2833 in the
736	   RTP stream.

738	   If the application resides in the called party, it is called a
739	   terminating application.  If it resides in the calling party, it is
740	   called an originating application.

742	   This kind of topology is common in protocol converter and gateway
743	   applications.

745	6.3  Third Party Application and User Device Proxy

747	                                               +-------------+
748	                                           /---| Application |
749	                                          /    +-------------+
750	                                         /
751	                                   SUB/ /  REFER/
752	                                   NOT /   HTTP
753	                                      /
754	      +-----+        SIP         +---M----+        SIP         +-----+
755	      |     V--------------------C        A--------------------X     |
756	      | SIP |                    |   UI   |                    | SIP |
757	      | UAa |        RTP         |        |        RTP         | UAb |
758	      |     W--------------------D        B--------------------Y     |
759	      +-----+                    +--------+                    +-----+
760	       User                         User
761	       Device                      Device
762	                                   Proxy

764	                  Figure 4: User Device Proxy Topology

766	   In this deployment topology, there is a third party application as in
767	   Section 6.1.  However, instead of installing a user interface
768	   component on the end user device, the component is installed in an
769	   intermediate device, known as a User Device Proxy.  From the
770	   perspective of the actual user device (on the left), the User Device
771	   Proxy is a client remote user interface.  As such, media, typically
772	   transported using RTP (including RFC 2833 for carrying user input),
773	   is sent from the user device to the client remote user interface on
774	   the User Device Proxy.  As far as the application is concerned, it is
775	   installing what it thinks is a client local user interface on the
776	   user device, but it happens to be on a user device proxy which looks
777	   like the user device to the application.

779	   The user device proxy will need to terminate and re-originate both
780	   signaling (SIP) and media traffic towards the actual peer in the
781	   conversation.  The User Device Proxy is a media relay in the
782	   terminology of RFC 3550 [16].  The User Device Proxy will need to
783	   monitor the media streams associated with each dialog, in order to
784	   convert user input received in the media stream to events reported to
785	   the user interface.  This can pose a challenge in multi-media
786	   systems, where it may be unclear on which media stream the user input
787	   is being sent.  As discussed in RFC 3264 [18], if a user agent has a
788	   single media source and is supporting multiple streams, it is
789	   supposed to send that source to all streams.  In cases where there
790	   are multiple sources, the mapping is a matter of local policy.  In
791	   the absence of a way to explicitly identify or request which sources
792	   map to which streams, the user device proxy will need to do the best
793	   job it can.  This specification RECOMMENDS that the User Device Proxy
794	   monitor the first stream (defined in terms of ordering of media
795	   sessions within a session description).  As such, user agents SHOULD
796	   send their user input on the first stream, absent a policy to direct
797	   it otherwise.
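
   The recommendation above depends on the ordering of media sessions
   within a session description.  As a rough sketch, a User Device Proxy
   could locate the first stream by scanning the SDP for its first "m="
   line.  The function name and the returned dictionary are
   hypothetical, and the parsing is deliberately minimal.

```python
from typing import Optional


def first_media_stream(sdp: str) -> Optional[dict]:
    """Return a summary of the first "m=" line in an SDP body, or None."""
    for line in sdp.splitlines():
        if line.startswith("m="):
            fields = line[2:].split()
            if len(fields) < 3:
                continue  # Malformed media line; keep looking.
            media, port, proto = fields[0], fields[1], fields[2]
            return {
                "media": media,
                # The port may carry a "/count" suffix; keep the base port.
                "port": int(port.split("/")[0]),
                "proto": proto,
                "formats": fields[3:],
            }
    return None
```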

799	6.4  Proxy Application
800	                             +----------+
801	               SUB/NOT       |   App    |      SUB/NOT
802	            +--------------->|          |<-----------------+
803	            |  REFER/HTTP    |..........|     REFER/HTTP   |
804	            |                |   SIP    |                  |
805	            |                |  Proxy   |                  |
806	            |                +----------+                  |
807	            V                 ^        |                   V
808	      +----------+            |        |             +----------+
809	      |   UI     |   INVITE   |        |    INVITE   |   UI     |
810	      |          |------------+        +------------>|          |
811	      |......... |                                   |..........|
812	      |   SIP    |...................................|   SIP    |
813	      |   UA     |                                   |   UA     |
814	      +----------+               RTP                 +----------+
815	        User Device                                    User Device

817	                  Figure 5: Proxy Application Topology

819	   In this topology, the application is co-resident with a transaction
820	   stateful, record-routing proxy server on the call path between two
821	   user devices.  The application uses SUBSCRIBE or REFER to install
822	   user interface components on one or both user devices.

824	   This topology is common in routing applications, such as a
825	   web-assisted call routing application.

827	7.  Application Behavior

829	   The behavior of an application within this framework depends on
830	   whether it seeks to use a client-local or client-remote user
831	   interface.

833	7.1  Client Local Interfaces

835	   One key component of this framework is support for client local user
836	   interfaces.

838	7.1.1  Discovering Capabilities

840	   A client local user interface can only be instantiated on a user
841	   agent if the user agent supports that type of user interface
842	   component.  Support for client local user interface components is
843	   declared by both the UAC and the UAS in their Accept, Allow, Contact
844	   and Allow-Events header fields of dialog-initiating requests and
845	   responses.  If the Allow header field indicates support for the SIP
846	   SUBSCRIBE method, and the Allow-Events header field indicates support
847	   for the kpml package [7], and the Supported header field indicates
848	   that its Contact URI is a GRUU [8], it means that the UA can
849	   instantiate presentation free user interface components.  In this
850	   case, the application MAY push presentation free user interface
851	   components according to the rules of Section 7.1.2.  The specific
852	   markup languages that can be supported are indicated in the Accept
853	   header field.

855	   If the Allow header field indicates support for the SIP REFER method,
856	   the Supported header field indicates support for the "refer-context"
857	   extension described below, and the Contact header field contains UA
858	   capabilities [5] that indicate support for the HTTP URI scheme, it
859	   means that the UA supports presentation capable user interface
860	   components.  In this case, the application MAY push presentation
861	   capable user interface components to the client according to the
862	   rules of Section 7.1.2.  The specific markups that are supported are
863	   indicated in the Accept header field.
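
   The two capability checks described in this section can be sketched
   as follows.  This is an illustration under stated assumptions, not a
   normative algorithm: the function names are hypothetical, the list of
   supported event packages is read from the header field named
   Allow-Events in the SIP events specification, the GRUU indication is
   assumed to be the option tag "gruu", and the caller is assumed to
   have already worked out from the Contact capabilities whether the
   HTTP URI scheme is supported.

```python
def _tokens(value: str) -> set:
    """Split a comma-separated header field value into lower-case tokens."""
    return {t.strip().lower() for t in value.split(",") if t.strip()}


def supported_ui_types(headers: dict, http_scheme_supported: bool) -> set:
    """Return the client-local UI component types a UA appears to support.

    `headers` maps header field names to raw values (hypothetical input).
    """
    allow = _tokens(headers.get("Allow", ""))
    events = _tokens(headers.get("Allow-Events", ""))
    supported = _tokens(headers.get("Supported", ""))

    result = set()
    # Presentation free: SUBSCRIBE + kpml package + GRUU (assumed tag "gruu").
    if "subscribe" in allow and "kpml" in events and "gruu" in supported:
        result.add("presentation-free")
    # Presentation capable: REFER + refer-context + HTTP URI scheme support.
    if ("refer" in allow and "refer-context" in supported
            and http_scheme_supported):
        result.add("presentation-capable")
    return result
```

   The markup languages themselves would still need to be checked
   against the Accept header field, which this sketch does not do.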

865	   A third party application that is not present on the call path will
866	   not be privy to these headers in the dialog requests that pass by.
867	   As such, it will need to obtain this capability information in other
868	   ways.  One way is through the registration event package [19], which
869	   can contain user agent capability information provided in REGISTER
870	   requests [5].

872	7.1.2  Pushing an Initial Interface Component

874	   Generally, we anticipate that interface components will need to be
875	   created at various different points in a SIP session.  Clearly, they
876	   will need to be pushed during session setup, or after the session is
877	   established.  A user interface component is always associated with a
878	   specific dialog, however.

880	   An application MUST NOT attempt to push a user interface component to
881	   a user agent until it has determined that the user agent has the
882	   necessary capabilities and a dialog has been created.  In the case of
883	   a UAC, this means that an application MUST NOT push a user interface
884	   component for an INVITE initiated dialog until the application has
885	   seen a request confirming the receipt of a dialog-creating response.
886	   This could be an ACK for a 200 OK, or a PRACK for a provisional
887	   response [2].  For SUBSCRIBE initiated dialogs, it MUST NOT push a
888	   user interface component until the application has seen a 200 OK to
889	   the NOTIFY request.  For a user interface component on a UAS, the
890	   application MUST NOT push a user interface component for an INVITE
891	   initiated dialog until it has seen a dialog-creating response from
892	   the UAS.  For a SUBSCRIBE initiated dialog, it MUST NOT push a user
893	   interface component until it has seen a NOTIFY request from the
894	   notifier.

896	   To create a presentation capable UI component on the UA, the
897	   application sends a REFER request to the UA.  This REFER MUST be sent
898	   to the Globally Routable UA URI (GRUU) [8] advertised by that UA in
899	   the Contact header field of the dialog initiating request or response
900	   sent by that UA.  Note that this REFER request creates a separate
901	   dialog between the application and the UA.  The Refer-To header field
902	   of the REFER request MUST contain an HTTP URI that references the
903	   markup document to be fetched.

905	   Furthermore, it is essential for the REFER request to be correlated
906	   with the dialog to which the user interface component will be
907	   associated.  This is necessary for authorization and for terminating
908	   the user interface components when the dialog terminates.  To provide
909	   this context, this specification defines the "context" header field
910	   parameter as an extension to the Refer-To header field.  The grammar
911	   for this header field parameter is:

913	   refer-to-ctxt     = "context" EQUAL DQUOTE local-tag "," remote-tag
914	                       "," callid DQUOTE    ; callid defined in RFC 3261
915	                       ;; NOTE: any DQUOTEs inside callid MUST be escaped
916	                       ;; using quoted pair
917	   local-tag         = token
918	   remote-tag        = token

920	   Refer-To          = ("Refer-To" / "r") HCOLON ( name-addr / addr-spec )
921	                       *(SEMI (generic-param / refer-to-ctxt))

923	   The application MUST include the context header field parameter in
924	   the REFER request.  The remote-tag MUST be set to the remote tag of
925	   the dialog as seen by the user device.  The local-tag MUST be set to
926	   the local tag of the dialog as seen by the user device.  The callid
927	   MUST be set to the Call-ID of the dialog as seen by the device.
928	   Since the callid grammar allows it to contain double quotes, any such
929	   double quotes MUST be represented with a quoted pair.
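
   As a sketch of how an application might assemble this parameter, the
   following Python function builds a Refer-To header field line from
   the three dialog identifiers.  The function name is hypothetical.
   It escapes only double quotes, as required above; handling of any
   other characters in the Call-ID is not addressed here.

```python
def refer_to_with_context(markup_uri: str, local_tag: str,
                          remote_tag: str, call_id: str) -> str:
    """Build a Refer-To header field line carrying the context parameter.

    The tags and Call-ID are those of the dialog as seen by the user device.
    """
    # Represent each double quote in the Call-ID as a quoted pair (\").
    escaped_call_id = call_id.replace('"', '\\"')
    context = f"{local_tag},{remote_tag},{escaped_call_id}"
    return f'Refer-To: <{markup_uri}>;context="{context}"'
```

   Such a request would also carry a Require header field with the
   "refer-context" option tag.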

931	   Since the "context" parameter in the Refer-To header field must be
932	   understood by the UA to process the request, this specification
933	   defines a new SIP option tag, "refer-context".  A REFER request
934	   generated by an application MUST include a Require header field with
935	   this option tag value.  Fortunately, the application will know ahead
936	   of time whether this extension is supported, as discussed in Section
937	   7.1.1.

939	   To create a presentation free user interface component, the
940	   application sends a SUBSCRIBE request to the UA.  The SUBSCRIBE MUST
941	   be sent to the GRUU advertised by the UA.  This SUBSCRIBE request
942	   creates a separate dialog.  The SUBSCRIBE request MUST use the KPML
944	   [7] event package.  The Event header field MUST contain parameters
945	   which identify the particular dialog that the interface component is
946	   being instantiated against.  The body of the SUBSCRIBE request
947	   contains the markup document that defines the conditions under which
948	   the application wishes to be notified of user input.

950	   In both cases, the REFER or SUBSCRIBE request SHOULD include a
951	   display name in the From header field which identifies the name of
952	   the application.  For example, a prepaid calling card might include a
953	   From header field which looks like:

955	   From: "Prepaid Calling Card" <sip:prepaid@example.com>

957	   Any of the SIP identity assertion mechanisms that have been defined,
958	   such as [10] and [12], are applicable to these requests as well.

960	7.1.3  Updating an Interface Component

962	   Once a user interface component has been created on a client, it can
963	   be updated.  The means for updating it depends on the type of UI
964	   component.

966	   Presentation capable UI components are updated using techniques
967	   already in place for those markups.  In particular, user input will
968	   cause an HTTP POST operation to push the user input to the
969	   application.  The result of the POST operation is a new markup that
970	   the UI is supposed to use.  This lets the UI be updated in response
971	   to user action.  Some markups, such as HTML, provide the ability to
972	   force a refresh after a certain period of time, so that the UI can be
973	   updated without user input.  Those mechanisms can be used here as
974	   well.  However, there is no support for an asynchronous push of an
975	   updated UI component from the application to the user agent.  A new
976	   REFER request to the same GRUU would create a new UI component rather
977	   than updating any components already in place.

979	   For presentation free UI, the story is different.  The application
980	   MAY update the filter at any time by generating a SUBSCRIBE refresh
981	   with the new filter.  The UA will immediately begin using this new
982	   filter.

984	7.1.4  Terminating an Interface Component

986	   User interface components have a well defined lifetime.  They are
987	   created when the component is first pushed to the client.  User
988	   interface components are always associated with the SIP dialog on
989	   which they were pushed.  As such, their lifetime is bound by the
990	   lifetime of the dialog.  When the dialog ends, so does the interface
991	   component.

993	   However, there are some cases where the application would like to
994	   terminate the user interface component before its natural termination
995	   point.  For presentation capable user interfaces, this is not
996	   possible.  For presentation free user interfaces, the application MAY
997	   terminate the component by sending a SUBSCRIBE with Expires equal to
998	   zero.  This terminates the subscription, which removes the UI
999	   component.

1001	   A client can remove a UI component at any time.  For presentation
1002	   capable UI, this is analogous to the user dismissing the web form
1003	   window.  There is no mechanism provided for reporting this kind of
1004	   event to the application.  The application MUST be prepared to time
1005	   out, and never receive input from a user.  The duration of this
1006	   timeout is application dependent.  For presentation free user
1007	   interfaces, the UA can explicitly terminate the subscription.  This
1008	   will result in the generation of a NOTIFY with a Subscription-State
1009	   header field equal to "terminated".

1011	7.2  Client Remote Interfaces

1013	   As an alternative to, or in conjunction with, client local user
1014	   interfaces, an application can make use of client remote user
1015	   interfaces.  These user interfaces can execute co-resident with the
1016	   application itself (in which case no standardized interfaces between
1017	   the UI and the application need to be used), or they can run
1018	   separately.  This framework assumes that the user interface runs on a
1019	   host that has a sufficient trust relationship with the application.
1020	   As such, the means for instantiating the user interface is not
1021	   considered here.

1023	   The primary issue is to connect the user device to the remote user
1024	   interface.  Doing so requires the manipulation of media streams
1025	   between the client and the user interface.  Such manipulation can
1026	   only be done by user agents.  There are two types of user agent
1027	   applications within this framework - originating/terminating
1028	   applications, and intermediary applications.

1030	7.2.1  Originating and Terminating Applications

1032	   Originating and terminating applications are applications which are
1033	   themselves the originator or the final recipient of a SIP invitation.
1034	   They are "pure" user agent applications - not back-to-back user
1035	   agents.  The classic example of such an application is an interactive
1036	   voice response (IVR) application, which is typically a terminating
1037	   application.  It is a terminating application because the user
1038	   explicitly calls it; i.e., it is the actual called party.  An example
1039	   of an originating application is a wakeup call application, which
1040	   calls a user at a specified time in order to wake them up.

1042	   Because originating and terminating applications are a natural
1043	   termination point of the dialog, manipulation of the media session by
1044	   the application is trivial.  Traditional SIP techniques for adding
1045	   and removing media streams, modifying codecs, and changing the
1046	   address of the recipient of the media streams, can be applied.
1047	   Similarly, the application can directly authenticate itself to the
1048	   user through S/MIME, since it is the peer UA in the dialog.

1050	7.2.2  Intermediary Applications

1052	   Intermediary applications are, at the same time, more common than
1053	   originating/terminating applications, and more complex.  Intermediary
1054	   applications are applications that are neither the actual caller nor
1055	   called party.  Rather, they represent a "third party" that wishes to
1056	   interact with the user.  The classic example is the ubiquitous
1057	   pre-paid calling card application.

1059	   In order for the intermediary application to add a client remote user
1060	   interface, it needs to manipulate the media streams of the user agent
1061	   to terminate on that user interface.  This also introduces a
1062	   fundamental feature interaction issue.  Since the intermediary
1063	   application is not an actual participant in the call, the user will
1064	   need to interact with both the intermediary application and its peer
1065	   in the dialog.  Doing both at the same time is complicated, and is
1066	   discussed in more detail in Section 9.

1068	8.  User Agent Behavior

1070	8.1  Advertising Capabilities

1072	   In order to participate in applications that make use of stimulus
1073	   interfaces, a user agent needs to advertise its interaction
1074	   capabilities.

1076	   If a user agent supports presentation capable user interfaces, it
1077	   MUST support the REFER method, along with the "context" extension
1078	   defined here.  It MUST include, in all dialog initiating requests and
1079	   responses, an Allow header field that includes the REFER method and
1080	   a Supported header field that includes the value
1081	   "refer-context".  Furthermore, the UA MUST support the SIP user agent
1082	   capabilities specification [5].  The UA MUST be capable of being
1083	   REFER'd to an HTTP URI.  It MUST include, in the Contact header field
1084	   of its dialog initiating requests and responses, a "schemes" Contact
1085	   header field parameter including the http URI scheme.  The UA MUST
1086	   include, in all dialog initiating requests and responses, an Accept
1087	   header field listing all of those markups supported by the UA.  It is
1088	   RECOMMENDED that all user agents that support presentation capable
1089	   user interfaces support HTML.

1091	   If a user agent supports presentation free user interfaces, it MUST
1092	   support the SUBSCRIBE [3] method.  It MUST support the KPML [7] event
1093	   package.  It MUST include, in all dialog initiating requests and
1094	   responses, an Allow header field that includes the SUBSCRIBE method.
1095	   It MUST include, in all dialog initiating requests and responses, an
1096	   Allow-Events header field that lists the KPML event package.  The UA
1097	   MUST include, in all dialog initiating requests and responses, an
1098	   Accept header field listing those event filters it supports.  At a
1099	   minimum, a UA MUST support the "application/kpml-request+xml" MIME
1100	   type.

1102	   For either presentation free or presentation capable user interfaces,
1103	   the user agent MUST support the GRUU [8] specification.  The Contact
1104	   header field in all dialog initiating requests and responses MUST
1105	   contain a GRUU.  The UA MUST include a Supported header field which
1106	   contains the "gruu" option tag.

1108	   Because these headers are examined by proxies which may be executing
1109	   applications, a UA that wishes to support client local user
1110	   interfaces should not encrypt them.
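
The advertising rules above can be sketched as a small helper that
assembles the header fields for a dialog initiating request or
response.  This is an illustrative sketch only: the function name, its
parameters, and the base list of methods are hypothetical, the
"schemes" value is copied from the example call flow in this document,
and the event package token "kpml" is assumed to be the name used by
the KPML specification [7].

```python
def capability_headers(gruu, presentation_capable=False,
                       presentation_free=False, markups=("text/html",)):
    """Return a dict of header fields a UA would advertise (sketch)."""
    # Base method list borrowed from the example call flow (assumption).
    allow = ["INVITE", "OPTIONS", "BYE", "CANCEL", "ACK"]
    supported = ["gruu"]              # GRUU is required for either UI type
    accept = []
    contact = "<" + gruu + ">"        # the Contact must contain a GRUU
    headers = {}

    if presentation_capable:
        allow.append("REFER")
        supported.append("refer-context")
        # "schemes" must include http; sip and sips mirror the example.
        contact += ';schemes="http,sip,sips"'
        accept.extend(markups)        # markups the UA can render

    if presentation_free:
        allow.append("SUBSCRIBE")
        headers["Allow-Events"] = "kpml"
        accept.append("application/kpml-request+xml")

    headers["Allow"] = ", ".join(allow)
    headers["Supported"] = ", ".join(supported)
    headers["Contact"] = contact
    if accept:
        headers["Accept"] = ", ".join(accept)
    return headers
```

Calling capability_headers("sips:bad998asd8asd0000a@example.com",
presentation_capable=True) yields Supported, Allow and Contact values
in the same form as those shown in message 1 of the example call flow.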

1112	8.2  Receiving User Interface Components

1114	   Once the UA has created a dialog (in either the early or confirmed
1115	   states), it MUST be prepared to receive a SUBSCRIBE or REFER request
1116	   against its GRUU.  If the UA receives such a request prior to the
1117	   establishment of a dialog, the UA MUST reject the request.

1119	   A user agent SHOULD attempt to authenticate the sender of the
1120	   request.  The sender will generally be an application, and therefore
1121	   the user agent is unlikely to ever have a shared secret with it,
1122	   making digest authentication useless.  However, authenticated
1123	   identities can be obtained through other means, such as [10].

1125	   A user agent MAY have pre-defined authorization policies which permit
1126	   applications which have authenticated themselves with a particular
1127	   identity, to push user interface components.  If such a set of
1128	   policies is present, it is checked first.  If the application is
1129	   authorized, processing proceeds.

1131	   If the application has authenticated itself, but it is not explicitly
1132	   authorized or blocked, this specification RECOMMENDS that the
1133	   application be automatically authorized if it can prove that it was
1134	   either on the call path, or is trusted by one of the elements on the
1135	   call path.  An application proves this to the user agent by
1136	   presenting it with the dialog identifiers in the SUBSCRIBE or REFER
1137	   request.  In the case of SUBSCRIBE, those identifiers are present in
1138	   the Event header field [7].  In the case of REFER, those identifiers
1139	   are present in the "context" parameter of the Refer-To header field.

1141	   Because the dialog identifiers serve as a tool for authorization,
1142	   a user agent compliant to this framework SHOULD use dialog
1143	   identifiers that are cryptographically random, with at least 128 bits
1144	   of randomness.  It is recommended that this randomness be split
1145	   between the Call-ID and From header field tag in the case of a UAC.
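
As a rough illustration of the recommendation above, a UAC could draw
64 random bits for the Call-ID and 64 for the From tag, for 128 bits
in total.  The function name and the hexadecimal encoding are
assumptions made for this sketch; the framework does not mandate any
particular encoding.

```python
import secrets


def new_dialog_identifiers(host):
    """Return (call_id, from_tag) carrying 128 random bits in total."""
    # secrets draws from the operating system's CSPRNG.
    call_id_part = secrets.token_hex(8)   # 8 bytes = 64 bits
    from_tag = secrets.token_hex(8)       # 8 bytes = 64 bits
    call_id = call_id_part + "@" + host
    return call_id, from_tag
```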

1147	   Furthermore, to ensure that only applications resident in or trusted
1148	   by on-path elements can instantiate a user interface component, a
1149	   user agent compliant to this specification SHOULD use the sips URI
1150	   scheme for all dialogs it initiates.  This will guarantee secure
1151	   links between all of the elements on the signaling path.

1153	   If the dialog was not established with a sips URI, or the user agent
1154	   did not choose cryptographically random dialog identifiers, then the
1155	   application MUST NOT automatically be authorized, even if it
1156	   presented valid dialog identifiers.  A user agent MAY apply any other
1157	   policies in addition to (but not instead of) the ones specified here
1158	   in order to authorize the creation of the user interface component.
1159	   One such mechanism would be to prompt the user, informing them of the
1160	   identity of the application and the dialog it is associated with.  If
1161	   an authorization policy requires user interaction, the user agent
1162	   SHOULD respond to the SUBSCRIBE or REFER request with a 202.  In the
1163	   case of SUBSCRIBE, if authorization is not granted, the user agent
1164	   SHOULD generate a NOTIFY to terminate the subscription.  In the case
1165	   of REFER, the user agent MUST NOT act upon the URI in the Refer-To
1166	   header field until user authorization is obtained.

1168	   If an application does not present a valid dialog identifier in its
1169	   REFER or SUBSCRIBE request, the user agent MUST reject the request
1170	   with a 403 response.
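
The authorization rules in this section can be read as a decision
procedure.  The sketch below is one possible reading, not a normative
algorithm.  All names are hypothetical.  It assumes that the 403 rule
is applied before local policy is consulted, that a blocked
application is simply rejected (the response code is not specified
here), and that every remaining case falls back to prompting the
user, which is only one of the additional policies a UA may apply.

```python
from enum import Enum


class Decision(Enum):
    AUTHORIZE = "authorize"
    ASK_USER = "ask user, respond 202 meanwhile"
    REJECT_403 = "reject with 403"
    REJECT = "reject (blocked by policy)"


def authorize(identity, policy, presented_ids, dialog_ids,
              dialog_uses_sips, ids_are_random):
    """Decide how to treat a SUBSCRIBE or REFER from an application.

    identity       -- authenticated identity, or None if unauthenticated
    policy         -- dict mapping identity to "allow" or "block"
    presented_ids  -- dialog identifiers carried in the request
    dialog_ids     -- identifiers of the UA's actual dialog
    """
    # A request without valid dialog identifiers is always refused.
    if presented_ids != dialog_ids:
        return Decision.REJECT_403

    # Pre-defined policies for authenticated identities come next.
    verdict = policy.get(identity) if identity is not None else None
    if verdict == "allow":
        return Decision.AUTHORIZE
    if verdict == "block":
        return Decision.REJECT

    # Automatic authorization requires sips and random identifiers.
    if identity is not None and dialog_uses_sips and ids_are_random:
        return Decision.AUTHORIZE

    # Otherwise fall back to another policy, here a user prompt.
    return Decision.ASK_USER
```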

1172	   If a REFER request to an HTTP URI was authorized, the UA executes the
1173	   URI and fetches the content to be rendered to the user.  This
1174	   instantiates a presentation capable user interface component.  If a
1175	   SUBSCRIBE was authorized, a presentation free user interface
1176	   component was instantiated.

1178	8.3  Mapping User Input to User Interface Components

1180	   Once the user interface components are instantiated, the user agent
1181	   must direct user input to the appropriate component.  In the case of
1182	   presentation capable user interfaces, this process is known as focus
1183	   selection.  It is done by means that are specific to the user
1184	   interface on the device.  In the case of a PC, for example, the
1185	   window manager would allow the user to select the appropriate user
1186	   interface component that their input is directed to.

1188	   For presentation free user interfaces, the situation is more
1189	   complicated.  In some cases, the device may support a mechanism that
1190	   allows the user to select a "line", and thus the associated dialog.
1191	   Any user input on the keypad while this line is selected is fed to
1192	   the user interface components associated with that dialog.

1194	   Otherwise, for client local user interfaces, the user input is
1195	   assumed to be associated with all user interface components.  For
1196	   client remote user interfaces, the user device converts the user
1197	   input to media, typically conveyed using RFC 2833, and sends this to
1198	   the client remote user interface.  This user interface then needs to
1199	   map user input from potentially many media streams into user
1200	   interface events.  The process for doing this is described in Section
1201	   6.3.
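
For client local, presentation free components, the mapping described
above might be sketched as follows.  The data layout (a list of
dictionaries with a "dialog" key) and the function name are
assumptions made purely for illustration.

```python
def components_for_input(components, selected_dialog=None):
    """Return the client local components that receive a key press.

    components      -- list of dicts, each with a "dialog" entry
    selected_dialog -- dialog of the selected "line", or None when the
                       device offers no line selection
    """
    if selected_dialog is not None:
        # A line is selected: only that dialog's components get input.
        return [c for c in components if c["dialog"] == selected_dialog]
    # No selection mechanism: input is associated with all components.
    return list(components)
```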

1203	8.4  Receiving Updates to User Interface Components

1205	   For presentation capable user interfaces, updates to the user
1206	   interface occur in ways specific to that user interface component.
1207	   In the case of HTML, for example, the document can tell the client to
1208	   fetch a new document periodically.  However, this framework does not
1209	   provide any additional machinery to asynchronously push a new user
1210	   interface component to the client.

1212	   For presentation free user interfaces, an application can push an
1213	   update to a component by sending a SUBSCRIBE refresh with a new
1214	   filter.  The user agent will process these according to the rules of
1215	   the event package.

1217	8.5  Terminating a User Interface Component

1219	   Termination of a presentation capable user interface component is a
1220	   trivial procedure.  The user agent merely dismisses the window (or
1221	   equivalent).  The fact that the component is dismissed is not
1222	   communicated to the application.  As such, it is purely a local
1223	   matter.

1225	   In the case of a presentation free user interface, the user might
1226	   wish to cease interacting with the application.  However, most
1227	   presentation free user interfaces will not have a way for the user to
1228	   signal this through the device.  If such a mechanism did exist, the
1229	   UA SHOULD generate a NOTIFY request with a Subscription-State equal
1230	   to "terminated" and a reason of "rejected".  This tells the
1231	   application that the component has been removed, and that it should
1232	   not attempt to re-subscribe.

1234	9.  Inter-Application Feature Interaction

1236	   The inter-application feature interaction problem is inherent to
1237	   stimulus signaling.  Whenever there are multiple applications, there
1238	   are multiple user interfaces.  The system has to determine to which
1239	   user interface any particular input is destined.  That question is
1240	   the essence of the inter-application feature interaction problem.

1242	   Inter-application feature interaction is not an easy problem to
1243	   resolve.  For now, we consider separately the issues for client-local
1244	   and client-remote user interface components.

1246	9.1  Client Local UI

1248	   When the user interface itself resides locally on the client device,
1249	   the feature interaction problem is actually much simpler.  The end
1250	   device knows explicitly about each application, and therefore can
1251	   present the user with each one separately.  When the user provides
1252	   input, the client device can determine to which user interface the
1253	   input is destined.  The user interface to which input is destined is
1254	   referred to as the application in focus, and the means by which the
1255	   focused application is selected is called focus determination.

1257	   Generally speaking, focus determination is purely a local operation.
1258	   In the PC universe, focus determination is provided by window
1259	   managers.  Each application does not know about focus; it merely
1260	   receives the user input that has been targeted to it when it is in
1261	   focus.  This basic concept applies to SIP-based applications as well.

1263	   Focus determination will frequently be trivial, depending on the user
1264	   interface type.  Consider a user that makes a call from a PC.  The
1265	   call passes through a pre-paid calling card application, and a call
1266	   recording application.  Both of these wish to interact with the user.
1267	   Both push an HTML-based user interface to the user.  On the PC, each
1268	   user interface would appear as a separate window.  The user interacts
1269	   with the call recording application by selecting its window, and with
1270	   the pre-paid calling card application by selecting its window.  Focus
1271	   determination is literally provided by the PC window manager.  It is
1272	   clear to which application the user input is targeted.

1274	   As another example, consider the same two applications, but on a
1275	   "smart phone" that has a set of buttons, and next to each button, an
1276	   LCD display that can provide the user with an option.  This user
1277	   interface can be represented using the Wireless Markup Language
1278	   (WML).

1280	   The phone would allocate some number of buttons to each application.
1281	   The prepaid calling card would get one button for its "hangup"
1282	   command, and the recording application would get one for its
1283	   "start/stop" command.  The user can easily determine which
1284	   application to interact with by pressing the appropriate button.
1285	   Pressing a button determines focus and provides user input, both at
1286	   the same time.

1288	   Unfortunately, not all devices will have these advanced displays.  A
1289	   PSTN gateway, or a basic IP telephone, may only have a 12-key keypad.
1290	   The user interfaces for these devices are provided through the Keypad
1291	   Markup Language (KPML).  Considering once again the feature
1292	   interaction case above, the pre-paid calling card application and the
1293	   call recording application would both pass a KPML document to the
1294	   device.  When the user presses a button on the keypad, to which
1295	   document does the input apply? The device does not allow the user to
1296	   select.  A device where the user cannot provide focus is called a
1297	   focusless device.  This is quite a hard problem to solve.  This
1298	   framework does not make any explicit normative recommendation, but
1299	   concludes that the best option is to send the input to both user
1300	   interfaces unless the markup in one interface has indicated that it
1301	   should be suppressed from others.  This is a sensible choice by
1302	   analogy - it is exactly what the existing circuit switched telephone
1303	   network will do.  It is an explicit non-goal to provide a better
1304	   mechanism for feature interaction resolution than the PSTN on devices
1305	   which have the same user interface as they do on the PSTN.  Devices
1306	   with better displays, such as PCs or screen phones, can benefit from
1307	   the capabilities of this framework, allowing the user to determine
1308	   which application they are interacting with.

1310	   Indeed, when a user provides input on a focusless device, the input
1311	   must be passed to all client local user interfaces, AND all client
1312	   remote user interfaces, unless the markup tells the UI to suppress
1313	   the media.  In the case of KPML, key events are passed to remote user
1314	   interfaces by encoding them in RFC 2833 [17].  Of course, since a
1315	   client cannot determine if a media stream terminates in a remote user
1316	   interface or not, these key events are passed in all audio media
1317	   streams unless the KPML request document is used to suppress.
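
The behavior of a focusless device can be sketched as a fan-out of
each key event.  The names below are hypothetical, and the single
boolean flag is a simplification of whatever suppression indication
the KPML request document carries.

```python
def dispatch_key(key, local_uis, audio_streams, suppress_media=False):
    """Return the list of deliveries for one key press (sketch).

    Each delivery is a (transport, target, key) tuple.  "local" means
    the key is handed to a client local UI component; "rfc2833" means
    it is encoded as a telephone event in an audio stream.
    """
    # Every client local user interface sees the key.
    deliveries = [("local", ui, key) for ui in local_uis]
    if not suppress_media:
        # The client cannot tell which streams end in a remote UI,
        # so the key goes into all audio media streams.
        deliveries += [("rfc2833", stream, key) for stream in audio_streams]
    return deliveries
```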

1319	9.2  Client-Remote UI

1321	   When the user interfaces run remotely, the determination of focus can
1322	   be much, much harder.  There are many architectures that can be
1323	   deployed to handle the interaction.  None are ideal.  However, all
1324	   are beyond the scope of this specification.

1326	10.  Intra Application Feature Interaction

1328	   An application can instantiate a multiplicity of user interface
1329	   components.  For example, a single application can instantiate two
1330	   separate HTML components and one WML component.  Furthermore, an
1331	   application can instantiate both client local and client remote user
1332	   interfaces.

1334	   The feature interaction issues between these components within the
1335	   same application are less severe.  If an application has multiple
1336	   client user interface components, their interaction is resolved
1337	   identically to the inter-application case - through focus
1338	   determination.  However, the problems in focusless user devices (such
1339	   as a keypad on a telephone) generally won't exist, since the
1340	   application can generate user interfaces which do not overlap in
1341	   their usage of an input.

1343	   The real issue is that the optimal user experience frequently
1344	   requires some kind of coupling between the differing user interface
1345	   components.  This is a classic problem in multi-modal user
1346	   interfaces, such as those described by Speech Application Language
1347	   Tags (SALT).  As an example, consider a user interface where a user
1348	   can either press a labeled button to make a selection, or listen to a
1349	   prompt, and speak the desired selection.  Ideally, when the user
1350	   presses the button, the prompt should cease immediately, since both
1351	   of them were targeted at collecting the same information in parallel.
1352	   Such interactions are best handled by markups which natively support
1353	   such interactions, such as SALT, and thus require no explicit support
1354	   from this framework.

1356	11.  Example Call Flow

1358	   This section shows the operation of a call recording application.
1359	   This application allows a user to record the media in their call by
1360	   clicking on a button in a web form.  The application uses a
1361	   presentation capable user interface component that is pushed to the
1362	   caller.

1364	             A                  Recording App                  B
1365	             |(1) INVITE              |                        |
1366	             |----------------------->|                        |
1367	             |                        |(2) INVITE              |
1368	             |                        |----------------------->|
1369	             |                        |(3) 200 OK              |
1370	             |                        |<-----------------------|
1371	             |(4) 200 OK              |                        |
1372	             |<-----------------------|                        |
1373	             |(5) ACK                 |                        |
1374	             |----------------------->|                        |
1375	             |                        |(6) ACK                 |
1376	             |                        |----------------------->|
1377	             |(7) REFER               |                        |
1378	             |<-----------------------|                        |
1379	             |(8) 200 OK              |                        |
1380	             |----------------------->|                        |
1381	             |(9) NOTIFY              |                        |
1382	             |----------------------->|                        |
1383	             |(10) 200 OK             |                        |
1384	             |<-----------------------|                        |
1385	             |(11) HTTP GET           |                        |
1386	             |----------------------->|                        |
1387	             |(12) 200 OK             |                        |
1388	             |<-----------------------|                        |
1389	             |(13) NOTIFY             |                        |
1390	             |----------------------->|                        |
1391	             |(14) 200 OK             |                        |
1392	             |<-----------------------|                        |
1393	             |(15) HTTP POST          |                        |
1394	             |----------------------->|                        |
1395	             |(16) 200 OK             |                        |
1396	             |<-----------------------|                        |

1398	                                Figure 8

1400	   First, the caller, A, sends an INVITE to set up a call (message 1).
1401	   Since the caller supports the framework, and can handle presentation
1402	   capable user interface components, it includes the Supported header
1403	   field indicating that the GRUU extension and the REFER context
1404	   extension are understood, Allow indicating that REFER is understood,
1405	   and a Contact header field that includes the "schemes" header field
1406	   parameter.

1408	   INVITE sips:B@example.com SIP/2.0
1409	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
1410	   From: Caller <sip:A@example.com>;tag=kkaz-
1411	   To: Callee <sip:B@example.com>
1412	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1413	   CSeq: 1 INVITE
1414	   Max-Forwards: 70
1415	   Supported: gruu, refer-context
1416	   Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
1417	   Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips"
1418	   Content-Length: ...
1419	   Content-Type: application/sdp

1421	   --SDP not shown--

1423	   The proxy acts as a recording server, and forwards the INVITE to the
1424	   called party (message 2):

1426	   INVITE sips:B@pc.example.com SIP/2.0
1427	   Record-Route: <sips:app.example.com;lr>
1428	   Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK97sh
1429	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
1430	   From: Caller <sip:A@example.com>;tag=kkaz-
1431	   To: Callee <sip:B@example.com>
1432	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1433	   CSeq: 1 INVITE
1434	   Max-Forwards: 69
1435	   Supported: gruu, refer-context
1436	   Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
1437	   Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips"
1438	   Content-Length: ...
1439	   Content-Type: application/sdp

1441	   --SDP not shown--

1443	   B accepts the call with a 200 OK (message 3).  It does not support
1444	   the framework, and so the various header fields are not present.

1446	   SIP/2.0 200 OK
1447	   Record-Route: <sips:app.example.com;lr>
1448	   Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK97sh
1449	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
1450	   From: Caller <sip:A@example.com>;tag=kkaz-
1451	   To: Callee <sip:B@example.com>;tag=7777
1452	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1453	   CSeq: 1 INVITE
1454	   Contact: <sips:B@pc.example.com>
1455	   Content-Length: ...
1456	   Content-Type: application/sdp

1458	   --SDP not shown--

1460	   This 200 OK is passed back to the caller (message 4):

1462	   SIP/2.0 200 OK
1463	   Record-Route: <sips:app.example.com;lr>
1464	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
1465	   From: Caller <sip:A@example.com>;tag=kkaz-
1466	   To: Callee <sip:B@example.com>;tag=7777
1467	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1468	   CSeq: 1 INVITE
1469	   Contact: <sips:B@pc.example.com>
1470	   Content-Length: ...
1471	   Content-Type: application/sdp

1473	   --SDP not shown--

1475	   The caller generates an ACK (message 5).

1477	   ACK sips:B@pc.example.com SIP/2.0
1478	   Route: <sips:app.example.com;lr>
1479	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz9
1480	   From: Caller <sip:A@example.com>;tag=kkaz-
1481	   To: Callee <sip:B@example.com>;tag=7777
1482	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1483	   CSeq: 1 ACK

1485	   The ACK is forwarded to the called party (message 6).

1487	   ACK sips:B@pc.example.com SIP/2.0
1488	   Via: SIP/2.0/TLS app.example.com;branch=z9hG4bKh7s
1489	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz9
1490	   From: Caller <sip:A@example.com>;tag=kkaz-
1491	   To: Callee <sip:B@example.com>;tag=7777
1492	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1493	   CSeq: 1 ACK

1495	   Now, the application decides to push a user interface component to
1496	   user A.  So, it sends it a REFER request (message 7):

1498	   REFER sips:bad998asd8asd0000a@example.com SIP/2.0
1499	   Refer-To: https://app.example.com/script.pl
1500	    ;context="kkaz-,7777,faif9ahhs9dd8==-sd98ajzz@host.example.com"
1501	   Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK9zh6
1502	   Max-Forwards: 70
1503	   From: Recorder Application <sip:app.example.com>;tag=jhgf
1504	   To: Caller <sip:A@example.com>
1505	   Call-ID: 66676776767@app.example.com
1506	   CSeq: 1 REFER
1507	   Event: refer
1508	   Contact: <sips:app.example.com>

1510	   The REFER is answered by a 200 OK (message 8).

1512	   SIP/2.0 200 OK
1513	   Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK9zh6
1514	   From: Recorder Application <sip:app.example.com>;tag=jhgf
1515	   To: Caller <sip:A@example.com>;tag=pqoew
1516	   Call-ID: 66676776767@app.example.com
1517	   Supported: gruu, refer-context
1518	   Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
1519	   Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips"
1520	   CSeq: 1 REFER

1522	   User A sends a NOTIFY (message 9):

1524	   NOTIFY sips:app.example.com SIP/2.0
1525	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995
1526	   To: Recorder Application <sip:app.example.com>;tag=jhgf
1527	   From: Caller <sip:A@example.com>;tag=pqoew
1528	   Call-ID: 66676776767@app.example.com
1529	   CSeq: 1 NOTIFY
1530	   Max-Forwards: 70
1531	   Event: refer;id=93809824
1532	   Subscription-State: active;expires=3600
1533	   Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips"
1534	   Content-Type: message/sipfrag;version=2.0
1535	   Content-Length: 20
1536	   SIP/2.0 100 Trying

1538	   And the recording server responds with a 200 OK (message 10):

1540	   SIP/2.0 200 OK
1541	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995
1542	   To: Recorder Application <sip:app.example.com>;tag=jhgf
1543	   From: Caller <sip:A@example.com>;tag=pqoew
1544	   Call-ID: 66676776767@app.example.com
1545	   CSeq: 1 NOTIFY

1547	   The REFER request contained a "context" Refer-To header field
1548	   parameter with a valid dialog identifier.  Furthermore, all of the
1549	   signaling was over TLS and the dialog identifiers contain sufficient
1550	   randomness.  As such, the caller, A, automatically authorizes the
1551	   application.  It then acts on the Refer-To URI, fetching the script
1552	   from app.example.com (message 11).  The response, message 12,
1553	   contains a web application that the user can click on to enable
1554	   recording.  Because the client executed the URL in the Refer-To, it
1555	   generates another NOTIFY to the application, informing it of the
1556	   successful response (message 13).  This is answered with a 200 OK
1557	   (message 14).  When the user clicks on the link (message 15), the
1558	   results are posted to the server, and an updated display is provided
1559	   (message 16).
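
The check that A performs on the "context" parameter can be sketched
as below.  The parsing assumes the layout seen in message 7: a quoted,
comma-separated list of the two dialog tags followed by the Call-ID.
That layout is inferred from the example only, and the function name
is hypothetical.

```python
def parse_context(refer_to):
    """Extract (tag1, tag2, call_id) from a Refer-To value, or None."""
    marker = ';context="'
    start = refer_to.find(marker)
    if start < 0:
        return None                    # no context parameter present
    start += len(marker)
    end = refer_to.find('"', start)
    if end < 0:
        return None                    # unterminated quoted string
    # Split on the first two commas only; the Call-ID comes last.
    parts = refer_to[start:end].split(",", 2)
    if len(parts) != 3:
        return None
    return tuple(parts)


# Dialog identifiers of the call between A and B in the example.
dialog = ("kkaz-", "7777", "faif9ahhs9dd8==-sd98ajzz@host.example.com")
refer_to = ('https://app.example.com/script.pl'
            ';context="kkaz-,7777,faif9ahhs9dd8==-sd98ajzz@host.example.com"')
context_matches = parse_context(refer_to) == dialog
```

With the values from messages 1 through 7, context_matches is true,
which is the condition that lets A authorize the application
automatically.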

1561	12.  Security Considerations

1563	   There are many security considerations associated with this
1564	   framework.  It allows applications in the network to instantiate user
1565	   interface components on a client device.  Such instantiations need to
1566	   be from authenticated applications, and also need to be authorized to
1567	   place a UI into the client.  Indeed, the stronger requirement is
1568	   authorization.  It is not so important to know the name of the
1569	   provider of the application, but rather, that the provider is
1570	   authorized to instantiate components.

1572	   This specification defines specific authorization techniques and
1573	   requirements.  Automatic authorization is granted if the application
1574	   can prove that it is on the call path, or is trusted by an element on
1575	   the call path.  As documented above, this can be accomplished by the
1576	   use of cryptographically random dialog identifiers and the usage of
1577	   sips for message confidentiality.  It is RECOMMENDED that sips be
1578	   implemented by user agents compliant to this specification.  This
1579	   does not represent a change from the requirements in RFC 3261.

1581	13.  IANA Considerations

1583	13.1  SIP Option Tag

1585	   This specification registers a new SIP option tag, as per the
1586	   guidelines in Section 27.1 of RFC 3261 [1].

1588	   Name: refer-context

1590	   Description: This option tag is used to identify the REFER extension
1591	      that defines the "context" parameter of the Refer-To header field.

1593	13.2  Header Field Parameter

1595	   This specification defines a new header field parameter, as per the
1596	   registry created by [9].  The required information is as follows:

1598	   Header field in which the parameter can appear: Refer-To

1600	   Name of the Parameter: context

1602	   RFC Reference: RFC XXXX [[NOTE TO IANA: Please replace XXXX with the
1603	      RFC number of this specification.]]

1605	14.  Contributors

1607	   This document was produced as a result of discussions amongst the
1608	   application interaction design team.  All members of this team
1609	   contributed significantly to the ideas embodied in this document.
1610	   The members of this team were:

1612	   Eric Burger
1613	   Cullen Jennings
1614	   Robert Fairlie-Cuninghame

1616	15.  Acknowledgements

1618	   The authors would like to thank Martin Dolly and Rohan Mahy for their
1619	   input and comments.  Thanks to Allison Mankin for her support of this
1620	   work.

1622	16.  References
1623	16.1  Normative References

1625	   [1]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
1626	        Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
1627	        Session Initiation Protocol", RFC 3261, June 2002.

1629	   [2]  Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional
1630	        Responses in Session Initiation Protocol (SIP)", RFC 3262, June
1631	        2002.

1633	   [3]  Roach, A., "Session Initiation Protocol (SIP)-Specific Event
1634	        Notification", RFC 3265, June 2002.

1636	   [4]  McGlashan, S., Lucas, B., Porter, B., Rehor, K., Burnett, D.,
1637	        Carter, J., Ferrans, J. and A. Hunt, "Voice Extensible Markup
1638	        Language (VoiceXML) Version 2.0", W3C CR CR-voicexml20-20030220,
1639	        February 2003.

1641	   [5]  Rosenberg, J., Schulzrinne, H. and P. Kyzivat, "Indicating User
1642	        Agent Capabilities in the Session Initiation Protocol (SIP)",
1643	        RFC 3840, August 2004.

1645	   [6]  Sparks, R., "The Session Initiation Protocol (SIP) Refer
1646	        Method", RFC 3515, April 2003.

1648	   [7]  Burger, E., "A Session Initiation Protocol (SIP) Event Package
1649	        for Key Press Stimulus (KPML)", draft-ietf-sipping-kpml-07
1650	        (work in progress), December 2004.

1652	   [8]  Rosenberg, J., "Obtaining and Using Globally Routable User Agent
1653	        (UA) URIs (GRUU) in the Session Initiation Protocol (SIP)",
1654	        draft-ietf-sip-gruu-02 (work in progress), July 2004.

1656	   [9]  Camarillo, G., "The Internet Assigned Number Authority (IANA)
1657	        Header Field Parameter Registry for the Session Initiation
1658	        Protocol (SIP)", BCP 98, RFC 3968, December 2004.

1660	16.2  Informative References

1662	   [10]  Peterson, J., "Enhancements for Authenticated Identity
1663	         Management in the Session Initiation Protocol (SIP)",
1664	         draft-ietf-sip-identity-03 (work in progress), September 2004.

1666	   [11]  Day, M., Rosenberg, J. and H. Sugano, "A Model for Presence and
1667	         Instant Messaging", RFC 2778, February 2000.

1669	   [12]  Jennings, C., Peterson, J. and M. Watson, "Private Extensions
1670	         to the Session Initiation Protocol (SIP) for Asserted Identity
1671	         within Trusted Networks", RFC 3325, November 2002.

1673	   [13]  Rosenberg, J., "A Framework for Conferencing with the Session
1674	         Initiation Protocol",
1675	         draft-ietf-sipping-conferencing-framework-03 (work in
1676	         progress), October 2004.

1678	   [14]  Rosenberg, J., Schulzrinne, H. and P. Kyzivat, "Caller
1679	         Preferences for the Session Initiation Protocol (SIP)", RFC
1680	         3841, August 2004.

1682	   [15]  Rosenberg, J., "An INVITE Initiated Dialog Event Package for
1683	         the Session Initiation Protocol (SIP)",
1684	         draft-ietf-sipping-dialog-package-05 (work in progress),
1685	         November 2004.

1687	   [16]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
1688	         "RTP: A Transport Protocol for Real-Time Applications", RFC
1689	         3550, July 2003.

1691	   [17]  Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits,
1692	         Telephony Tones and Telephony Signals", RFC 2833, May 2000.

1694	   [18]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
1695	         Session Description Protocol (SDP)", RFC 3264, June 2002.

1697	   [19]  Rosenberg, J., "A Session Initiation Protocol (SIP) Event
1698	         Package for Registrations", RFC 3680, March 2004.

1700	Author's Address

1702	   Jonathan Rosenberg
1703	   Cisco Systems
1704	   600 Lanidex Plaza
1705	   Parsippany, NJ  07054
1706	   US

1708	   Phone: +1 973 952-5000
1709	   EMail: jdrosen@cisco.com
1710	   URI:   http://www.jdrosen.net

1712	Intellectual Property Statement

1714	   The IETF takes no position regarding the validity or scope of any
1715	   Intellectual Property Rights or other rights that might be claimed to
1716	   pertain to the implementation or use of the technology described in
1717	   this document or the extent to which any license under such rights
1718	   might or might not be available; nor does it represent that it has
1719	   made any independent effort to identify any such rights.  Information
1720	   on the procedures with respect to rights in RFC documents can be
1721	   found in BCP 78 and BCP 79.

1723	   Copies of IPR disclosures made to the IETF Secretariat and any
1724	   assurances of licenses to be made available, or the result of an
1725	   attempt made to obtain a general license or permission for the use of
1726	   such proprietary rights by implementers or users of this
1727	   specification can be obtained from the IETF on-line IPR repository at
1728	   http://www.ietf.org/ipr.

1730	   The IETF invites any interested party to bring to its attention any
1731	   copyrights, patents or patent applications, or other proprietary
1732	   rights that may cover technology that may be required to implement
1733	   this standard.  Please address the information to the IETF at
1734	   ietf-ipr@ietf.org.

1736	Disclaimer of Validity

1738	   This document and the information contained herein are provided on an
1739	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1740	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1741	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1742	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1743	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1744	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1746	Copyright Statement

1748	   Copyright (C) The Internet Society (2005).  This document is subject
1749	   to the rights, licenses and restrictions contained in BCP 78, and
1750	   except as set forth therein, the authors retain all their rights.

1752	Acknowledgment

1754	   Funding for the RFC Editor function is currently provided by the
1755	   Internet Society.