idnits 2.17.1 

draft-ietf-sipping-app-interaction-framework-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3667, Section 5.1 on line 14.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 1734.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1711.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1718.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1724.

  ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1 (on line
     1740), which is fine, but *also* found old RFC 2026, Section 10.4C,
     paragraph 1 text on line 36.

  ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure
     Acknowledgement -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.

  ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate
     instead of verbatim RFC 3978 boilerplate.  After 6 May 2005, submission
     of drafts without verbatim RFC 3978 boilerplate is not accepted.

     The following non-3978 patterns matched text found in the document. 
     That text should be removed or replaced:

        By submitting this Internet-Draft, I certify that any applicable patent
        or other IPR claims of which I am aware have been disclosed, or
        will be disclosed, and any of which I become aware will be
        disclosed, in accordance with RFC 3668.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 6 instances of too long lines in the document, the longest one
     being 4 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 785: '...ription).  As such, user agents SHOULD...'
     RFC 2119 keyword, line 840: '... the application MAY push presentation...'
     RFC 2119 keyword, line 850: '... the application MAY push presentation...'
     RFC 2119 keyword, line 870: '...  An application MUST NOT attempt to p...'
     RFC 2119 keyword, line 873: '...t an application MUST NOT push a user ...'
     (49 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 24, 2004) is 7123 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 3265 (ref. '3') (Obsoleted by RFC 6665)

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  == Outdated reference: A later version (-08) exists of
     draft-ietf-sipping-kpml-04

  == Outdated reference: A later version (-15) exists of
     draft-ietf-sip-gruu-02

  == Outdated reference: A later version (-06) exists of
     draft-ietf-sip-identity-03

  == Outdated reference: A later version (-05) exists of
     draft-ietf-sipping-conferencing-framework-02

  == Outdated reference: A later version (-06) exists of
     draft-ietf-sipping-dialog-package-04

  -- Obsolete informational reference (is this intentional?): RFC 2833 (ref.
     '17') (Obsoleted by RFC 4733, RFC 4734)


     Summary: 9 errors (**), 0 flaws (~~), 7 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	SIPPING                                                     J. Rosenberg
2	Internet-Draft                                             Cisco Systems
3	Expires: April 24, 2005                                 October 24, 2004

5	   A Framework for Application Interaction in the Session Initiation
6	                             Protocol (SIP)
7	            draft-ietf-sipping-app-interaction-framework-03

9	Status of this Memo

11	   By submitting this Internet-Draft, I certify that any applicable
12	   patent or other IPR claims of which I am aware have been disclosed,
13	   and any of which I become aware will be disclosed, in accordance with
14	   RFC 3668.

16	   Internet-Drafts are working documents of the Internet Engineering
17	   Task Force (IETF), its areas, and its working groups.  Note that
18	   other groups may also distribute working documents as
19	   Internet-Drafts.

21	   Internet-Drafts are draft documents valid for a maximum of six months
22	   and may be updated, replaced, or obsoleted by other documents at any
23	   time.  It is inappropriate to use Internet-Drafts as reference
24	   material or to cite them other than as "work in progress."

26	   The list of current Internet-Drafts can be accessed at
27	   http://www.ietf.org/ietf/1id-abstracts.txt.

29	   The list of Internet-Draft Shadow Directories can be accessed at
30	   http://www.ietf.org/shadow.html.

32	   This Internet-Draft will expire on April 24, 2005.

34	Copyright Notice

36	   Copyright (C) The Internet Society (2004).  All Rights Reserved.

38	Abstract

40	   This document describes a framework for the interaction between users
41	   and Session Initiation Protocol (SIP) based applications.  By
42	   interacting with applications, users can guide the way in which they
43	   operate.  The focus of this framework is stimulus signaling, which
44	   allows a user agent to interact with an application without knowledge
45	   of the semantics of that application.  Stimulus signaling can occur
46	   to a user interface running locally with the client, or to a remote
47	   user interface, through media streams.  Stimulus signaling
48	   encompasses a wide range of mechanisms, ranging from clicking on
49	   hyperlinks, to pressing buttons, to traditional Dual Tone Multi
50	   Frequency (DTMF) input.  In all cases, stimulus signaling is
51	   supported through the use of markup languages, which play a key role
52	   in this framework.

54	Table of Contents

56	   1.  Introduction
57	   2.  Definitions
58	   3.  A Model for Application Interaction
59	     3.1   Functional vs. Stimulus
60	     3.2   Real-Time vs. Non-Real Time
61	     3.3   Client-Local vs. Client-Remote
62	     3.4   Presentation Capable vs. Presentation Free
63	   4.  Interaction Scenarios on Telephones
64	     4.1   Client Remote
65	     4.2   Client Local
66	     4.3   Flip-Flop
67	   5.  Framework Overview
68	   6.  Deployment Topologies
69	     6.1   Third Party Application
70	     6.2   Co-Resident Application
71	     6.3   Third Party Application and User Device Proxy
72	     6.4   Proxy Application
73	   7.  Application Behavior
74	     7.1   Client Local Interfaces
75	       7.1.1   Discovering Capabilities
76	       7.1.2   Pushing an Initial Interface Component
77	       7.1.3   Updating an Interface Component
78	       7.1.4   Terminating an Interface Component
79	     7.2   Client Remote Interfaces
80	       7.2.1   Originating and Terminating Applications
81	       7.2.2   Intermediary Applications
82	   8.  User Agent Behavior
83	     8.1   Advertising Capabilities
84	     8.2   Receiving User Interface Components
85	     8.3   Mapping User Input to User Interface Components
86	     8.4   Receiving Updates to User Interface Components
87	     8.5   Terminating a User Interface Component
88	   9.  Inter-Application Feature Interaction
89	     9.1   Client Local UI
90	     9.2   Client-Remote UI
91	   10.   Intra Application Feature Interaction
92	   11.   Example Call Flow
93	   12.   Security Considerations
94	   13.   IANA Considerations
95	     13.1  SIP Option Tag
96	     13.2  Header Field Parameter
98	   14.   Contributors
99	   15.   Acknowledgements
100	   16.   References
101	     16.1  Normative References
102	     16.2  Informative References
103	       Author's Address
104	       Intellectual Property and Copyright Statements

106	1.  Introduction

108	   The Session Initiation Protocol (SIP) [1] provides the ability for
109	   users to initiate, manage, and terminate communications sessions.
110	   Frequently, these sessions will involve a SIP application.  A SIP
111	   application is defined as a program running on a SIP-based element
112	   (such as a proxy or user agent) that provides some value-added
113	   function to a user or system administrator.  Examples of SIP
114	   applications include pre-paid calling card calls, conferencing, and
115	   presence-based [11] call routing.

117	   In order for most applications to properly function, they need input
118	   from the user to guide their operation.  As an example, a pre-paid
119	   calling card application requires the user to input their calling
120	   card number, their PIN code, and the destination number they wish to
121	   reach.  The process by which a user provides input to an application
122	   is called "application interaction".

124	   Application interaction can be either functional or stimulus.
125	   Functional interaction requires the user device to understand the
126	   semantics of the application, whereas stimulus interaction does not.
127	   Stimulus signaling allows for applications to be built without
128	   requiring modifications to the user device.  Stimulus interaction is
129	   the subject of this framework.  The framework provides a model for
130	   how users interact with applications through user interfaces, and how
131	   user interfaces and applications can be distributed throughout a
132	   network.  This model is then used to describe how applications can
133	   instantiate and manage user interfaces.

135	2.  Definitions

137	   SIP Application: A SIP application is defined as a program running on
138	      a SIP-based element (such as a proxy or user agent) that provides
139	      some value-added function to a user or system administrator.
140	      Examples of SIP applications include pre-paid calling card calls,
141	      conferencing, and presence-based [11] call routing.

143	   Application Interaction: The process by which a user provides input
144	      to an application.

146	   Real-Time Application Interaction: Application interaction that takes
147	      place while an application instance is executing.  For example,
148	      when a user enters their PIN number into a pre-paid calling card
149	      application, this is real-time application interaction.

151	   Non-Real Time Application Interaction: Application interaction that
152	      takes place asynchronously with the execution of the application.
153	      Generally, non-real time application interaction is accomplished
154	      through provisioning.

156	   Functional Application Interaction: Application interaction is
157	      functional when the user device has an understanding of the
158	      semantics of the interaction with the application.

160	   Stimulus Application Interaction: Application interaction is
161	      considered to be stimulus when the user device has no
162	      understanding of the semantics of the interaction with the
163	      application.

165	   User Interface (UI): The user interface provides the user with
166	      context in order to make decisions about what they want.  The user
167	      interacts with the device, which conveys the user input to the
168	      user interface.  The user interface interprets the information,
169	      and passes it to the application.

171	   User Interface Component: A piece of user interface which operates
172	      independently of other pieces of the user interface.  For example,
173	      a user might have two separate web interfaces to a pre-paid
174	      calling card application - one for hanging up and making another
175	      call, and another for entering the username and PIN.

177	   User Device: The software or hardware system that the user directly
178	      interacts with in order to communicate with the application.  An
179	      example of a user device is a telephone.  Another example is a PC
180	      with a web browser.

182	   User Device Proxy: A software or hardware system that a user
183	      indirectly interacts through in order to communicate with the
184	      application.  This indirection can be through a network.  An
185	      example is a gateway from IP to the Public Switched Telephone
186	      Network (PSTN).  It acts as a user device proxy, acting on behalf
187	      of the user on the circuit network.

189	   User Input: The "raw" information passed from a user to a user
190	      interface.  Examples of user input include a spoken word or a
191	      click on a hyperlink.

193	   Client-Local User Interface: A user interface which is co-resident
194	      with the user device.

196	   Client-Remote User Interface: A user interface which executes
197	      remotely from the user device.  In this case, a standardized
198	      interface is needed between the user device and the user
199	      interface.  Typically, this is done through media sessions -
200	      audio, video, or application sharing.

202	   Media Interaction: A means of separating a user and a user interface
203	      by connecting them with media streams.

205	   Interactive Voice Response (IVR): An IVR is a type of user interface
206	      that allows users to speak commands to the application, and hear
207	      responses to those commands prompting for more information.

209	   Prompt-and-Collect: The basic primitive of an IVR user interface.
210	      The user is presented with a voice option, and the user speaks
211	      their choice.

213	   Barge-In: In an IVR user interface, a user is prompted to enter some
214	      information.  With some prompts, the user may enter the requested
215	      information before the prompt completes.  In that case, the prompt
216	      ceases.  The act of entering the information before completion of
217	      the prompt is referred to as barge-in.

219	   Focus: A user interface component has focus when user input is
220	      fed to it, as opposed to any other user interface
221	      components.  This is not to be confused with the term focus within
222	      the SIP conferencing framework, which refers to the center user
223	      agent in a conference [13].

225	   Focus Determination: The process by which the user device determines
226	      which user interface component will receive the user input.

228	   Focusless User Interface: A user interface which has no ability to
229	      perform focus determination.  An example of a focusless user
230	      interface is a keypad on a telephone.

232	   Presentation Capable UI: A user interface which can prompt the user
233	      with input, collect results, and then prompt the user with new
234	      information based on those results.

236	   Presentation Free UI: A user interface which cannot prompt the user
237	      with information.

239	   Feature Interaction: A class of problems which result when multiple
240	      applications or application components are trying to provide
241	      services to a user at the same time.

243	   Inter-Application Feature Interaction: Feature interactions that
244	      occur between applications.

246	   DTMF: Dual-Tone Multi-Frequency.  DTMF refers to a class of tones
247	      generated by circuit switched telephony devices when the user
248	      presses a key on the keypad.  As a result, DTMF and keypad input
249	      are often used synonymously, when in fact one of them (DTMF) is
250	      merely a means of conveying the other (the keypad input) to a
251	      client-remote user interface (the switch, for example).

253	   Application Instance: A single execution path of a SIP application.

255	   Originating Application: A SIP application which acts as a UAC,
256	      making a call on behalf of the user.

258	   Terminating Application: A SIP application which acts as a UAS,
259	      answering a call generated by a user.  IVR applications are
260	      terminating applications.

262	   Intermediary Application: A SIP application which is neither the
263	      caller nor callee, but rather, a third party involved in a call.

265	3.  A Model for Application Interaction

267	         +---+            +---+            +---+             +---+
268	         |   |            |   |            |   |             |   |
269	         |   |            | U |            | U |             | A |
270	         |   |   Input    | s |   Input    | s |   Results   | p |
271	         |   | ---------> | e | ---------> | e | ----------> | p |
272	         | U |            | r |            | r |             | l |
273	         | s |            |   |            |   |             | i |
274	         | e |            | D |            | I |             | c |
275	         | r |   Output   | e |   Output   | f |   Update    | a |
276	         |   | <--------- | v | <--------- | a | <.......... | t |
277	         |   |            | i |            | c |             | i |
278	         |   |            | c |            | e |             | o |
279	         |   |            | e |            |   |             | n |
280	         |   |            |   |            |   |             |   |
281	         +---+            +---+            +---+             +---+

283	               Figure 1: Model for Real-Time Interactions

285	   Figure 1 presents a general model for how users interact with
286	   applications.  Generally, users interact with a user interface
287	   through a user device.  A user device can be a telephone, or it can
288	   be a PC with a web browser.  Its role is to pass the user input from
289	   the user, to the user interface.  The user interface provides the
290	   user with context in order to make decisions about what they want.
291	   The user interacts with the device, causing information to be passed
292	   from the device to the user interface.  The user interface interprets
293	   the information, and passes it as a user interface event to the
294	   application.  The application may be able to modify the user
295	   interface based on this event.  Whether or not this is possible
296	   depends on the type of user interface.

298	   User interfaces are fundamentally about rendering and interpretation.
299	   Rendering refers to the way in which the user is provided context.
300	   This can be through hyperlinks, images, sounds, videos, text, and so
301	   on.  Interpretation refers to the way in which the user interface
302	   takes the "raw" data provided by the user, and returns the result to
303	   the application as a meaningful event, abstracted from the
304	   particulars of the user interface.  As an example, consider a
305	   pre-paid calling card application.  The user interface worries about
306	   details such as what prompt the user is provided, whether the voice
307	   is male or female, and so on.  It is concerned with recognizing the
308	   speech that the user provides, in order to obtain the desired
309	   information.  In this case, the desired information is the calling
310	   card number, the PIN code, and the destination number.  The
311	   application needs that data, and it doesn't matter to the application
312	   whether it was collected using a male prompt or a female one.
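
   The split between rendering, interpretation, and the application
   described above can be sketched as follows.  This is a minimal
   illustration only, not part of the framework: the names VoiceUI,
   CardEvent, and application are hypothetical, and speech recognition
   is stood in for by simple digit extraction.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CardEvent:
    """Meaningful event passed to the application (hypothetical)."""
    card_number: str
    pin: str
    destination: str


class VoiceUI:
    """Hypothetical user interface; the prompt voice is a rendering detail."""

    def __init__(self, voice: str) -> None:
        self.voice = voice  # "male" or "female"; never seen by the application

    def interpret(self, raw: Sequence[str]) -> CardEvent:
        # Stand-in for speech recognition: keep only the digits of each raw
        # utterance (card number, PIN, destination, in that order).
        card, pin, dest = ("".join(ch for ch in r if ch.isdigit()) for r in raw)
        return CardEvent(card, pin, dest)


def application(event: CardEvent) -> str:
    # The application sees only the abstracted event, not the raw input.
    return f"dial {event.destination} with card {event.card_number}"


raw_input = ["card 1234 5678", "pin 4321", "call 5551000"]
male = VoiceUI("male").interpret(raw_input)
female = VoiceUI("female").interpret(raw_input)
```

   Both user interfaces hand the application the same event, so the
   application's behavior does not depend on how the prompt was
   rendered.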

314	   User interfaces generally have real-time requirements towards the
315	   user.  That is, when a user interacts with the user interface, the
316	   user interface needs to react quickly, and that change needs to be
317	   propagated to the user right away.  However, the interface between
318	   the user interface and the application need not be that fast.  Faster
319	   is better, but the user interface itself can frequently compensate
320	   for long latencies there.  In the case of a pre-paid calling card
321	   application, when the user is prompted to enter their PIN, the prompt
322	   should generally stop immediately once the first digit of the PIN is
323	   entered.  This is referred to as barge-in.  After the user-interface
324	   collects the rest of the PIN, it can tell the user to "please wait
325	   while processing".  The PIN can then be gradually transmitted to the
326	   application.  In this example, the user interface has compensated for
327	   a slow UI to application interface by asking the user to wait.
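
   The barge-in behavior in this example can be sketched as a small
   simulation.  It is an assumption-laden illustration: time is
   modeled as discrete ticks, the prompt is played one word per tick,
   and the function name prompt_and_collect and its parameters are
   hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def prompt_and_collect(
    prompt_words: Sequence[str],
    keys_by_tick: Dict[int, str],
    pin_length: int = 4,
    max_ticks: int = 100,
) -> Tuple[List[str], str]:
    """Return the prompt words actually played and the collected PIN."""
    played: List[str] = []
    digits: List[str] = []
    prompt_active = True
    for tick in range(max_ticks):
        digit = keys_by_tick.get(tick)
        if digit is not None:
            # Barge-in: the first digit silences the rest of the prompt.
            prompt_active = False
            digits.append(digit)
            if len(digits) == pin_length:
                break
        elif prompt_active and len(played) < len(prompt_words):
            # No input this tick, so keep playing the prompt.
            played.append(prompt_words[len(played)])
    # Only now would the complete PIN be sent on to the application.
    return played, "".join(digits)


prompt = ["Please", "enter", "your", "PIN"]
played, pin = prompt_and_collect(prompt, {2: "1", 3: "2", 4: "3", 5: "4"})
```

   The prompt is cut off locally and at once, while the PIN reaches
   the application only after collection completes, which is why the
   interface between the two can tolerate higher latency.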

329	   The separation between user interface and application is absolutely
330	   fundamental to the entire framework provided in this document.  Its
331	   importance cannot be overstated.

333	   With this basic model, we can begin to taxonomize the types of
334	   systems that can be built.

336	3.1  Functional vs. Stimulus

338	   The first way to taxonomize the system is to consider the interface
339	   between the UI and the application.  There are two fundamentally
340	   different models for this interface.  In a functional interface, the
341	   user interface has detailed knowledge about the application, and is,
342	   in fact, specific to the application.  The interface between the two
343	   components is through a functional protocol, capable of representing
344	   the semantics which can be exposed through the user interface.
345	   Because the user interface has knowledge of the application, it can
346	   be optimally designed for that application.  As a result, functional
347	   user interfaces are almost always the most user friendly, the fastest
348	   and the most responsive.  However, in order to allow interoperability
349	   between user devices and applications, the details of the functional
350	   protocols need to be specified in standards.  This slows down
351	   innovation and limits the scope of applications that can be built.

353	   An alternative is a stimulus interface.  In a stimulus interface, the
354	   user interface is generic, totally ignorant of the details of the
355	   application.  Indeed, the application may pass instructions to the
356	   user interface describing how it should operate.  The user interface
357	   translates user input into "stimulus" - which are data understood
358	   only by the application, and not by the user interface.  Because they
359	   are generic, and because they require communications with the
360	   application in order to change the way in which they render
361	   information to the user, stimulus user interfaces are usually slower,
362	   less user friendly, and less responsive than a functional
363	   counterpart.  However, they allow for substantial innovation in
364	   applications, since no standardization activity is needed to build a
365	   new application, as long as it can interact with the user within the
366	   confines of the user interface mechanism.  The web is an example of a
367	   stimulus user interface to applications.
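
   A stimulus interface of this kind can be sketched as follows.  The
   sketch is illustrative only: GenericUI and CallingCardApp are
   hypothetical names, and a plain dictionary stands in for the markup
   an application would use to instruct the user interface.

```python
from typing import Dict, Optional


class GenericUI:
    """Generic user interface with no knowledge of any application."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def load(self, instructions: Dict[str, str]) -> None:
        # Instructions map raw user input to opaque stimulus tokens.
        self._bindings = dict(instructions)

    def on_input(self, user_input: str) -> Optional[str]:
        # The UI returns the token without understanding what it means.
        return self._bindings.get(user_input)


class CallingCardApp:
    """Only the application knows the semantics of each stimulus."""

    _meaning = {"s1": "check balance", "s2": "place call"}

    def instructions(self) -> Dict[str, str]:
        return {"1": "s1", "2": "s2"}

    def handle(self, stimulus: str) -> str:
        return self._meaning[stimulus]


ui = GenericUI()
app = CallingCardApp()
ui.load(app.instructions())
stimulus = ui.on_input("2")
result = app.handle(stimulus)
```

   A new application needs only to supply different instructions; the
   GenericUI class itself does not change.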

369	   In SIP systems, functional interfaces are provided by extending the
370	   SIP protocol to provide the needed functionality.  For example, the
371	   SIP caller preferences specification [14] provides a functional
372	   interface that allows a user to request applications to route the
373	   call to specific types of user agents.  Functional interfaces are
374	   important, but are not the subject of this framework.  The primary
375	   goal of this framework is to address the role of stimulus interfaces
376	   to SIP applications.

378	3.2  Real-Time vs. Non-Real Time

380	   Application interaction systems can also be real-time or
381	   non-real-time.  Non-real-time interaction allows the user to enter
382	   information about application operation asynchronously with its
383	   invocation.  Frequently, this is done through provisioning systems.
384	   As an example, a user can set up the forwarding number for a
385	   call-forward on no-answer application using a web page.  Real-time
386	   interaction requires the user to interact with the application at the
387	   time of its invocation.
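
   The call-forwarding example can be sketched as follows, assuming a
   simple in-memory table as the provisioning store.  The function
   names provision and on_no_answer are hypothetical.

```python
from typing import Dict, Optional

# Provisioned data, written asynchronously with application execution.
forwarding: Dict[str, str] = {}


def provision(user: str, number: str) -> None:
    """Non-real-time interaction: e.g., the user submits a web form."""
    forwarding[user] = number


def on_no_answer(user: str) -> Optional[str]:
    """Runs when the application is invoked; reads what was provisioned."""
    return forwarding.get(user)


provision("alice", "5550123")
target = on_no_answer("alice")
```

   The user supplies the forwarding number long before any call
   arrives; at invocation time the application reads the stored value
   and needs no further input from the user.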

389	3.3  Client-Local vs. Client-Remote

391	   Another axis in the taxonomization is whether the user interface is
392	   co-resident with the user device (which we refer to as a client-local
393	   user interface), or the user interface runs in a host separated from
394	   the client (which we refer to as a client-remote user interface).  In
395	   a client-remote user interface, there exists some kind of protocol
396	   between the client device and the UI that allows the client to
397	   interact with the user interface over a network.

399	   The most important way to separate the UI and the client device is
400	   through media interaction.  In media interaction, the interface
401	   between the user and the user interface is through media - audio,
402	   video, messaging, and so on.  This is the classic mode of operation
403	   for VoiceXML [4], where the user interface (also referred to as the
404	   voice browser) runs on a platform in the network.  Users communicate
405	   with the voice browser through the telephone network (or using a SIP
406	   session).  The voice browser interacts with the application using
407	   HTTP to convey the information collected from the user.

409	   In the case of a client-local user interface, the user interface runs
410	   co-located with the user device.  The interface between them is
411	   through the software that interprets the user's input and passes it
412	   to the user interface.  The classic example of this is the web.  In
413	   the web, the user interface is a web browser, and the interface is
414	   defined by the HTML document that it's rendering.  The user interacts
415	   directly with the user interface running in the browser.  The results
416	   of that user interface are sent to the application (running on the
417	   web server) using HTTP.

419	   It is important to note that whether or not the user interface is
420	   local or remote (in the case of media interaction) is not a property
421	   of the modality of the interface, but rather a property of the
422	   system.  As an example, it is possible for a web-based user interface
423	   to be provided with a client-remote user interface.  In such a
424	   scenario, video and application sharing media sessions can be used
425	   between the user and the user interface.  The user interface, still
426	   guided by HTML, now runs "in the network", remote from the client.
427	   Similarly, a VoiceXML document can be interpreted locally by a client
428	   device, with no media streams at all.  Indeed, the VoiceXML document
429	   can be rendered using text, rather than media, with no impact on the
430	   interface between the user interface and the application.

432	   It is also important to note that systems can be hybrid.  In a hybrid
433	   user interface, some aspects of it (usually those associated with a
434	   particular modality) run locally, and others run remotely.

436	3.4  Presentation Capable vs. Presentation Free

438	   A user interface can be capable of presenting information to the user
439	   (a presentation capable UI), or it can be capable only of collecting
440	   user input (a presentation free UI).  These are very different types
441	   of user interfaces.  A presentation capable UI can provide the user
442	   with feedback after every input, providing the context for collecting
443	   the next input.  As a result, presentation capable user interfaces
444	   require an update to the information provided to the user after each
445	   input.  The web is a classic example of this.  After every input
446	   (i.e., a click), the browser provides the input to the application
447	   and fetches the next page to render.  In a presentation free user
448	   interface, this is not the case.  Since the user is not provided with
449	   feedback, these user interfaces tend to merely collect information as
450	   it is entered, and pass it to the application.

452	   Another difference is that a presentation-free user interface cannot
453	   support the concept of a focus.  As a result, if multiple
454	   applications wish to gather input from the user, there is no way for
455	   the user to select which application the input is destined for.  The
456	   input provided to applications through presentation-free user
457	   interfaces is more of a broadcast or notification operation, as a
458	   result.
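
   The difference between focused delivery and broadcast delivery can
   be sketched as follows.  This is an illustration under assumed
   names: deliver and the component labels are hypothetical.

```python
from typing import Dict, Iterable, Optional


def deliver(
    user_input: str,
    components: Iterable[str],
    focus: Optional[str] = None,
) -> Dict[str, str]:
    """Return which user interface components receive the input."""
    names = list(components)
    if focus is None:
        # Presentation free: no focus exists, so every component gets it.
        return {name: user_input for name in names}
    if focus not in names:
        raise ValueError(f"unknown component: {focus}")
    # Presentation capable: only the component with focus gets it.
    return {focus: user_input}


apps = ["calling-card", "conference"]
broadcast = deliver("5", apps)
focused = deliver("5", apps, focus="conference")
```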

460	4.  Interaction Scenarios on Telephones

462	   In this section, we apply the model of Section 3 to telephones.

464	   In a traditional telephone, the user interface consists of a 12-key
465	   keypad, a speaker, and a microphone.  Indeed, from here forward, the
466	   term "telephone" is used to represent any device that meets, at a
467	   minimum, the characteristics described in the previous sentence.
468	   Circuit-switched telephony applications are almost universally
469	   client-remote user interfaces.  In the Public Switched Telephone
470	   Network (PSTN), there is usually a circuit interface between the user
471	   and the user interface.  The user input from the keypad is conveyed
472	   using Dual-Tone Multi-Frequency (DTMF), and the microphone input as
473	   Pulse Code Modulated (PCM) encoded voice.

475	   In an IP-based system, there is more variability in how the system
476	   can be instantiated.  Both client-remote and client-local user
477	   interfaces to a telephone can be provided.

479	   In this framework, a PSTN gateway can be considered a User Device
480	   Proxy.  It is a proxy for the user because it can provide, to a user
481	   interface on an IP network, input taken from a user on a circuit
482	   switched telephone.  The gateway may be able to run a client-local
483	   user interface, just as an IP telephone might.

485	4.1  Client Remote

487	   The most obvious instantiation is the "classic" circuit-switched
488	   telephony model.  In that model, the user interface runs remotely
489	   from the client.  The interface between the user and the user
490	   interface is through media, set up by SIP and carried over the
491	   Real-time Transport Protocol (RTP) [16].  The microphone input can be
492	   carried using any suitable voice encoding algorithm.  The keypad
493	   input can be conveyed in one of two ways.  The first is to convert
494	   the keypad input to DTMF, and then convey that DTMF using a suitable
495	   encoding algorithm for it (such as PCMU).  An alternative, and
496	   generally the preferred approach, is to transmit the keypad input
497	   using RFC 2833 [17], which provides an encoding mechanism for
498	   carrying keypad input within RTP.
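
	   As an illustration of the second approach, the following is a
	   minimal sketch of packing and unpacking the four-octet
	   telephone-event payload of RFC 2833 (event, E bit, R bit, volume,
	   duration).  The function names and the default volume are
	   assumptions made for this example only; RTP header handling and
	   retransmission of the end packet are not shown.

```python
import struct

# DTMF event codes from RFC 2833: digits 0-9, '*' = 10, '#' = 11, A-D = 12-15.
DTMF_KEYS = "0123456789*#ABCD"
DTMF_EVENTS = {ch: code for code, ch in enumerate(DTMF_KEYS)}


def encode_telephone_event(key, duration, end=False, volume=10):
    """Pack one 4-octet telephone-event payload (hypothetical helper).

    duration is expressed in RTP timestamp units; volume is 0..63.
    """
    if key not in DTMF_EVENTS:
        raise ValueError("not a DTMF key: %r" % (key,))
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Second octet: E bit (end of event), R bit (reserved, zero), 6-bit volume.
    flags_volume = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", DTMF_EVENTS[key], flags_volume, duration)


def decode_telephone_event(payload):
    """Return (key, end, volume, duration) for a DTMF event payload."""
    event, flags_volume, duration = struct.unpack("!BBH", payload)
    if event >= len(DTMF_KEYS):
        raise ValueError("not a DTMF event code: %d" % event)
    return DTMF_KEYS[event], bool(flags_volume & 0x80), flags_volume & 0x3F, duration
```

	   A receiver on the user interface side would typically act on a key
	   only once it sees a payload with the E bit set.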

500	   In this classic model, the user interface would run on a server in
501	   the IP network.  It would perform speech recognition and DTMF
502	   recognition to derive the user intent, feed it through the user
503	   interface, and provide the result to an application.

505	4.2  Client Local

507	   An alternative model is for the entire user interface to reside on
508	   the telephone.  The user interface can be a VoiceXML browser, running
509	   speech recognition on the microphone input, and feeding the keypad
510	   input directly into the script.  As discussed above, the VoiceXML
511	   script could be rendered using text instead of voice, if the
512	   telephone had a textual display.

514	   For simpler phones without a display, the user interface can be
515	   described by a Keypad Markup Language request document [7].  As the
516	   user enters digits in the keypad, they are passed to the user
517	   interface, which generates user interface events that can be
518	   transported to the application.

520	4.3  Flip-Flop

522	   A middle-ground approach is to flip back and forth between a
523	   client-local and client-remote user interface.  Many voice
524	   applications are of the type which listen to the media stream and
525	   wait for some specific trigger that kicks off a more complex user
526	   interaction.  The long pound in a pre-paid calling card application
527	   is one example.  Another example is a conference recording
528	   application, where the user can press a key at some point in the call
529	   to begin recording.  When the key is pressed, the user hears a
530	   whisper to inform them that recording has started.

532	   The ideal way to support such an application is to install a
533	   client-local user interface component that waits for the trigger to
534	   kick off the real interaction.  Once the trigger is received, the
535	   application connects the user to a client-remote user interface that
536	   can play announcements, collect more information, and so on.

538	   The benefit of flip-flopping between a client-local and client-remote
539	   user interface is cost.  The client-local user interface will
540	   eliminate the need to send media streams into the network just to
541	   wait for the user to press the pound key on the keypad.

543	   The Keypad Markup Language (KPML) was designed to support exactly
544	   this kind of need [7].  It models the keypad on a phone, and allows
545	   an application to be informed when any sequence of keys has been
546	   pressed.  However, KPML has no presentation component.  Since user
547	   interfaces generally require a response to user input, the
548	   presentation will need to be done using a client-remote user
549	   interface that gets instantiated as a result of the trigger.

551	   It is tempting to use a hybrid model, where a prompt-and-collect
552	   application is implemented by using a client-remote user interface
553	   that plays the prompts, and a client-local user interface, described
554	   by KPML, that collects digits.  However, this only complicates the
555	   application.  Firstly, the keypad input will be sent to both the
556	   media stream and the KPML user interface.  This requires the
557	   application to sort out which user inputs are duplicates, a process
558	   that is very complicated.  Secondly, the primary benefit of KPML is
559	   to avoid having a media stream towards a user interface.  However,
560	   there is already a media stream for the prompting, so there is no
561	   real savings.

563	5.  Framework Overview

565	   In this framework, we use the term "SIP application" to refer to a
566	   broad set of functionality.  A SIP application is a program running
567	   on a SIP-based element (such as a proxy or user agent) that provides
568	   some value-added function to a user or system administrator.  SIP
569	   applications can execute on behalf of a caller, a called party, or a
570	   multitude of users at once.

572	   Each application has a number of instances that are executing at any
573	   given time.  An instance represents a single execution path for an
574	   application.  Each instance has a well defined lifecycle.  It is
575	   established as a result of some event.  That event can be a SIP
576	   event, such as the reception of a SIP INVITE request, or it can be a
577	   non-SIP event, such as a web form post or even a timer.  Application
578	   instances also have a specific end time.  Some instances have a
579	   lifetime that is coupled with a SIP transaction or dialog.  For
580	   example, a proxy application might begin when an INVITE arrives, and
581	   terminate when the call is answered.  Other applications have a
582	   lifetime that spans multiple dialogs or transactions.  For example, a
583	   conferencing application instance may exist so long as there are any
584	   dialogs connected to it.  When the last dialog terminates, the
585	   application instance terminates.  Other applications have a lifetime
586	   that is completely decoupled from SIP events.
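
	   The conferencing example can be sketched as follows.  This is only
	   an illustration of a lifetime that spans multiple dialogs; the
	   class and method names are hypothetical and are not defined by this
	   framework.

```python
class ConferenceInstance:
    """Hypothetical application instance whose lifetime spans several dialogs."""

    def __init__(self):
        self.dialogs = set()
        self.terminated = False

    def dialog_established(self, dialog_id):
        if self.terminated:
            raise RuntimeError("instance has already terminated")
        self.dialogs.add(dialog_id)

    def dialog_terminated(self, dialog_id):
        if dialog_id not in self.dialogs:
            return
        self.dialogs.remove(dialog_id)
        # The instance ends when the last connected dialog goes away.
        if not self.dialogs:
            self.terminated = True
```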

588	   It is fundamental to the framework described here that multiple
589	   application instances may interact with a user during a single SIP
590	   transaction or dialog.  Each instance may be for the same
591	   application, or different applications.  Each of the applications may
592	   be completely independent, in that they may be owned by different
593	   providers, and may not be aware of each other's existence.  Similarly,
594	   there may be application instances interacting with the caller, and
595	   instances interacting with the callee, both within the same
596	   transaction or dialog.

598	   The first step in the interaction with the user is to instantiate one
599	   or more user interface components for the application instance.  A
600	   user interface component is a single piece of the user interface that
601	   is defined by a logical flow that is not synchronously coupled with
602	   any other component.  In other words, each component runs more or
603	   less independently.

605	   A user interface component can be instantiated in one of the user
606	   agents in a dialog (for a client-local user interface), or within a
607	   network element (for a client-remote user interface).  If a
608	   client-local user interface is to be used, the application needs to
609	   determine whether or not the user agent is capable of supporting a
610	   client-local user interface, and in what format.  In this framework,
611	   all client-local user interface components are described by a markup
612	   language.  A markup language describes a logical flow of presentation
613	   of information to the user, collection of information from the user,
614	   and transmission of that information to an application.  Examples of
615	   markup languages include HTML, WML, VoiceXML, and the Keypad Markup
616	   Language (KPML) [7].

618	   Unlike an application instance, which has a very flexible lifetime, a
619	   user interface component has a very fixed lifetime.  A user interface
620	   component is always associated with a dialog.  The user interface
621	   component can be created at any point after the dialog (or early
622	   dialog) is created.  However, the user interface component terminates
623	   when the dialog terminates.  The user interface component can be
624	   terminated earlier by the user agent, and possibly by the
625	   application, but its lifetime never exceeds that of its associated
626	   dialog.

628	   There are two ways to create a client local interface component.  For
629	   interface components that are presentation capable, the application
630	   sends a REFER [6] request to the user agent.  The Refer-To header
631	   field contains an HTTP URI that points to the markup for the user
632	   interface.  For interface components that are presentation free (such
633	   as those defined by KPML), the application sends a SUBSCRIBE request
634	   to the user agent.  The body of the SUBSCRIBE request contains a
635	   filter, which, in this case, is the markup that defines when
636	   information is to be sent to the application in a NOTIFY.

638	   If a user interface component is to be instantiated in the network,
639	   there is no need to determine the capabilities of the device on which
640	   the user interface is instantiated.  Presumably, it is on a device on
641	   which the application knows a UI can be created.  However, the
642	   application does need to connect the user device to the user
643	   interface.  This will require manipulation of media streams in order
644	   to establish that connection.

646	   The interface between the user interface component and the
647	   application depends on the type of user interface.  For presentation
648	   capable user interfaces, such as those described by  HTML and
649	   VoiceXML, HTTP form POST operations are used.  For presentation free
650	   user interfaces, a SIP NOTIFY is used.  The differing needs and
651	   capabilities of these two user interfaces, as described in Section
652	   3.4, are what drive the different choices for the interactions.
653	   Since presentation capable user interfaces require an update to the
654	   presentation every time user data is entered, they are a good match
655	   for HTTP.  Since presentation free user interfaces merely transmit
656	   user input to the application, a NOTIFY is more appropriate.

658	   Indeed, for presentation free user interfaces, there are two
659	   different modalities of operation.  The first is called "one shot".
660	   In the one-shot role, the markup waits for a user to enter some
661	   information, and when they do, reports this event to the application.
662	   The application then does something, and the markup is no longer
663	   used.  In the other modality, called "monitor", the markup stays
664	   permanently resident, and reports information back to an application
665	   until termination of the associated dialog.
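
	   The difference between the two modalities can be sketched as
	   follows.  This is a simplified model, not KPML itself: the pattern
	   is a literal key sequence matched against the most recent input, and
	   the class, its parameters, and the notify callback are all
	   hypothetical names introduced for illustration.

```python
class PresentationFreeComponent:
    """Hypothetical presentation free component bound to one dialog."""

    def __init__(self, pattern, notify, monitor=False):
        self.pattern = pattern      # literal key sequence to wait for
        self.notify = notify        # callable standing in for a SIP NOTIFY
        self.monitor = monitor      # False: one shot, True: monitor
        self.buffer = ""
        self.active = True

    def key_pressed(self, key):
        if not self.active:
            return
        self.buffer += key
        if self.buffer.endswith(self.pattern):
            self.notify(self.pattern)
            self.buffer = ""
            if not self.monitor:
                # One shot: the markup is no longer used after the first report.
                self.active = False

    def dialog_terminated(self):
        # A component never outlives its associated dialog.
        self.active = False
```

	   In one-shot mode the application would install a new component if
	   it wants further input; in monitor mode reports continue until the
	   dialog ends.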

667	6.  Deployment Topologies

669	   This section presents some of the network topologies in which this
670	   framework can be instantiated.

672	6.1  Third Party Application

674	                    +-------------+
675	                /---| Application |
676	               /    +-------------+
677	              /
678	       SUB/  / REFER/
679	       NOT  /  HTTP
680	           /
681	      +--------+    SIP (INVITE)    +-----+
682	      |   UI   A--------------------X     |
683	      |........|                    | SIP |
684	      |  User  |        RTP         | UA  |
685	      | Device B--------------------Y     |
686	      +--------+                    +-----+

688	                     Figure 2: Third Party Topology

690	   In this topology, the application that is interested in interacting
691	   with the users exists outside of the SIP dialog between the user
692	   agents.  In that case, the application learns about the initiation
693	   and termination of the dialog, along with the dialog identifiers,
694	   through some out of band means.  One such possibility is the dialog
695	   event package [15].  Dialog information is only revealed to trusted
696	   parties, so the application would need to be trusted by one of the
697	   users in order to obtain this information.

699	   At any point during the dialog, the application can instantiate user
700	   interface components on the user device of the caller or callee.  It
701	   can do this either using SUBSCRIBE or REFER, depending on the type of
702	   user interface (presentation capable or presentation free).

704	6.2  Co-Resident Application

706	      +--------+    SIP (INVITE)    +-----+
707	      |  User  A--------------------X SIP |
708	      | Device |        RTP         | UA  |
709	      |........B--------------------Y     |
710	      |        |    SUB/NOT         |(App)|
711	      |  UI    A'-------------------X'    |
712	      +--------+    REFER/HTTP      +-----+

714	                     Figure 3: Co-Resident Topology

716	   In this deployment topology, the application is co-resident with one
717	   of the user agents (the one on the right in the picture above).  This
718	   application can install client-local user interface components on the
719	   other user agent, which is acting as the user device.  These
720	   components can be installed using either SUBSCRIBE, for presentation
721	   free user interfaces, or REFER, for presentation capable ones.  This
722	   situation typically arises when the application wishes to install UI
723	   components on a presentation capable user interface.  If the only
724	   user input is via keypad input, the framework is not needed per se,
725	   because the UA/application will receive the input via RFC 2833 in the
726	   RTP stream.

728	   If the application resides in the called party, it is called a
729	   terminating application.  If it resides in the calling party, it is
730	   called an originating application.

732	   This kind of topology is common in protocol converter and gateway
733	   applications.

735	6.3  Third Party Application and User Device Proxy

737	                                               +-------------+
738	                                           /---| Application |
739	                                          /    +-------------+
740	                                         /
741	                                   SUB/ /  REFER/
742	                                   NOT /   HTTP
743	                                      /
744	      +-----+        SIP         +---M----+        SIP         +-----+
745	      |     V--------------------C        A--------------------X     |
746	      | SIP |                    |   UI   |                    | SIP |
747	      | UAa |        RTP         |        |        RTP         | UAb |
748	      |     W--------------------D        B--------------------Y     |
749	      +-----+                    +--------+                    +-----+
750	       User                         User
751	       Device                      Device
752	                                   Proxy

754	                  Figure 4: User Device Proxy Topology

756	   In this deployment topology, there is a third party application as in
757	   Section 6.1.  However, instead of installing a user interface
758	   component on the end user device, the component is installed in an
759	   intermediate device, known as a User Device Proxy.  From the
760	   perspective of the actual user device (on the left), the User Device
761	   Proxy is a client remote user interface.  As such, media, typically
762	   transported using RTP (including RFC 2833 for carrying user input),
763	   is sent from the user device to the client remote user interface on
764	   the User Device Proxy.  As far as the application is concerned, it is
765	   installing what it thinks is a client local user interface on the
766	   user device, but it happens to be on a user device proxy which looks
767	   like the user device to the application.

769	   The user device proxy will need to terminate and re-originate both
770	   signaling (SIP) and media traffic towards the actual peer in the
771	   conversation.  The User Device Proxy is a media relay in the
772	   terminology of RFC 3550 [16].  The User Device Proxy will need to
773	   monitor the media streams associated with each dialog, in order to
774	   convert user input received in the media stream to events reported to
775	   the user interface.  This can pose a challenge in multi-media
776	   systems, where it may be unclear on which media stream the user input
777	   is being sent.  As discussed in RFC 3264 [18], if a user agent has a
778	   single media source and is supporting multiple streams, it is
779	   supposed to send that source to all streams.  In cases where there
780	   are multiple sources, the mapping is a matter of local policy.  In
781	   the absence of a way to explicitly identify or request which sources
782	   map to which streams, the user device proxy will need to do the best
783	   job it can.  This specification RECOMMENDS that the User Device Proxy
784	   monitor the first stream (defined in terms of ordering of media
785	   sessions within a session description).  As such, user agents SHOULD
786	   send their user input on the first stream, absent a policy to direct
787	   it otherwise.
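
	   Selecting the first stream can be sketched as below.  This is a
	   minimal illustration that scans a session description for its first
	   "m=" line; the function name is hypothetical, and a real User Device
	   Proxy would use a complete SDP parser.

```python
def first_media_stream(sdp):
    """Return (media, port, proto, formats) for the first m= line, or None."""
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("m="):
            continue
        fields = line[2:].split()
        if len(fields) < 4:
            continue  # malformed media line; keep looking
        media, port, proto = fields[:3]
        return media, port, proto, fields[3:]
    return None
```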

789	6.4  Proxy Application
790	                             +----------+
791	               SUB/NOT       |   App    |      SUB/NOT
792	            +--------------->|          |<-----------------+
793	            |  REFER/HTTP    |..........|     REFER/HTTP   |
794	            |                |   SIP    |                  |
795	            |                |  Proxy   |                  |
796	            |                +----------+                  |
797	            V                 ^        |                   V
798	      +----------+            |        |             +----------+
799	      |   UI     |   INVITE   |        |    INVITE   |   UI     |
800	      |          |------------+        +------------>|          |
801	      |..........|                                   |..........|
802	      |   SIP    |...................................|   SIP    |
803	      |   UA     |                                   |   UA     |
804	      +----------+               RTP                 +----------+
805	        User Device                                    User Device

807	                  Figure 5: Proxy Application Topology

809	   In this topology, the application is co-resident with a transaction
810	   stateful, record-routing proxy server on the call path between two
811	   user devices.  The application uses SUBSCRIBE or REFER to install
812	   user interface components on one or both user devices.

814	   This topology is common in routing applications, such as a
815	   web-assisted call routing application.

817	7.  Application Behavior

819	   The behavior of an application within this framework depends on
820	   whether it seeks to use a client-local or client-remote user
821	   interface.

823	7.1  Client Local Interfaces

825	   One key component of this framework is support for client local user
826	   interfaces.

828	7.1.1  Discovering Capabilities

830	   A client local user interface can only be instantiated on a user
831	   agent if the user agent supports that type of user interface
832	   component.  Support for client local user interface components is
833	   declared by both the UAC and the UAS in their Accept, Allow, Contact
834	   and Allow-Events header fields of dialog-initiating requests and
835	   responses.  If the Allow header field indicates support for the SIP
836	   SUBSCRIBE method, and the Allow-Events header field indicates support
837	   for the kpml package [7], and the Supported header field indicates
838	   that its Contact URI is a GRUU [8], it means that the UA can
839	   instantiate presentation free user interface components.  In this
840	   case, the application MAY push presentation free user interface
841	   components according to the rules of Section 7.1.2.  The specific
842	   markup languages that can be supported are indicated in the Accept
843	   header field.

845	   If the Allow header field indicates support for the SIP REFER method,
846	   the Supported header field indicates support for the "refer-context"
847	   extension described below, and the Contact header field contains UA
848	   capabilities [5] that indicate support for the HTTP URI scheme, it
849	   means that the UA supports presentation capable user interface
850	   components.  In this case, the application MAY push presentation
851	   capable user interface components to the client according to the
852	   rules of Section 7.1.2.  The specific markups that are supported are
853	   indicated in the Accept header field.
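
	   The two capability checks described in this section can be sketched
	   as follows.  Several details are assumptions made for the example:
	   header fields are given as a simple name-to-value mapping, the event
	   package list is looked up under the name Allow-Events, GRUU support
	   is taken to be signaled by a "gruu" option tag in Supported, and
	   HTTP support is taken to appear as a schemes parameter in the
	   Contact header field.

```python
import re


def _tokens(headers, name):
    """Split a comma-separated header value into lower-cased tokens."""
    value = headers.get(name, "")
    return {t.strip().lower() for t in value.split(",") if t.strip()}


def _contact_schemes(headers):
    """Extract URI schemes from an assumed schemes="..." Contact parameter."""
    match = re.search(r'schemes="([^"]*)"', headers.get("Contact", ""), re.I)
    if not match:
        return set()
    return {s.strip().lower() for s in match.group(1).split(",") if s.strip()}


def supports_presentation_free(headers):
    return ("subscribe" in _tokens(headers, "Allow")
            and "kpml" in _tokens(headers, "Allow-Events")
            and "gruu" in _tokens(headers, "Supported"))  # assumed option tag


def supports_presentation_capable(headers):
    return ("refer" in _tokens(headers, "Allow")
            and "refer-context" in _tokens(headers, "Supported")
            and "http" in _contact_schemes(headers))


def supported_markups(headers):
    """Markup types the UA listed in its Accept header field."""
    return _tokens(headers, "Accept")
```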

855	   A third party application that is not present on the call path will
856	   not be privy to these headers in the dialog requests that pass by.
857	   As such, it will need to obtain this capability information in other
858	   ways.  One way is through the registration event package [19], which
859	   can contain user agent capability information provided in REGISTER
860	   requests [5].

862	7.1.2  Pushing an Initial Interface Component

864	   Generally, we anticipate that interface components will need to be
865	   created at various different points in a SIP session.  Clearly, they
866	   will need to be pushed during session setup, or after the session is
867	   established.  A user interface component is always associated with a
868	   specific dialog, however.

870	   An application MUST NOT attempt to push a user interface component to
871	   a user agent until it has determined that the user agent has the
872	   necessary capabilities and a dialog has been created.  In the case of
873	   a UAC, this means that an application MUST NOT push a user interface
874	   component for an INVITE initiated dialog until the application has
875	   seen a request confirming the receipt of a dialog-creating response.
876	   This could be an ACK for a 200 OK, or a PRACK for a provisional
877	   response [2].  For SUBSCRIBE initiated dialogs, it MUST NOT push a
878	   user interface component until the application has seen a 200 OK to
879	   the NOTIFY request.  For a user interface component on a UAS, the
880	   application MUST NOT push a user interface component for an INVITE
881	   initiated dialog until it has seen a dialog-creating response from
882	   the UAS.  For a SUBSCRIBE initiated dialog, it MUST NOT push a user
883	   interface component until it has seen a NOTIFY request from the
884	   notifier.
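
	   The conditions above can be summarized as a small lookup, sketched
	   here under stated assumptions: the labels for roles, dialog types,
	   and observed messages are hypothetical names chosen for this
	   example, and the capability check is reduced to a boolean.

```python
# For each (role of the target UA, dialog-initiating method), the messages
# any one of which the application must have seen before pushing.
REQUIRED_BEFORE_PUSH = {
    ("uac", "invite"): {"ack-for-200", "prack"},
    ("uac", "subscribe"): {"200-for-notify"},
    ("uas", "invite"): {"dialog-creating-response"},
    ("uas", "subscribe"): {"notify-request"},
}


def may_push(role, dialog_method, observed, capable):
    """Return True if pushing a UI component is permitted at this point."""
    required = REQUIRED_BEFORE_PUSH[(role, dialog_method)]
    return capable and bool(required & set(observed))
```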

886	   To create a presentation capable UI component on the UA, the
887	   application sends a REFER request to the UA.  This REFER MUST be sent
888	   to the Globally Routable UA URI (GRUU) [8] advertised by that UA in
889	   the Contact header field of the dialog initiating request or response
890	   sent by that UA.  Note that this REFER request creates a separate
891	   dialog between the application and the UA.  The Refer-To header field
892	   of the REFER request MUST contain an HTTP URI that references the
893	   markup document to be fetched.

895	   Furthermore, it is essential for the REFER request to be correlated
896	   with the dialog to which the user interface component will be
897	   associated.  This is necessary for authorization and for terminating
898	   the user interface components when the dialog terminates.  To provide
899	   this context, this specification defines the "context" header field
900	   parameter as an extension to the Refer-To header field.  The grammar
901	   for this header field parameter is:

903	   refer-to-ctxt     = "context" EQUAL DQUOTE local-tag "," remote-tag
904	                       "," callid DQUOTE    ; callid defined in RFC 3261
905	                       ;; NOTE: any DQUOTEs inside callid MUST be escaped
906	                       ;; using quoted pair
907	   local-tag         = token
908	   remote-tag        = token

910	   Refer-To          = ("Refer-To" / "r") HCOLON ( name-addr / addr-spec )
911	                       *(SEMI (generic-param / refer-to-ctxt))

913	   The application MUST include the context header field parameter in
914	   the REFER request.  The remote-tag MUST be set to the remote tag of
915	   the dialog as seen by the user device.  The local-tag MUST be set to
916	   the local tag of the dialog as seen by the user device.  The callid
917	   MUST be set to the Call-ID of the dialog as seen by the device.
918	   Since the callid grammar allows it to contain double quotes, any such
919	   double quotes MUST be represented with a quoted pair.
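
	   Constructing the header field can be sketched as below.  The
	   function name and the example values are hypothetical.  Escaping a
	   backslash in the callid, in addition to the double quote, is an
	   assumption made here so that the quoted string stays unambiguous; the
	   text above only requires escaping double quotes.

```python
import re

# SIP "token" characters, used here to sanity-check the two tags.
_TOKEN = re.compile(r"[A-Za-z0-9.!%*_+`'~-]+")


def build_refer_to(http_uri, local_tag, remote_tag, call_id):
    """Build a Refer-To header field carrying the context parameter."""
    for tag in (local_tag, remote_tag):
        if not _TOKEN.fullmatch(tag):
            raise ValueError("tag is not a token: %r" % (tag,))
    # Represent backslashes and double quotes as quoted pairs.
    escaped = call_id.replace("\\", "\\\\").replace('"', '\\"')
    return 'Refer-To: <%s>;context="%s,%s,%s"' % (
        http_uri, local_tag, remote_tag, escaped)
```

	   The tags and Call-ID passed in are those of the dialog as seen by
	   the user device, not as seen by the application.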

921	   Since the "context" parameter in the Refer-To header field must be
922	   understood by the UA to process the request, this specification
923	   defines a new SIP option tag, "refer-context".  A REFER request
924	   generated by an application MUST include a Require header field with
925	   this option tag value.  Fortunately, the application will know ahead
926	   of time whether this extension is supported, as discussed in Section
927	   7.1.1.

929	   To create a presentation free user interface component, the
930	   application sends a SUBSCRIBE request to the UA.  The SUBSCRIBE MUST
931	   be sent to the GRUU advertised by the UA.  This SUBSCRIBE request
932	   creates a separate dialog.  The SUBSCRIBE request MUST use the KPML
934	   [7] event package.  The Event header field MUST contain parameters
935	   which identify the particular dialog that the interface component is
936	   being instantiated against.  The body of the SUBSCRIBE request
937	   contains the markup document that defines the conditions under which
938	   the application wishes to be notified of user input.

940	   In both cases, the REFER or SUBSCRIBE request SHOULD include a
941	   display name in the From header field which identifies the name of
942	   the application.  For example, a prepaid calling card might include a
943	   From header field which looks like:

945	   From: "Prepaid Calling Card" <sip:prepaid@example.com>

947	   Any of the SIP identity assertion mechanisms that have been defined,
948	   such as [10] and [12], are applicable to these requests as well.

950	7.1.3  Updating an Interface Component

952	   Once a user interface component has been created on a client, it can
953	   be updated.  The means for updating it depends on the type of UI
954	   component.

956	   Presentation capable UI components are updated using techniques
957	   already in place for those markups.  In particular, user input will
958	   cause an HTTP POST operation to push the user input to the
959	   application.  The result of the POST operation is a new markup that
960	   the UI is supposed to use.  This lets the UI be updated in response
961	   to user action.  Some markups, such as HTML, provide the ability to
962	   force a refresh after a certain period of time, so that the UI can be
963	   updated without user input.  Those mechanisms can be used here as
964	   well.  However, there is no support for an asynchronous push of an
965	   updated UI component from the application to the user agent.  A new
966	   REFER request to the same GRUU would create a new UI component rather
967	   than updating any components already in place.

969	   For presentation free UI, the story is different.  The application
970	   MAY update the filter at any time by generating a SUBSCRIBE refresh
971	   with the new filter.  The UA will immediately begin using this new
972	   filter.

974	7.1.4  Terminating an Interface Component

976	   User interface components have a well defined lifetime.  They are
977	   created when the component is first pushed to the client.  User
978	   interface components are always associated with the SIP dialog on
979	   which they were pushed.  As such, their lifetime is bound by the
980	   lifetime of the dialog.  When the dialog ends, so does the interface
981	   component.

983	   However, there are some cases where the application would like to
984	   terminate the user interface component before its natural termination
985	   point.  For presentation capable user interfaces, this is not
986	   possible.  For presentation free user interfaces, the application MAY
987	   terminate the component by sending a SUBSCRIBE with Expires equal to
988	   zero.  This terminates the subscription, which removes the UI
989	   component.

991	   A client can remove a UI component at any time.  For presentation
992	   capable UI, this is analogous to the user dismissing the web form
993	   window.  There is no mechanism provided for reporting this kind of
994	   event to the application.  The application MUST be prepared to time
995	   out, and never receive input from a user.  The duration of this
996	   timeout is application dependent.  For presentation free user
997	   interfaces, the UA can explicitly terminate the subscription.  This
998	   will result in the generation of a NOTIFY with a Subscription-State
999	   header field equal to "terminated".

1001	7.2  Client Remote Interfaces

1003	   As an alternative to, or in conjunction with, client local user
1004	   interfaces, an application can make use of client remote user
1005	   interfaces.  These user interfaces can execute co-resident with the
1006	   application itself (in which case no standardized interfaces between
1007	   the UI and the application need to be used), or they can run
1008	   separately.  This framework assumes that the user interface runs on a
1009	   host that has a sufficient trust relationship with the application.
1010	   As such, the means for instantiating the user interface is not
1011	   considered here.

1013	   The primary issue is to connect the user device to the remote user
1014	   interface.  Doing so requires the manipulation of media streams
1015	   between the client and the user interface.  Such manipulation can
1016	   only be done by user agents.  There are two types of user agent
1017	   applications within this framework - originating/terminating
1018	   applications, and intermediary applications.

1020	7.2.1  Originating and Terminating Applications

1022	   Originating and terminating applications are applications which are
1023	   themselves the originator or the final recipient of a SIP invitation.
1024	   They are "pure" user agent applications - not back-to-back user
1025	   agents.  The classic example of such an application is an interactive
1026	   voice response (IVR) application, which is typically a terminating
1027	   application.  It is a terminating application because the user
1028	   explicitly calls it; i.e., it is the actual called party.  An example
1029	   of an originating application is a wakeup call application, which
1030	   calls a user at a specified time in order to wake them up.

1032	   Because originating and terminating applications are a natural
1033	   termination point of the dialog, manipulation of the media session by
1034	   the application is trivial.  Traditional SIP techniques for adding
1035	   and removing media streams, modifying codecs, and changing the
1036	   address of the recipient of the media streams, can be applied.
1037	   Similarly, the application can directly authenticate itself to the
1038	   user through S/MIME, since it is the peer UA in the dialog.

1040	7.2.2  Intermediary Applications

1042	   Intermediary applications are, at the same time, more common than
1043	   originating/terminating applications, and more complex.  Intermediary
1044	   applications are applications that are neither the actual caller nor
1045	   called party.  Rather, they represent a "third party" that wishes to
1046	   interact with the user.  The classic example is the ubiquitous
1047	   pre-paid calling card application.

1049	   In order for the intermediary application to add a client remote user
1050	   interface, it needs to manipulate the media streams of the user agent
1051	   to terminate on that user interface.  This also introduces a
1052	   fundamental feature interaction issue.  Since the intermediary
1053	   application is not an actual participant in the call, the user will
1054	   need to interact with both the intermediary application and its peer
1055	   in the dialog.  Doing both at the same time is complicated, and is
1056	   discussed in more detail in Section 9.

1058	8.  User Agent Behavior

1060	8.1  Advertising Capabilities

1062	   In order to participate in applications that make use of stimulus
1063	   interfaces, a user agent needs to advertise its interaction
1064	   capabilities.

1066	   If a user agent supports presentation capable user interfaces, it
1067	   MUST support the REFER method, along with the "context" extension
1068	   defined here.  It MUST include, in all dialog initiating requests and
1069	   responses, an Allow header field that includes the REFER method and
1070	   a Supported header field that includes the value
1071	   "refer-context".  Furthermore, the UA MUST support the SIP user agent
1072	   capabilities specification [5].  The UA MUST be capable of being
1073	   REFER'd to an HTTP URI.  It MUST include, in the Contact header field
1074	   of its dialog initiating requests and responses, a "schemes" Contact
1075	   header field parameter that includes the http URI scheme.  The UA MUST
1076	   include, in all dialog initiating requests and responses, an Accept
1077	   header field listing all of those markups supported by the UA.  It is
1078	   RECOMMENDED that all user agents that support presentation capable
1079	   user interfaces support HTML.

1081	   If a user agent supports presentation free user interfaces, it MUST
1082	   support the SUBSCRIBE [3] method.  It MUST support the KPML [7] event
1083	   package.  It MUST include, in all dialog initiating requests and
1084	   responses, an Allow header field that includes the SUBSCRIBE method.
1085	   It MUST include, in all dialog initiating requests and responses, an
1086	   Allow-Events header field that lists the KPML event package.  The UA
1087	   MUST include, in all dialog initiating requests and responses, an
1088	   Accept header field listing those event filters it supports.  At a
1089	   minimum, a UA MUST support the "application/kpml-request+xml" MIME
1090	   type.

1092	   For either presentation free or presentation capable user interfaces,
1093	   the user agent MUST support the GRUU [8] specification.  The Contact
1094	   header field in all dialog initiating requests and responses MUST
1095	   contain a GRUU.  The UA MUST include a Supported header field which
1096	   contains the "gruu" option tag.

1098	   Because these headers are examined by proxies which may be executing
1099	   applications, a UA that wishes to support client local user
1100	   interfaces should not encrypt them.
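
   The advertising rules above can be sketched as a small helper.  This
   is only an illustration, not part of the framework: the function
   name, its parameters, and the "kpml" event package token are
   assumptions made for the example, and the base Allow list simply
   mirrors the example messages in this document.

```python
def capability_headers(gruu, presentation_capable, presentation_free,
                       markups=("text/html",),
                       schemes=("http", "sip", "sips")):
    """Return header fields for a dialog-initiating request or response.

    Hypothetical helper; 'schemes' is assumed to contain "http".
    """
    allow = ["INVITE", "OPTIONS", "BYE", "CANCEL", "ACK"]
    accept = []
    supported = ["gruu"]        # GRUU is required for either UI type
    contact = f"<{gruu}>"       # the Contact must carry the GRUU
    headers = {}
    if presentation_capable:
        allow.append("REFER")
        supported.append("refer-context")
        contact += ';schemes="' + ",".join(schemes) + '"'
        accept.extend(markups)  # markups the UA can render
    if presentation_free:
        allow.append("SUBSCRIBE")
        headers["Allow-Events"] = "kpml"  # assumed package token
        accept.append("application/kpml-request+xml")
    headers["Allow"] = ", ".join(allow)
    headers["Supported"] = ", ".join(supported)
    headers["Contact"] = contact
    if accept:
        headers["Accept"] = ", ".join(accept)
    return headers
```

   A UA that supports both interface types would call
   capability_headers(gruu, True, True) and copy the result into every
   dialog initiating request and response, leaving it unencrypted.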

1102	8.2  Receiving User Interface Components

1104	   Once the UA has created a dialog (in either the early or confirmed
1105	   states), it MUST be prepared to receive a SUBSCRIBE or REFER request
1106	   against its GRUU.  If the UA receives such a request prior to the
1107	   establishment of a dialog, the UA MUST reject the request.

1109	   A user agent SHOULD attempt to authenticate the sender of the
1110	   request.  The sender will generally be an application, and therefore
1111	   the user agent is unlikely to ever have a shared secret with it,
1112	   making digest authentication useless.  However, authenticated
1113	   identities can be obtained through other means, such as [10].

1115	   A user agent MAY have pre-defined authorization policies which permit
1116	   applications which have authenticated themselves with a particular
1117	   identity, to push user interface components.  If such a set of
1118	   policies is present, it is checked first.  If the application is
1119	   authorized, processing proceeds.

1121	   If the application has authenticated itself, but it is not explicitly
1122	   authorized or blocked, this specification RECOMMENDS that the
1123	   application be automatically authorized if it can prove that it was
1124	   either on the call path, or is trusted by one of the elements on the
1125	   call path.  An application proves this to the user agent by
1126	   presenting it with the dialog identifiers in the SUBSCRIBE or REFER
1127	   request.  In the case of SUBSCRIBE, those identifiers are present in
1128	   the Event header field [7].  In the case of REFER, those identifiers
1129	   are present in the "context" parameter of the Refer-To header field.

1131	   Because the dialog identifiers serve as a tool for authorization,
1132	   a user agent compliant to this framework SHOULD use dialog
1133	   identifiers that are cryptographically random, with at least 128 bits
1134	   of randomness.  It is recommended that this randomness be split
1135	   between the Call-ID and From header field tag in the case of a UAC.
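
   One possible way for a UAC to follow this recommendation is sketched
   below.  The function name and the even 64/64 split are assumptions
   made for illustration; the text above only asks for at least 128
   bits in total, split between the two fields.

```python
import secrets


def new_dialog_identifiers(host):
    """Return (call_id, from_tag) carrying 128 random bits in total."""
    # secrets draws from the operating system's CSPRNG; token_hex(8)
    # yields 8 random bytes (64 bits) rendered as 16 hex characters.
    call_id = f"{secrets.token_hex(8)}@{host}"
    from_tag = secrets.token_hex(8)
    return call_id, from_tag
```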

1137	   Furthermore, to ensure that only applications resident in or trusted
1138	   by on-path elements can instantiate a user interface component, a
1139	   user agent compliant to this specification SHOULD use the sips URI
1140	   scheme for all dialogs it initiates.  This will guarantee secure
1141	   links between all of the elements on the signaling path.

1143	   If the dialog was not established with a sips URI, or the user agent
1144	   did not choose cryptographically random dialog identifiers, then the
1145	   application MUST NOT automatically be authorized, even if it
1146	   presented valid dialog identifiers.  A user agent MAY apply any other
1147	   policies in addition to (but not instead of) the ones specified here
1148	   in order to authorize the creation of the user interface component.
1149	   One such mechanism would be to prompt the user, informing them of the
1150	   identity of the application and the dialog it is associated with.  If
1151	   an authorization policy requires user interaction, the user agent
1152	   SHOULD respond to the SUBSCRIBE or REFER request with a 202.  In the
1153	   case of SUBSCRIBE, if authorization is not granted, the user agent
1154	   SHOULD generate a NOTIFY to terminate the subscription.  In the case
1155	   of REFER, the user agent MUST NOT act upon the URI in the Refer-To
1156	   header field until user authorization has been obtained.

1158	   If an application does not present a valid dialog identifier in its
1159	   REFER or SUBSCRIBE request, the user agent MUST reject the request
1160	   with a 403 response.

1162	   If a REFER request to an HTTP URI is authorized, the UA executes the
1163	   URI and fetches the content to be rendered to the user.  This
1164	   instantiates a presentation capable user interface component.  If a
1165	   SUBSCRIBE is authorized, a presentation free user interface
1166	   component is instantiated.
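
   The authorization steps in this section can be summarized as a
   decision function.  The sketch below is one reading of the rules, not
   a normative algorithm.  All names are hypothetical, the handling of
   blocked and unauthenticated applications is an assumption (the text
   does not give a response code for either), and prompting the user
   stands in for "any other policy".

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    REJECT_403 = "no valid dialog identifier: reject with 403"
    REJECT_BLOCKED = "blocked by local policy"
    AUTHORIZE = "instantiate the user interface component"
    ASK_USER = "respond 202, then prompt the user"


@dataclass
class Dialog:
    established_with_sips: bool
    random_identifiers: bool    # UA chose >= 128 random bits


def authorize(presented_id, identity, dialogs, allowed, blocked):
    """Decide how to treat a SUBSCRIBE or REFER sent to the GRUU.

    presented_id: dialog identifier taken from the request
    identity:     authenticated identity of the sender, or None
    dialogs:      mapping of this UA's dialog identifiers to Dialog
    """
    dialog = dialogs.get(presented_id)
    if dialog is None:
        return Decision.REJECT_403
    # Pre-defined authorization policies are checked first.
    if identity in blocked:
        return Decision.REJECT_BLOCKED
    if identity in allowed:
        return Decision.AUTHORIZE
    # Automatic authorization requires an authenticated sender, a
    # sips dialog, and cryptographically random identifiers.
    if (identity is not None and dialog.established_with_sips
            and dialog.random_identifiers):
        return Decision.AUTHORIZE
    return Decision.ASK_USER
```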

1168	8.3  Mapping User Input to User Interface Components

1170	   Once the user interface components are instantiated, the user agent
1171	   must direct user input to the appropriate component.  In the case of
1172	   presentation capable user interfaces, this process is known as focus
1173	   selection.  It is done by means that are specific to the user
1174	   interface on the device.  In the case of a PC, for example, the
1175	   window manager would allow the user to select the appropriate user
1176	   interface component that their input is directed to.

1178	   For presentation free user interfaces, the situation is more
1179	   complicated.  In some cases, the device may support a mechanism that
1180	   allows the user to select a "line", and thus the associated dialog.
1181	   Any user input on the keypad while this line is selected is fed to
1182	   the user interface components associated with that dialog.

1184	   Otherwise, for client local user interfaces, the user input is
1185	   assumed to be associated with all user interface components.  For
1186	   client remote user interfaces, the user device converts the user
1187	   input to media, typically conveyed using RFC 2833, and sends this to
1188	   the client remote user interface.  This user interface then needs to
1189	   map user input from potentially many media streams into user
1190	   interface events.  The process for doing this is described in Section
1191	   6.3.

1193	8.4  Receiving Updates to User Interface Components

1195	   For presentation capable user interfaces, updates to the user
1196	   interface occur in ways specific to that user interface component.
1197	   In the case of HTML, for example, the document can tell the client to
1198	   fetch a new document periodically.  However, this framework does not
1199	   provide any additional machinery to asynchronously push a new user
1200	   interface component to the client.

1202	   For presentation free user interfaces, an application can push an
1203	   update to a component by sending a SUBSCRIBE refresh with a new
1204	   filter.  The user agent will process these according to the rules of
1205	   the event package.

1207	8.5  Terminating a User Interface Component

1209	   Termination of a presentation capable user interface component is a
1210	   trivial procedure.  The user agent merely dismisses the window (or
1211	   equivalent).  The fact that the component is dismissed is not
1212	   communicated to the application.  As such, it is purely a local
1213	   matter.

1215	   In the case of a presentation free user interface, the user might
1216	   wish to cease interacting with the application.  However, most
1217	   presentation free user interfaces will not have a way for the user to
1218	   signal this through the device.  If such a mechanism does exist, the
1219	   UA SHOULD generate a NOTIFY request with a Subscription-State equal
1220	   to "terminated" and a reason of "rejected".  This tells the
1221	   application that the component has been removed, and that it should
1222	   not attempt to re-subscribe.

1224	9.  Inter-Application Feature Interaction

1226	   The inter-application feature interaction problem is inherent to
1227	   stimulus signaling.  Whenever there are multiple applications, there
1228	   are multiple user interfaces.  The system has to determine to which
1229	   user interface any particular input is destined.  That question is
1230	   the essence of the inter-application feature interaction problem.

1232	   Inter-application feature interaction is not an easy problem to
1233	   resolve.  For now, we consider separately the issues for client-local
1234	   and client-remote user interface components.

1236	9.1  Client Local UI

1238	   When the user interface itself resides locally on the client device,
1239	   the feature interaction problem is actually much simpler.  The end
1240	   device knows explicitly about each application, and therefore can
1241	   present the user with each one separately.  When the user provides
1242	   input, the client device can determine to which user interface the
1243	   input is destined.  The user interface to which input is destined is
1244	   referred to as the application in focus, and the means by which the
1245	   focused application is selected is called focus determination.

1247	   Generally speaking, focus determination is purely a local operation.
1248	   In the PC universe, focus determination is provided by window
1249	   managers.  An application does not know about focus; it merely
1250	   receives the user input that has been targeted to it when it is in
1251	   focus.  This basic concept applies to SIP-based applications as well.

1253	   Focus determination will frequently be trivial, depending on the user
1254	   interface type.  Consider a user that makes a call from a PC.  The
1255	   call passes through a pre-paid calling card application, and a call
1256	   recording application.  Both of these wish to interact with the user.
1257	   Both push an HTML-based user interface to the user.  On the PC, each
1258	   user interface would appear as a separate window.  The user interacts
1259	   with the call recording application by selecting its window, and with
1260	   the pre-paid calling card application by selecting its window.  Focus
1261	   determination is literally provided by the PC window manager.  It is
1262	   clear to which application the user input is targeted.

1264	   As another example, consider the same two applications, but on a
1265	   "smart phone" that has a set of buttons, and next to each button, an
1266	   LCD display that can provide the user with an option.  This user
1267	   interface can be represented using the Wireless Markup Language
1268	   (WML).

1270	   The phone would allocate some number of buttons to each application.
1271	   The prepaid calling card would get one button for its "hangup"
1272	   command, and the recording application would get one for its "start/
1273	   stop" command.  The user can easily determine which application to
1274	   interact with by pressing the appropriate button.  Pressing a button
1275	   determines focus and provides user input, both at the same time.

1277	   Unfortunately, not all devices will have these advanced displays.  A
1278	   PSTN gateway, or a basic IP telephone, may only have a 12-key keypad.
1279	   The user interfaces for these devices are provided through the Keypad
1280	   Markup Language (KPML).  Considering once again the feature
1281	   interaction case above, the pre-paid calling card application and the
1282	   call recording application would both pass a KPML document to the
1283	   device.  When the user presses a button on the keypad, to which
1284	   document does the input apply? The user interface does not allow the
1285	   user to select.  A user interface where the user cannot provide focus
1286	   is called a focusless user interface.  This is quite a hard problem
1287	   to solve.  This framework does not make any explicit normative
1288	   recommendation, but concludes that the best option is to send the
1289	   input to both user interfaces unless the markup in one interface has
1290	   indicated that it should be suppressed from others.  This is a
1291	   sensible choice by analogy - it is exactly what the existing circuit
1292	   switched telephone network will do.  It is an explicit non-goal to
1293	   provide a better mechanism for feature interaction resolution than
1294	   the PSTN on devices which have the same user interface as they do on
1295	   the PSTN.  Devices with better displays, such as PCs or screen
1296	   phones, can benefit from the capabilities of this framework, allowing
1297	   the user to determine which application they are interacting with.

1299	   Indeed, when a user provides input on a focusless device, the input
1300	   must be passed to all client local user interfaces, AND all client
1301	   remote user interfaces, unless the markup tells the UI to suppress
1302	   the media.  In the case of KPML, key events are passed to remote user
1303	   interfaces by encoding them in RFC 2833 [17].  Of course, since a
1304	   client cannot determine if a media stream terminates in a remote user
1305	   interface or not, these key events are passed in all audio media
1306	   streams unless the KPML request document is used to suppress them.
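
   The focusless behavior described above can be illustrated roughly as
   follows.  The class, its "suppress_media" flag, and the use of plain
   lists in place of RFC 2833 encoding are all assumptions made for the
   sketch; real KPML suppression is expressed in the request document
   and is more fine-grained than a single flag.

```python
from dataclasses import dataclass, field


@dataclass
class LocalComponent:
    name: str
    suppress_media: bool = False    # hypothetical stand-in for markup
    events: list = field(default_factory=list)


def dispatch_key(key, local_components, audio_streams):
    """Deliver one key press on a focusless device.

    Returns True if the key was also sent in the audio streams.
    """
    # Every client local user interface sees the input.
    for component in local_components:
        component.events.append(key)
    # Any component may ask that the key be kept out of the media.
    if any(c.suppress_media for c in local_components):
        return False
    # The client cannot tell which streams end in a remote UI, so the
    # key goes into all of them.
    for stream in audio_streams:
        stream.append(key)
    return True
```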

1308	9.2  Client-Remote UI

1310	   When the user interfaces run remotely, the determination of focus can
1311	   be much, much harder.  There are many architectures that can be
1312	   deployed to handle the interaction.  None are ideal.  However, all
1313	   are beyond the scope of this specification.

1315	10.  Intra Application Feature Interaction

1317	   An application can instantiate a multiplicity of user interface
1318	   components.  For example, a single application can instantiate two
1319	   separate HTML components and one WML component.  Furthermore, an
1320	   application can instantiate both client local and client remote user
1321	   interfaces.

1323	   The feature interaction issues between these components within the
1324	   same application are less severe.  If an application has multiple
1325	   client user interface components, their interaction is resolved
1326	   identically to the inter-application case - through focus
1327	   determination.  However, the problems in focusless user interfaces
1328	   (such as a keypad) generally won't exist, since the application can
1329	   generate user interfaces which do not overlap in their usage of an
1330	   input.

1332	   The real issue is that the optimal user experience frequently
1333	   requires some kind of coupling between the differing user interface
1334	   components.  This is a classic problem in multi-modal user
1335	   interfaces, such as those described by Speech Application Language
1336	   Tags (SALT).  As an example, consider a user interface where a user
1337	   can either press a labeled button to make a selection, or listen to a
1338	   prompt, and speak the desired selection.  Ideally, when the user
1339	   presses the button, the prompt should cease immediately, since both
1340	   of them were targeted at collecting the same information in parallel.
1341	   Such interactions are best handled by markups which natively support
1342	   such interactions, such as SALT, and thus require no explicit support
1343	   from this framework.

1345	11.  Example Call Flow

1347	   This section shows the operation of a call recording application.
1348	   This application allows a user to record the media in their call by
1349	   clicking on a button in a web form.  The application uses a
1350	   presentation capable user interface component that is pushed to the
1351	   caller.

1353	             A                  Recording App                  B
1354	             |(1) INVITE              |                        |
1355	             |----------------------->|                        |
1356	             |                        |(2) INVITE              |
1357	             |                        |----------------------->|
1358	             |                        |(3) 200 OK              |
1359	             |                        |<-----------------------|
1360	             |(4) 200 OK              |                        |
1361	             |<-----------------------|                        |
1362	             |(5) ACK                 |                        |
1363	             |----------------------->|                        |
1364	             |                        |(6) ACK                 |
1365	             |                        |----------------------->|
1366	             |(7) REFER               |                        |
1367	             |<-----------------------|                        |
1368	             |(8) 200 OK              |                        |
1369	             |----------------------->|                        |
1370	             |(9) NOTIFY              |                        |
1371	             |----------------------->|                        |
1372	             |(10) 200 OK             |                        |
1373	             |<-----------------------|                        |
1374	             |(11) HTTP GET           |                        |
1375	             |----------------------->|                        |
1376	             |(12) 200 OK             |                        |
1377	             |<-----------------------|                        |
1378	             |(13) NOTIFY             |                        |
1379	             |----------------------->|                        |
1380	             |(14) 200 OK             |                        |
1381	             |<-----------------------|                        |
1382	             |(15) HTTP POST          |                        |
1383	             |----------------------->|                        |
1384	             |(16) 200 OK             |                        |
1385	             |<-----------------------|                        |

1387	                                Figure 8

1389	   First, the caller, A, sends an INVITE to set up a call (message 1).
1390	   Since the caller supports the framework, and can handle presentation
1391	   capable user interface components, it includes the Supported header
1392	   field indicating that the GRUU extension and the REFER context
1393	   extension are understood, Allow indicating that REFER is understood,
1394	   and a Contact header field that includes the "schemes" header field
1395	   parameter.

1397	   INVITE sips:B@example.com SIP/2.0
1398	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
1399	   From: Caller <sip:A@example.com>;tag=kkaz-
1400	   To: Callee <sip:B@example.com>
1401	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1402	   CSeq: 1 INVITE
1403	   Max-Forwards: 70
1404	   Supported: gruu, refer-context
1405	   Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
1406	   Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips"
1407	   Content-Length: ...
1408	   Content-Type: application/sdp

1410	   --SDP not shown--

1412	   The proxy acts as a recording server, and forwards the INVITE to the
1413	   called party (message 2):

1415	   INVITE sips:B@pc.example.com SIP/2.0
1416	   Record-Route: <sips:app.example.com;lr>
1417	   Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK97sh
1418	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
1419	   From: Caller <sip:A@example.com>;tag=kkaz-
1420	   To: Callee <sip:B@example.com>
1421	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1422	   CSeq: 1 INVITE
1423	   Max-Forwards: 69
1424	   Supported: gruu, refer-context
1425	   Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
1426	   Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips"
1427	   Content-Length: ...
1428	   Content-Type: application/sdp

1430	   --SDP not shown--

1432	   B accepts the call with a 200 OK (message 3).  It does not support
1433	   the framework, and so the various header fields are not present.

1435	   SIP/2.0 200 OK
1436	   Record-Route: <sips:app.example.com;lr>
1437	   Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK97sh
1438	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
1439	   From: Caller <sip:A@example.com>;tag=kkaz-
1440	   To: Callee <sip:B@example.com>;tag=7777
1441	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1442	   CSeq: 1 INVITE
1443	   Contact: <sips:B@pc.example.com>
1444	   Content-Length: ...
1445	   Content-Type: application/sdp

1447	   --SDP not shown--

1449	   This 200 OK is passed back to the caller (message 4):

1451	   SIP/2.0 200 OK
1452	   Record-Route: <sips:app.example.com;lr>
1453	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
1454	   From: Caller <sip:A@example.com>;tag=kkaz-
1455	   To: Callee <sip:B@example.com>;tag=7777
1456	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1457	   CSeq: 1 INVITE
1458	   Contact: <sips:B@pc.example.com>
1459	   Content-Length: ...
1460	   Content-Type: application/sdp

1462	   --SDP not shown--

1464	   The caller generates an ACK (message 5).

1466	   ACK sips:B@pc.example.com SIP/2.0
1467	   Route: <sips:app.example.com;lr>
1468	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz9
1469	   From: Caller <sip:A@example.com>;tag=kkaz-
1470	   To: Callee <sip:B@example.com>;tag=7777
1471	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1472	   CSeq: 1 ACK

1474	   The ACK is forwarded to the called party (message 6).

1476	   ACK sips:B@pc.example.com SIP/2.0
1477	   Via: SIP/2.0/TLS app.example.com;branch=z9hG4bKh7s
1478	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz9
1479	   From: Caller <sip:A@example.com>;tag=kkaz-
1480	   To: Callee <sip:B@example.com>;tag=7777
1481	   Call-ID: faif9ahhs9dd8==-sd98ajzz@host.example.com
1482	   CSeq: 1 ACK

1484	   Now, the application decides to push a user interface component to
1485	   user A.  So, it sends it a REFER request (message 7):

1487	   REFER sips:bad998asd8asd0000a@example.com SIP/2.0
1488	   Refer-To: https://app.example.com/script.pl
1489	    ;context="kkaz-,7777,faif9ahhs9dd8==-sd98ajzz@host.example.com"
1490	   Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK9zh6
1491	   Max-Forwards: 70
1492	   From: Recorder Application <sip:app.example.com>;tag=jhgf
1493	   To: Caller <sip:A@example.com>
1494	   Call-ID: 66676776767@app.example.com
1495	   CSeq: 1 REFER
1496	   Event: refer
1497	   Contact: <sips:app.example.com>

1499	   The REFER is answered by a 200 OK (message 8).

1501	   SIP/2.0 200 OK
1502	   Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK9zh6
1503	   From: Recorder Application <sip:app.example.com>;tag=jhgf
1504	   To: Caller <sip:A@example.com>;tag=pqoew
1505	   Call-ID: 66676776767@app.example.com
1506	   Supported: gruu, refer-context
1507	   Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
1508	   Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips"
1509	   CSeq: 1 REFER

1511	   User A sends a NOTIFY (message 9):

1513	   NOTIFY sips:app.example.com SIP/2.0
1514	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995
1515	   To: Recorder Application <sip:app.example.com>;tag=jhgf
1516	   From: Caller <sip:A@example.com>;tag=pqoew
1517	   Call-ID: 66676776767@app.example.com
1518	   CSeq: 1 NOTIFY
1519	   Max-Forwards: 70
1520	   Event: refer;id=93809824
1521	   Subscription-State: active;expires=3600
1522	   Contact: <sips:bad998asd8asd0000a@example.com>;schemes="http,sip,sips"
1523	   Content-Type: message/sipfrag;version=2.0
1524	   Content-Length: 20

1525	   SIP/2.0 100 Trying

1527	   And the recording server responds with a 200 OK (message 10):

1529	   SIP/2.0 200 OK
1530	   Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995
1531	   To: Recorder Application <sip:app.example.com>;tag=jhgf
1532	   From: Caller <sip:A@example.com>;tag=pqoew
1533	   Call-ID: 66676776767@app.example.com
1534	   CSeq: 1 NOTIFY

1536	   The REFER request contained a "context" Refer-To header field
1537	   parameter with a valid dialog identifier.  Furthermore, all of the
1538	   signaling was over TLS and the dialog identifiers contain sufficient
1539	   randomness.  As such, the caller, A, automatically authorizes the
1540	   application.  It then acts on the Refer-To URI, fetching the script
1541	   from app.example.com (message 11).  The response, message 12,
1542	   contains a web application that the user can click on to enable
1543	   recording.  Because the client executed the URL in the Refer-To, it
1544	   generates another NOTIFY to the application, informing it of the
1545	   successful response (message 13).  This is answered with a 200 OK
1546	   (message 14).  When the user clicks on the link (message 15), the
1547	   results are posted to the server, and an updated display is provided
1548	   (message 16).
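
   The check that A performs on the "context" parameter in message 7 can
   be sketched as follows.  The layout assumed here (two dialog tags
   followed by the Call-ID, separated by commas) is inferred only from
   the example above, the tags are compared without regard to order, and
   both function names are hypothetical.

```python
import re


def parse_context(refer_to):
    """Extract the dialog identifier from a Refer-To header value."""
    match = re.search(r';\s*context="([^"]*)"', refer_to)
    if match is None:
        return None
    parts = match.group(1).split(",", 2)
    if len(parts) != 3:
        return None
    return {"tags": frozenset(parts[:2]), "call_id": parts[2]}


def matches_dialog(context, call_id, local_tag, remote_tag):
    """True if the parsed context names the given dialog."""
    return (context is not None
            and context["call_id"] == call_id
            and context["tags"] == frozenset((local_tag, remote_tag)))
```

   For the call flow above, the unfolded Refer-To value of message 7
   parses to the tags "kkaz-" and "7777" and the Call-ID of the INVITE
   dialog, so the request matches a dialog that A holds.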

1550	12.  Security Considerations

1552	   There are many security considerations associated with this
1553	   framework.  It allows applications in the network to instantiate user
1554	   interface components on a client device.  Such instantiations need to
1555	   be from authenticated applications, and also need to be authorized to
1556	   place a UI into the client.  Indeed, the stronger requirement is
1557	   authorization.  It is not so important to know the name of the
1558	   provider of the application, but rather, that the provider is
1559	   authorized to instantiate components.

1561	   This specification defines specific authorization techniques and
1562	   requirements.  Automatic authorization is granted if the application
1563	   can prove that it is on the call path, or is trusted by an element on
1564	   the call path.  As documented above, this can be accomplished by the
1565	   use of cryptographically random dialog identifiers and the usage of
1566	   sips for message confidentiality.  It is RECOMMENDED that sips be
1567	   implemented by user agents compliant to this specification.  This
1568	   does not represent a change from the requirements in RFC 3261.

1570	13.  IANA Considerations

1572	13.1  SIP Option Tag

1574	   This specification registers a new SIP option tag, as per the
1575	   guidelines in Section 27.1 of RFC 3261 [1].

1577	   Name: refer-context

1579	   Description: This option tag is used to identify the REFER extension
1580	      that defines the "context" parameter of the Refer-To header field.

1582	13.2  Header Field Parameter

1584	   This specification defines a new header field parameter, as per the
1585	   registry created by [9].  The required information is as follows:

1587	   Header field in which the parameter can appear: Refer-To

1589	   Name of the Parameter: context

1591	   RFC Reference: RFC XXXX [[NOTE TO IANA: Please replace XXXX with the
1592	      RFC number of this specification.]]

1594	14.  Contributors

1596	   This document was produced as a result of discussions amongst the
1597	   application interaction design team.  All members of this team
1598	   contributed significantly to the ideas embodied in this document.
1599	   The members of this team were:

1601	   Eric Burger
1602	   Cullen Jennings
1603	   Robert Fairlie-Cuninghame

1605	15.  Acknowledgements

1607	   The authors would like to thank Martin Dolly and Rohan Mahy for their
1608	   input and comments.  Thanks to Allison Mankin for her support of this
1609	   work.

1611	16.  References
1612	16.1  Normative References

1614	   [1]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
1615	        Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
1616	        Session Initiation Protocol", RFC 3261, June 2002.

1618	   [2]  Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional
1619	        Responses in Session Initiation Protocol (SIP)", RFC 3262, June
1620	        2002.

1622	   [3]  Roach, A., "Session Initiation Protocol (SIP)-Specific Event
1623	        Notification", RFC 3265, June 2002.

1625	   [4]  McGlashan, S., Lucas, B., Porter, B., Rehor, K., Burnett, D.,
1626	        Carter, J., Ferrans, J. and A. Hunt, "Voice Extensible Markup
1627	        Language (VoiceXML) Version 2.0", W3C CR CR-voicexml20-20030220,
1628	        February 2003.

1630	   [5]  Rosenberg, J., Schulzrinne, H. and P. Kyzivat, "Indicating User
1631	        Agent Capabilities in the Session Initiation Protocol (SIP)",
1632	        RFC 3840, August 2004.

1634	   [6]  Sparks, R., "The Session Initiation Protocol (SIP) Refer
1635	        Method", RFC 3515, April 2003.

1637	   [7]  Burger, E., "A Session Initiation Protocol (SIP) Event Package
1638	        for Key Press Stimulus (KPML)", draft-ietf-sipping-kpml-04
1639	        (work in progress), July 2004.

1641	   [8]  Rosenberg, J., "Obtaining and Using Globally Routable User Agent
1642	        (UA) URIs (GRUU) in the Session Initiation Protocol (SIP)",
1643	        draft-ietf-sip-gruu-02 (work in progress), July 2004.

1645	   [9]  Camarillo, G., "The Internet Assigned Number Authority (IANA)
1646	        Header Field Parameter Registry for the Session Initiation
1647	        Protocol (SIP)", draft-ietf-sip-parameter-registry-02 (work in
1648	        progress), June 2004.

1650	16.2  Informative References

1652	   [10]  Peterson, J., "Enhancements for Authenticated Identity
1653	         Management in the Session Initiation Protocol (SIP)",
1654	         draft-ietf-sip-identity-03 (work in progress), September 2004.

1656	   [11]  Day, M., Rosenberg, J. and H. Sugano, "A Model for Presence and
1657	         Instant Messaging", RFC 2778, February 2000.

1659	   [12]  Jennings, C., Peterson, J. and M. Watson, "Private Extensions
1660	         to the Session Initiation Protocol (SIP) for Asserted Identity
1661	         within Trusted Networks", RFC 3325, November 2002.

1663	   [13]  Rosenberg, J., "A Framework for Conferencing with the Session
1664	         Initiation Protocol",
1665	         draft-ietf-sipping-conferencing-framework-02 (work in
1666	         progress), June 2004.

1668	   [14]  Rosenberg, J., Schulzrinne, H. and P. Kyzivat, "Caller
1669	         Preferences for the Session Initiation Protocol (SIP)", RFC
1670	         3841, August 2004.

1672	   [15]  Rosenberg, J. and H. Schulzrinne, "An INVITE Initiated Dialog
1673	         Event Package for the Session Initiation Protocol (SIP)",
1674	         draft-ietf-sipping-dialog-package-04 (work in progress),
1675	         February 2004.

1677	   [16]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
1678	         "RTP: A Transport Protocol for Real-Time Applications", RFC
1679	         3550, July 2003.

1681	   [17]  Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits,
1682	         Telephony Tones and Telephony Signals", RFC 2833, May 2000.

1684	   [18]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
1685	         Session Description Protocol (SDP)", RFC 3264, June 2002.

1687	   [19]  Rosenberg, J., "A Session Initiation Protocol (SIP) Event
1688	         Package for Registrations", RFC 3680, March 2004.

1690	Author's Address

1692	   Jonathan Rosenberg
1693	   Cisco Systems
1694	   600 Lanidex Plaza
1695	   Parsippany, NJ  07054
1696	   US

1698	   Phone: +1 973 952-5000
1699	   EMail: jdrosen@dynamicsoft.com
1700	   URI:   http://www.jdrosen.net

1702	Intellectual Property Statement

1704	   The IETF takes no position regarding the validity or scope of any
1705	   Intellectual Property Rights or other rights that might be claimed to
1706	   pertain to the implementation or use of the technology described in
1707	   this document or the extent to which any license under such rights
1708	   might or might not be available; nor does it represent that it has
1709	   made any independent effort to identify any such rights.  Information
1710	   on the procedures with respect to rights in RFC documents can be
1711	   found in BCP 78 and BCP 79.

1713	   Copies of IPR disclosures made to the IETF Secretariat and any
1714	   assurances of licenses to be made available, or the result of an
1715	   attempt made to obtain a general license or permission for the use of
1716	   such proprietary rights by implementers or users of this
1717	   specification can be obtained from the IETF on-line IPR repository at
1718	   http://www.ietf.org/ipr.

1720	   The IETF invites any interested party to bring to its attention any
1721	   copyrights, patents or patent applications, or other proprietary
1722	   rights that may cover technology that may be required to implement
1723	   this standard.  Please address the information to the IETF at
1724	   ietf-ipr@ietf.org.

1726	Disclaimer of Validity

1728	   This document and the information contained herein are provided on an
1729	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1730	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1731	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1732	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1733	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1734	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1736	Copyright Statement

1738	   Copyright (C) The Internet Society (2004).  This document is subject
1739	   to the rights, licenses and restrictions contained in BCP 78, and
1740	   except as set forth therein, the authors retain all their rights.

1742	Acknowledgment

1744	   Funding for the RFC Editor function is currently provided by the
1745	   Internet Society.