idnits 2.17.1 

draft-rosenberg-sip-vxml-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 2 instances of too long lines in the document, the longest one
     being 6 characters in excess of 72.

  == There are 9 instances of lines with non-RFC2606-compliant FQDNs in the
     document.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 131: '...ted), the server SHOULD generate a 501...'
     RFC 2119 keyword, line 135: '...   The server SHOULD authenticate the ...'
     RFC 2119 keyword, line 141: '... request is allowed. It is RECOMMENDED...'
     RFC 2119 keyword, line 149: '... far, the server SHOULD fetch the scri...'
     RFC 2119 keyword, line 153: '...eway to HTTP. It MAY include a Warning...'
     (32 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The "Author's Address" (or "Authors' Addresses") section title is
     misspelled.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 13, 2001) is 8323 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 658 looks like a reference

  -- Missing reference section? '2' on line 662 looks like a reference

  -- Missing reference section? '3' on line 666 looks like a reference

  -- Missing reference section? '4' on line 670 looks like a reference

  -- Missing reference section? '5' on line 674 looks like a reference

  -- Missing reference section? '6' on line 678 looks like a reference

  -- Missing reference section? '7' on line 682 looks like a reference

  -- Missing reference section? '8' on line 686 looks like a reference

  -- Missing reference section? '9' on line 690 looks like a reference

  -- Missing reference section? '10' on line 695 looks like a reference

  -- Missing reference section? '11' on line 699 looks like a reference

  -- Missing reference section? '12' on line 703 looks like a reference


     Summary: 6 errors (**), 0 flaws (~~), 3 warnings (==), 14 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                                   SIP WG
3	Internet Draft                                     Rosenberg,Mataga,Ladd
4	draft-rosenberg-sip-vxml-00.txt                              dynamicsoft
5	July 13, 2001
6	Expires: February 2001

8	               A SIP Interface to VoiceXML Dialog Servers

10	STATUS OF THIS MEMO

12	   This document is an Internet-Draft and is in full conformance with
13	   all provisions of Section 10 of RFC2026.

15	   Internet-Drafts are working documents of the Internet Engineering
16	   Task Force (IETF), its areas, and its working groups.  Note that
17	   other groups may also distribute working documents as Internet-
18	   Drafts.

20	   Internet-Drafts are draft documents valid for a maximum of six months
21	   and may be updated, replaced, or obsoleted by other documents at any
22	   time.  It is inappropriate to use Internet-Drafts as reference
23	   material or to cite them other than as "work in progress".

25	   The list of current Internet-Drafts can be accessed at
26	   http://www.ietf.org/ietf/1id-abstracts.txt

28	   To view the list of Internet-Draft Shadow Directories, see
29	   http://www.ietf.org/shadow.html.

31	Abstract

33	   VoiceXML is an XML based scripting language for describing voice
34	   dialogs. VoiceXML interpreters run within an interpreter context
35	   that, among other tasks, provides a call control interface for
36	   accessing the interpreter. It is very natural to provide a VoIP-based
37	   interpreter context that uses SIP and RTP to communicate with the
38	   outside world. In this document, we provide detailed specifications
39	   for a SIP/RTP based interpreter context.

41	1 Introduction

43	   VoiceXML [1] is an XML based scripting language for describing voice
44	   dialogs. It supports user input through speech recognition and DTMF,
45	   and can communicate with the user through text-to-speech or recorded
46	   files. VoiceXML scripts are interpreted by a VoiceXML interpreter.

48	   This interpreter, in turn, runs within an interpreter context. The
49	   interpreter context is the interface between the outside world and
50	   the interpreter. It typically handles the mechanisms by which the
51	   script execution begins, and by which it is fed media to drive it. It
52	   also provides the means for fetching documents from some form of
53	   document server.

55	   It is very natural to provide a VoiceXML interpreter context based
56	   purely on IP, specifically on VoIP using SIP [2] and RTP [3], along
57	   with HTTP for document access. An incoming VoIP call triggers the
58	   execution of the script, fetched from a server using HTTP. The
59	   incoming RTP stream for the call is passed to the interpreter for
60	   processing, and speech generated by the interpreter is sent over RTP
61	   to the caller. We call a pure IP-based VoiceXML system an "IP dialog
62	   server", or just "dialog server".

64	   Dialog servers are a key part of the application story for SIP-based
65	   networks, as described in the SIP application component architecture
66	   [4]. That document describes SIP-based dialog servers, and provides a
67	   high level overview of how the SIP interface works. This document
68	   provides a stand-alone, self-contained, more thorough description of
69	   a SIP-based VoIP VoiceXML interpreter context.

71	2 Script Initiation

73	   The script execution begins when a session is established using an
74	   INVITE request.

76	2.1 Script Naming

78	   In SIP, the request-URI identifies the user or service that the call
79	   is destined for. In the case of a dialog server, the dialog itself is
80	   the target for the call. As such, the request URI should contain the
81	   identifier for this dialog. This is consistent with the Request-URI
82	   service invocation model of RFC 3087 [5]. This URL can be in one of
83	   two formats. In the first, the VoiceXML script is identified directly
84	   by an HTTP URL. In the second, the script is not specified. Rather,
85	   the dialog server uses its configuration to map the incoming request
86	   to a specific script. The format for the Request-URI in either case
87	   is:

89	   Request-URI      =  "sip:" service-ID "." dialog-type ["." dialog-specific]
90	                       "@" hostport url-parameters [headers]
91	   service-ID       =  "dialog" | extension-token
92	   dialog-type      =  "vxml" | service-token
93	   dialog-specific  =  vxml-specific | service-token
94	   service-token    =  1*(alphanum | "-" | "!" | "%" | "*"
95	                       | "_" | "+" | "`" | "'" | "~" )
96	   vxml-specific    =  user-unreserved | unreserved | escaped

98	   Since the request URI can indicate a request for a variety of
99	   different services, of which a dialog server is only one type, the
100	   request URI begins with a service identifier that indicates
101	   the basic service required. This document specifies that dialog
102	   servers are addressed by having the first part of the username in the
103	   request-URI contain the service identifier "dialog" to indicate that
104	   a dialog service is requested. This is followed by a period, and
105	   after that, an identifier that indicates the means by which the
106	   dialog is specified. Currently, one mechanism is defined - a VoiceXML
107	   script. Other tokens can be used to indicate different mechanisms
108	   (note that service-token is identical to the BNF for token from RFC
109	   2543, except that the "." character is disallowed). After that comes
110	   an optional period followed by dialog-mechanism specific
111	   identification. For VoiceXML scripts, when present, this
112	   identification information is always a URL-encoded version of the URL
113	   which references the script to execute. When not present, the dialog
114	   server uses server-specific configuration to determine which script
115	   to execute.

117	   Examples of URLs that invoke VoiceXML dialogs are:

119	   sip:dialog.vxml.http%3a//dialogs.server.com/script32.vxml@vxmlservers.com
120	   sip:dialog.vxml@vxmlservers.com

122	   The first of these indicates that the dialog server (located at
123	   vxmlservers.com) should invoke a VoiceXML script fetched from
124	   http://dialogs.server.com/script32.vxml. Since the user part of the
125	   SIP URL cannot contain the : character, this must be escaped to %3a.
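The naming scheme above can be illustrated with a small parser. This is
a minimal sketch, not part of the specification: the function name
parse_dialog_uri and its return shape are hypothetical, and it assumes
the user part is split at the last "@" and that url-parameters and
headers are simply discarded.

```python
from urllib.parse import unquote


def parse_dialog_uri(request_uri):
    """Split a dialog-server Request-URI into its parts.

    Returns (service_id, dialog_type, script_url, hostport). script_url
    is None when the URI leaves the script choice to server
    configuration. (Hypothetical helper, for illustration only.)
    """
    if not request_uri.startswith("sip:"):
        raise ValueError("not a SIP URI")
    rest = request_uri[len("sip:"):]
    # Assumption: the user part ends at the last "@" in the URI.
    user, sep, host_and_params = rest.rpartition("@")
    if not sep:
        raise ValueError("missing user part")
    # Drop url-parameters and headers, leaving only hostport.
    hostport = host_and_params.split(";", 1)[0].split("?", 1)[0]
    # service-ID and dialog-type cannot contain ".", so at most two
    # splits are needed; the remainder is the escaped script URL.
    parts = user.split(".", 2)
    if len(parts) < 2:
        raise ValueError("expected service-ID '.' dialog-type")
    service_id, dialog_type = parts[0], parts[1]
    script_url = unquote(parts[2]) if len(parts) == 3 else None
    return service_id, dialog_type, script_url, hostport
```

Applied to the first example URL above, the sketch yields the script URL
http://dialogs.server.com/script32.vxml; applied to the second, the
script URL is None and the server falls back to its configuration.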

127	2.2 Responding to the INVITE

129	   If the server receiving the INVITE doesn't support the specifics of
130	   the service request (for example, the requested VoiceXML version is
131	   not supported), the server SHOULD generate a 501 response. It MAY
132	   include a Warning header providing details on why the request could
133	   not be serviced.

135	   The server SHOULD authenticate the caller and verify that they are
136	   authorized to access the requested service. It is anticipated that
137	   dialog servers will generally be used in conjunction with an
138	   application server which makes the actual authorization decision
139	   about whether the call is to be processed. As a result, the dialog
140	   server's authorization decision is simple - if it came from an
141	   authorized upstream server, the request is allowed. It is RECOMMENDED
142	   that a persistent TLS connection between the application server and
143	   the dialog server be used to provide the authentication credentials
144	   for this kind of scenario.

146	   The server then validates that the SDP in the INVITE, if present, is
147	   acceptable. It does so based on the procedures of Section 2.3.

149	   If it has gotten this far, the server SHOULD fetch the script
150	   identified by the request-URI before generating a final response to
151	   the request. If the script cannot be fetched, or is invalid, the
152	   server generates a 502 Bad Gateway response, since effectively the
153	   server is a gateway to HTTP. It MAY include a Warning header
154	   providing details on the reason for failure.

156	   Once the script has been fetched, and is valid, and the offered SDP
157	   is deemed acceptable, the server SHOULD generate a 200 OK response.
158	   The generation of the response, and ACK processing, are based on
159	   standard SIP semantics.
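The order of checks in this section can be sketched as a single decision
function. This is an illustration under stated assumptions: the function
name and its boolean inputs are hypothetical, and the 403 and 488 codes
are assumptions, since the text above names a status code only for the
unsupported-service (501) and script-fetch (502) failures.

```python
def choose_invite_response(service_supported, caller_authorized,
                           sdp_acceptable, fetch_script):
    """Return the final status code for an initial INVITE.

    fetch_script is a callable that returns the script, or raises an
    exception when the script cannot be fetched or is invalid.
    """
    if not service_supported:
        return 501  # specifics of the service request not supported
    if not caller_authorized:
        return 403  # ASSUMPTION: the text names no code for this case
    if not sdp_acceptable:
        return 488  # ASSUMPTION: the text names no code for this case
    try:
        fetch_script()
    except Exception:
        return 502  # Bad Gateway: the server is a gateway to HTTP
    return 200
```

The script is fetched last, so that an unauthorized or unanswerable
request never triggers an HTTP fetch.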

161	2.3 SDP Processing

163	   If the INVITE contains SDP with an offer, the dialog server will
164	   generate an answer as per SIP-bis [6]. The offer is deemed
165	   unacceptable if it contains no media lines of type audio, or if the
166	   dialog server supports none of the codecs listed for the audio
167	   streams. Otherwise, it is deemed acceptable.

169	   The answer generated by the dialog server SHOULD refuse all media
170	   streams excepting the first offered audio stream. Choice of codecs
171	   used by the dialog server is at the discretion of the implementor.
172	   However, it is STRONGLY RECOMMENDED that all dialog servers support
173	   G.711 and RFC 2833. If an offered media stream does not indicate
174	   support for RFC 2833 tones, the dialog server SHOULD add that codec
175	   to the answer. As described in RFC2543-bis, this allows the dialog
176	   server to inform the caller that it can receive RFC 2833 media, even
177	   if the caller cannot receive it.

179	   The server SHOULD allow sendonly, recvonly, and sendrecv media
180	   streams, as well as streams on hold. The meaning of these for script
181	   interpretation is discussed in Section 4.

183	   If the INVITE from the caller did not contain an SDP, the dialog
184	   server SHOULD generate an offer in the 2xx with a single audio media
185	   line, listing all codecs supported by the dialog server.
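The offer handling above can be sketched as follows. This is a
simplified model rather than a real SDP implementation: media lines are
represented as dictionaries, the names build_answer, supported, and
local_port are hypothetical, and the sketch assumes that "the first
offered audio stream" means the first audio stream sharing at least one
codec with the server.

```python
def build_answer(offer, supported, local_port):
    """Return the answer media lines, or None if the offer is
    unacceptable (no audio stream with a supported codec)."""
    answer = []
    accepted = False
    for media in offer:
        common = [c for c in media["codecs"] if c in supported]
        if media["type"] == "audio" and common and not accepted:
            # Advertise RFC 2833 even if the caller did not offer it.
            if ("telephone-event" in supported
                    and "telephone-event" not in common):
                common.append("telephone-event")
            answer.append({"type": "audio", "port": local_port,
                           "codecs": common})
            accepted = True
        else:
            # A port of zero refuses the stream.
            answer.append({"type": media["type"], "port": 0,
                           "codecs": media["codecs"]})
    return answer if accepted else None
```

Every offered stream gets a matching line in the answer; only the
accepted audio stream carries a non-zero port.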

187	2.4 Script Variables

189	   In VoiceXML 1.0, the interpreter context provides the script with
190	   several variables that provide information on the call control
191	   interfaces. These variables are set in the following fashion:

193	        session.telephone.ani: This variable is the value of the URL in
194	             the From field of the INVITE that triggered the script.

196	        session.telephone.dnis: This variable is the value of the URL in
197	             the To field of the INVITE that triggered the script.

199	        session.telephone.iidigits: If the Contact header in the INVITE
200	             request uses the SIP caller preferences contact parameters
201	             [7] to provide additional information on the initiating
202	             device, the interpreter context SHOULD map these parameters
203	             to the closest II digits if possible.

205	        session.telephone.uui: This variable is set only if the INVITE
206	             request contained an embedded ISUP IAM request [8]. In that
207	             case, the user-to-user information elements from that IAM
208	             are extracted, and mapped to this variable. Support for
209	             this is optional, but RECOMMENDED.
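The variable assignments above can be sketched as a simple mapping. The
function name and arguments are hypothetical; the iidigits mapping is
left out because the text does not define it, and the uui value is
assumed to have been extracted from an embedded IAM elsewhere.

```python
def session_variables(from_url, to_url, uui=None):
    """Build the session.telephone.* variables for a new script.

    from_url and to_url are the URLs from the From and To fields of the
    INVITE; uui is the user-to-user information, if any was present.
    """
    variables = {
        "session.telephone.ani": from_url,
        "session.telephone.dnis": to_url,
    }
    # uui is set only when the INVITE carried an embedded ISUP IAM.
    if uui is not None:
        variables["session.telephone.uui"] = uui
    return variables
```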

211	3 Document Acquisition

213	   The interpreter context fetches the script using normal HTTP GET and
214	   POST requests [9]. It MUST follow the caching behaviors specified in
215	   VoiceXML 1.0. It MAY support other document acquisition protocols,
216	   such as FTP.

218	4 Audio Input and Output

220	   Audio input and output are provided through RTP. The implementation
221	   platform SHOULD provide DTMF recognition on the incoming media
222	   stream, independent of its codec type. This is greatly facilitated
223	   through RFC 2833, which pushes the DTMF detection operation to the
224	   originator. The implementation platform SHOULD provide speech
225	   recognition on the incoming media stream as well.

227	   To be very explicit, this means that the dialog server SHOULD support
228	   recognition of DTMF and speech by processing a single incoming media
229	   stream. Furthermore, this stream can be sent by the caller using one
230	   of at least two codecs, G.711 and RFC 2833, and the sender of the
231	   media can switch codecs on the fly when it detects DTMF. This means
232	   that RTP packets 1, 2 and 3 might be G.711, followed by RTP packet 4
233	   which is RFC 2833. Furthermore, despite the fact that the sender can
234	   send RFC 2833, the dialog server SHOULD still perform DTMF detection
235	   on the media stream, in case the sender does not support RFC 2833,
236	   or does support it, but misses a digit.
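The per-packet codec switch described above can be sketched as a
dispatch on the RTP payload type. This is a minimal illustration: the
function name is hypothetical, the payload type 101 is an assumption
(the real value is negotiated in SDP), and only the four-byte RFC 2833
event payload (event, E bit, reserved bit, volume, duration) is decoded.

```python
import struct

# RFC 2833 event codes 0-15 map to these DTMF characters.
DTMF_EVENTS = "0123456789*#ABCD"


def classify_packet(payload_type, payload, telephone_event_pt=101):
    """Classify one RTP payload as an RFC 2833 event or plain audio."""
    if payload_type == telephone_event_pt:
        # Layout: event (8 bits), E (1), R (1), volume (6), duration (16).
        event, flags, duration = struct.unpack("!BBH", payload[:4])
        digit = DTMF_EVENTS[event] if event < len(DTMF_EVENTS) else None
        return {
            "kind": "rfc2833",
            "digit": digit,
            "end": bool(flags & 0x80),
            "volume": flags & 0x3F,
            "duration": duration,
        }
    # Anything else is audio, which still goes through in-band DTMF
    # detection and speech recognition.
    return {"kind": "audio", "samples": payload}
```

Packets classified as audio are still handed to the in-band detector, so
a digit that the sender fails to signal as an event is not lost.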

238	        OPEN ISSUE: This is a strong statement; if the probability
239	        of missed DTMF is small, the dialog server shouldn't have
240	        to do detection if it knows the caller has done it.
241	        Problem, though: since SDP has no way to indicate codec-
242	        specific directionalities in a sendrecv stream, a UA that
243	        can only send RFC 2833 doesn't say anything about it in the
244	        SDP in the INVITE. As a result, there is no way to know for
245	        sure that the sender can do it until the first RFC 2833
246	        packet shows up. The SDP FID [10] specification resolves
247	        this. Should we make support for the FID spec mandatory for
248	        dialog servers?

250	   Some implementations we are aware of use a separate stream for the
251	   DTMF and for the speech. This approach is NOT RECOMMENDED, since it
252	   makes synchronization of the speech and DTMF difficult.

254	   SDP allows media streams to be unidirectional. If a stream is one-way
255	   from the caller to the dialog server, this means that script
256	   processing SHOULD proceed normally, except that any audio which would
257	   normally be output by the implementation platform is discarded.
258	   Furthermore, if a stream is one-way from the dialog server to the
259	   caller, script processing SHOULD proceed normally, except that the
260	   implementation platform never delivers characters (i.e., DTMF digits)
261	   or utterances to the interpreter. In other words, behavior is
262	   identical to the case where the caller is simply not talking.

264	   Unidirectional streams are very useful for applications which require
265	   a "listener" on an existing media stream to look for a particular
266	   utterance and DTMF digit, and deliver that to an application server
267	   for event processing. Therefore, it is RECOMMENDED that they be
268	   supported in dialog servers as described above.

270	   SIP allows media streams to be placed on hold. This will happen when
271	   the interpreter context receives a re-INVITE with an SDP with a
272	   0.0.0.0 connection line. This is handled identically to the case of a
273	   media stream which is unidirectional from the dialog server to the
274	   caller, meaning that it's "just" disconnected, not an interpreter-
275	   freeze.

277	   SIP allows media streams to be disabled by setting the port to zero.
278	   This has very specific meaning in the case of a dialog server. It has
279	   the effect of requesting a freeze of the interpreter state. When the
280	   interpreter context returns a 200 OK as a response, it indicates that
281	   the interpreter has been frozen. The interpreter is truly frozen; the
282	   behavior should be as if time were literally suspended as far as the
283	   interpreter is concerned. To unfreeze the interpreter state, a re-
284	   INVITE is needed to establish a new audio media stream. This will
285	   cause processing of the script to continue at exactly the same place
286	   it left off, using the media input and output from the new media
287	   stream to drive the interpreter. It is critical that, as far as the
288	   script is concerned, the freeze never even took place.
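The cases above (one-way streams, hold, and a disabled stream) can be
summarized in one classification function. This is a sketch under
assumptions: the function name and the returned labels are hypothetical,
and the direction argument is taken to be the attribute as written by
the caller in its SDP.

```python
def classify_audio_sdp(connection_address, port, direction="sendrecv"):
    """Return the effect of the caller's audio media description on the
    interpreter: "freeze", "no-input", "discard-output", or "normal"."""
    if port == 0:
        # Disabled stream: freeze the interpreter state.
        return "freeze"
    if connection_address == "0.0.0.0":
        # Hold: handled like a stream that is one-way from the dialog
        # server to the caller, not as a freeze.
        return "no-input"
    if direction == "recvonly":
        # Caller only receives: no digits or utterances are delivered.
        return "no-input"
    if direction == "sendonly":
        # Caller only sends: audio output is generated, then discarded.
        return "discard-output"
    return "normal"
```

The port check comes first because a disabled stream takes precedence
over whatever connection address or direction accompanies it.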

290	   This capability is essential for supporting feature composition of
291	   voice-based applications. Consider application A, which allows the
292	   user to hear an announcement when a friend comes online. If the user
293	   says yes, a call is placed to that friend. Another application, B,
294	   allows the user to hear stock quotes. We'd like to compose these so
295	   that both can happen simultaneously. For that to happen in a
296	   reasonable fashion, one of these applications has the "focus",
297	   meaning that it is the one processing the input and output from the
298	   user. Consider the case where the stock quote application has the
299	   focus. The stock quote application runs on dialog server X, and
300	   the presence application on dialog server Y. Application server Z is
301	   the central point for all system events related to all applications.
302	   The flow to consider is shown in Figure 1. At the beginning of the
303	   flow, the caller has a call leg to the AS, and the AS has used third
304	   party call control [11] to connect the caller to dialog server X.
305	   This means there is an RTP connection between the caller and this
306	   dialog server, as shown.

308	   An external event (such as a friend coming online) will cause an
309	   application server to decide that the other voice application needs
310	   to receive the focus. However, we don't want to terminate the stock
311	   quote application; we merely wish to suspend it so that the user can
312	   resume it after hearing that the friend came online. So, the
313	   application server sends a re-INVITE (1) to the dialog server running
314	   the stock quote application, and requests it to be frozen. When the
315	   interpreter completes the current prompt block, the context freezes the
316	   interpreter and returns a 200 OK. The AS then connects the user to
317	   the dialog server running the presence application (4-9). Dialog
318	   server Y will fetch the VoiceXML script from the AS (since the AS
319	   knows the identity of the buddy that came online, it needs to be the
320	   one that generates the VoiceXML script), but this is not shown. This
321	   dialog runs, and assuming the user doesn't call the friend, the
322	   script terminates, causing server Y to send a BYE (10). The AS
323	   decides to resume the stock quote application. So, using 3pcc, it
324	   reconnects the caller with server X (12-17). The re-INVITE to server
325	   X (14) has the effect of unfreezing the context, so processing
326	   continues where the call left off.

328	   The result of this is that the user's experience is the following:

330	   network: Please enter the stock to check.
331	   user: Lucent
332	   network: Lucent Technologies is at six dollars.
333	   network: Friend alert: Bob is online. Would you like to call him?
334	   user: no
335	   network: Please enter the name of the stock to check.

337	   Note that the issue of when the interpreter can be suspended is being
338	   worked on in the W3C.

340	   The key idea with this mechanism is that in NO CASE should the
341	   VoiceXML script for the stock quote application need to know that
342	   this external event (the buddy coming online) has occurred, so that
343	   it can play the buddy announcement. Doing so is counter to the entire
344	   concept of feature interaction; it is an intractable problem if every
345	   application and feature needs to know about each other. In the
346	   approach proposed here, each voice application remains independent.
347	   The application server plays the role of composing them by activating
348	   and deactivating the contexts as needed. This still requires the AS
349	   to know the set of applications that are running, but in this case,
350	   it doesn't need to know anything except the relative precedences of
351	   the various applications and the events which trigger them. Logic for
352	   that can, in principle, be constructed in a generic way, independent
353	   of the specific applications.

355	   This approach isn't perfect for all cases, but it's simple enough to
356	   get things started.

358	4.1 Processing Further SIP Messages

360	   The interpreter context processes subsequent SIP messages in the
361	   following fashion.

363	4.2 BYE

365	   If a BYE request is received from the caller, this terminates the
366	   call. The interpreter context SHOULD throw the telephone.disconnect
367	   event to the interpreter.

369	4.3 re-INVITE

371	   If a re-INVITE is received, it has the effect of changing some aspect
372	   of the media input and output. Codec changes, port changes, and IP
416	   address changes are handled normally as per bis [6]. Specific
417	   processing is required for changes in stream direction, placing the
418	   call on hold, disabling a media stream, and adding a new audio stream
419	   after a previous re-INVITE disabled it. See Section 4.

373	      Caller              AS (Z)           DS (X)            DS (Y)
374	         |RTP              |                 |                 |
375	         |...................................|                 |
376	         |                 |friend online    |                 |
377	         |                 |<--------        |                 |
378	         |                 |(1) INV disable  |                 |
379	         |                 |---------------->|request freeze   |
380	         |                 |(2) 200 OK       |                 |
381	         |                 |<----------------|frozen           |
382	         |                 |(3) ACK          |                 |
383	         |                 |---------------->|                 |
384	         |                 |(4) INV no SDP   |                 |
385	         |                 |---------------------------------->|
386	         |                 |(5) 200 SDP 1    |                 |
387	         |(6) INV SDP 1    |<----------------------------------|
388	         |<----------------|                 |                 |
389	         |(7) 200 SDP 2    |                 |                 |
390	         |---------------->|(8) ACK SDP 2    |                 |
391	         |(9) ACK          |---------------------------------->|
392	         |<----------------|                 |                 |
393	         |                 |                 |                 |
394	         |   RTP           |                 |                 |
395	         |.....................................................|
396	         |                 |                 |                 |
397	         |                 |(10) BYE         |                 |
398	         |                 |<----------------------------------|
399	         |                 |(11) 200 OK      |                 |
400	         |                 |---------------------------------->|
401	         |(12) INV no SDP  |                 |                 |
402	         |<----------------|                 |                 |
403	         |(13) 200 SDP 3   |                 |                 |
404	         |---------------->|(14) INV SDP 3   |                 |
405	         |                 |---------------->|unfreeze         |
406	         |                 |(15) 200 SDP 4   |                 |
407	         |(16) ACK SDP 4   |<----------------|                 |
408	         |<----------------|(17) ACK         |                 |
409	         |                 |---------------->|                 |
410	         |RTP              |                 |                 |
411	         |.................|.................|                 |
412	         |                 |                 |                 |
413	         |                 |                 |                 |

415	   Figure 1: Voice Application Composition

421	4.4 INFO, MESSAGE

423	   These messages are ignored by the interpreter context.

425	5 Tag Processing

427	   Certain tags within the VoiceXML script have call control
428	   implications. The following subsections describe how the interpreter
429	   context handles them.

431	5.1 Exit

433	   VoiceXML 1.0 says that the processing of the exit tag is entirely
434	   context specific.

436	   For SIP, the interpreter context SHOULD send a BYE to terminate the
437	   call.

439	   Ideally, the VoiceXML <exit> element would also post the given
440	   namelist to a URI specified in the original call setup. For example,
441	   the URI of an HTTP servlet running directly in the AS or in an
442	   associated web application server would be an appropriate choice.
443	   This would allow voice interactions to be completely independent of
444	   the calling context, and therefore be re-usable across providers and
445	   applications. The VoiceXML specification is silent on exactly what
446	   should happen with the <exit> namelist. For this reason, we do not
447	   specify specific processing at this time.

449	        OPEN ISSUE: Should we specify something? We could provide
450	        an additional URL at script initiation which is used to
451	        post the namelist upon exit.

453	5.2 Disconnect

455	   The interpreter context SHOULD send a BYE to terminate the call. As
456	   per the VoiceXML specification, a telephone.disconnect.hangup event
457	   is also thrown.

459	5.3 Transfer

461	   VoiceXML 1.0 supports two styles of transfer, bridged and blind.

463	5.3.1 Blind

465	   When the interpreter context needs to perform a blind transfer, it
466	   SHOULD generate a REFER [12] request. The REFER request is sent to
467	   the caller. It contains a Refer-To header carrying the URI given in
468	   the "dest" attribute of the transfer tag. If the transfer tag
469	   contains a connecttimeout attribute, the URI in the Refer-To has an
470	   Expires header parameter appended to it, containing the duration
471	   from the attribute.

473	   For example, if the following transfer tag was encountered:

475	   <transfer name="mycall" dest="sip:support@foo.com" bridge="false"
476	             connecttimeout="10s">

478	   The REFER would look like:

480	   REFER sip:caller@pc13.company.com SIP/2.0
481	   Via: SIP/2.0/UDP server3.vxmlservers.com
482	   From: sip:dialog.vxml20@vxmlservers.com;tag=8aa6s
483	   CSeq: 3487 REFER
484	   Call-ID: 9a8s9809s@102.3.4.4
485	   To: sip:caller@company.com;tag=99as7
486	   Refer-To: sip:support@foo.com?Expires=10
487	   Referred-By: sip:dialog.vxml20@vxmlservers.com
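The construction of the Refer-To value in this example can be sketched
as below. The function name is hypothetical, and the sketch assumes the
connecttimeout value is a whole number of seconds written with an "s"
suffix, as in the example; other duration forms are rejected rather than
guessed at.

```python
def refer_to_value(dest, connecttimeout=None):
    """Build the Refer-To URI from the transfer tag's attributes."""
    if connecttimeout is None:
        return dest
    # Assumption: only whole seconds such as "10s" are handled here.
    number = connecttimeout[:-1]
    if not connecttimeout.endswith("s") or not number.isdigit():
        raise ValueError("unsupported connecttimeout: " + connecttimeout)
    # The timeout becomes an Expires header parameter on the URI.
    return "{}?Expires={}".format(dest, int(number))
```

For the transfer tag shown above, this produces
sip:support@foo.com?Expires=10, matching the Refer-To header in the
example REFER.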

489	   If the REFER is rejected, the interpreter context outputs a
490	   network_busy as the outcome of the transfer attempt. Otherwise, the
491	   interpreter context remains suspended until a NOTIFY is received.

493	   At some point before the expiration, the interpreter context will
494	   receive a NOTIFY request containing the final response received for
495	   the triggered INVITE. If this response is a 2xx, the interpreter
496	   context throws a telephone.disconnect.transfer, and sends a BYE
497	   request to terminate the call.

499	   If the final response was a non-2xx response, the transfer attempt
500	   failed. If the final response was a 486, the outcome of the transfer
501	   attempt is set to busy, and form processing continues. If the final
502	   response was a 408, the outcome of the transfer attempt is set to
503	   noanswer, and form processing continues. For any other response, the
504	   outcome of the transfer attempt is set to network_busy, and form
505	   processing continues.
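The mapping from the final response to the transfer outcome can be
written down directly. The function name is hypothetical; a return value
of None for a 2xx response is a convention of this sketch, standing for
the case where no outcome is set because the call is terminated instead.

```python
def transfer_outcome(final_status):
    """Map the final response of the triggered INVITE to the outcome of
    the transfer attempt. Returns None for a 2xx response."""
    if 200 <= final_status < 300:
        # Success: telephone.disconnect.transfer is thrown and a BYE is
        # sent, so there is no outcome to report to the form.
        return None
    if final_status == 486:
        return "busy"
    if final_status == 408:
        return "noanswer"
    return "network_busy"
```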

507	5.3.2 Bridged

509	   In a bridged transfer, the interpreter context resumes after the
510	   transfer call completes. VoiceXML 1.0 also allows the script to
511	   specify a grammar within the transfer tag, allowing it to listen in
512	   for DTMF that meets that grammar. When a match is found, the transfer
513	   is terminated and control returns to the interpreter.

515	   This function requires that the dialog server act as a UAC, and make
516	   the outbound call to the transferred party. The flow is shown in
517	   Figure 2. The caller connects to the dialog server with messages 1-3.
518	   RTP flows between the caller and the dialog server. When the transfer
519	   tag is encountered, the dialog server sends an outbound INVITE (4).
520	   The outbound INVITE contains the same SDP, SDP 1, offered by the
521	   caller. If the final response (5) is a 200 OK, this contains SDP3.
522	   The dialog server continues to receive media from the caller. This is
523	   passed on to the transfer target, using SDP3. However, media from the
524	   transfer target to the caller goes direct, bypassing the dialog
525	   server.

527	   If the final response to the INVITE was a non-2xx response, the
528	   transfer attempt failed. If the final response was a 486, the outcome
529	   of the transfer attempt is set to busy, and form processing
530	   continues. If the final response was a 408, the outcome of the
531	   transfer attempt is set to noanswer, and form processing continues.
532	   For any other response, the outcome of the transfer attempt is set to
533	   network_busy, and form processing continues.

535	   The INVITE should not be left pending for more than the amount of
536	   time in the connecttimeout parameter, if specified. After that amount
537	   of time has passed, the INVITE request is cancelled, the outcome of
538	   the transfer is set to noanswer, and form processing continues.
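
   The handling of the pending outbound INVITE, including the
   connecttimeout rule, might be sketched as follows. This is a
   non-normative Python illustration: the function name, the action
   labels ("wait", "cancel", "bridge", "continue"), and the use of
   seconds as the time unit are assumptions of the sketch, not part of
   this specification.

```python
from typing import Optional, Tuple

# Failure statuses named in the text; any other non-2xx is network_busy.
_FAILURE_OUTCOMES = {486: "busy", 408: "noanswer"}


def bridged_invite_step(final_status: Optional[int],
                        elapsed_s: float,
                        connecttimeout_s: Optional[float] = None
                        ) -> Tuple[str, Optional[str]]:
    """Return (action, outcome) for the outbound INVITE of a bridged transfer.

    final_status is None while no final response has arrived.
    """
    if final_status is None:
        if connecttimeout_s is not None and elapsed_s >= connecttimeout_s:
            # Timer expired: CANCEL the INVITE and report noanswer.
            return ("cancel", "noanswer")
        return ("wait", None)
    if 200 <= final_status < 300:
        # Transfer succeeded: start relaying caller media to the target.
        return ("bridge", None)
    # Transfer failed: set the outcome and continue form processing.
    return ("continue", _FAILURE_OUTCOMES.get(final_status, "network_busy"))
```

   When connecttimeout is not specified, the sketch simply keeps
   waiting for a final response.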

540	   If the final response to the INVITE was a 2xx response, the transfer
541	   attempt succeeded. In addition to passing on the media to the
542	   transfer target, the interpreter passes the media received from the
543	   caller through the grammar within the transfer tag, if one is
544	   present. If the grammar is matched, the interpreter context sends a
545	   BYE to the transfer target. Processing continues within the
546	   interpreter.

548	   If the transfer target sends a BYE, a 200 OK is returned. The outcome
549	   of the transfer is set to far_end_disconnect. Form interpretation
550	   continues. If the caller sends a BYE, a 200 OK is returned. The
551	   dialog server sends a BYE to the transfer target. A
579	   telephone.disconnect.hangup event is thrown, and form processing
580	   continues to allow cleanup.

552	          |(1) INVITE SDP1      |                    |
553	          |-------------------->|                    |
554	          |(2) 200 SDP2         |                    |
555	          |<--------------------|                    |
556	          |(3) ACK              |                    |
557	          |-------------------->|                    |
558	          |RTP                  |                    |
559	          |<...................>|                    |
560	          |                     |(4) INVITE SDP1     |
561	          |                     |------------------->|
562	          |                     |(5) 200 SDP3        |
563	          |                     |<-------------------|
564	          |                     |(6) ACK             |
565	          |                     |------------------->|
566	          | RTP from caller     |                    |
567	          |....................>| RTP from caller    |
568	          |                     |...................>|
569	          |           RTP to caller                  |
570	          |<.........................................|
571	          |                     |                    |
572	          |                     |                    |

574	        Caller                  DS                 Transfer
575	                                                   target

577	   Figure 2: Bridged Transfer flow
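
   The reactions of the dialog server during an established bridged
   transfer (a grammar match, a BYE from the transfer target, or a BYE
   from the caller) might be summarized by the following non-normative
   Python sketch. The event labels passed in ("grammar_match",
   "target_bye", "caller_bye") and the textual action strings are
   hypothetical names chosen for illustration; the outcome and event
   names are those given in the text.

```python
from typing import List, NamedTuple, Optional


class BridgeReaction(NamedTuple):
    sip_actions: List[str]   # messages the dialog server sends, in order
    outcome: Optional[str]   # transfer outcome set for the script, if any
    event: Optional[str]     # event thrown to the script, if any


def on_bridge_event(kind: str) -> BridgeReaction:
    """Return the dialog server's reaction to an event on a bridged call."""
    if kind == "grammar_match":
        # Caller input matched the grammar: end the outbound leg only.
        return BridgeReaction(["BYE to target"], None, None)
    if kind == "target_bye":
        # Transfer target hung up: answer the BYE and set the outcome.
        return BridgeReaction(["200 OK to target"], "far_end_disconnect", None)
    if kind == "caller_bye":
        # Caller hung up: answer the BYE, drop the target, allow cleanup.
        return BridgeReaction(["200 OK to caller", "BYE to target"],
                              None, "telephone.disconnect.hangup")
    raise ValueError(f"unknown bridge event: {kind}")
```

   In all three cases form processing continues afterwards, so the
   sketch returns data and leaves control flow to the interpreter.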

582	        OPEN ISSUE: When would it even be possible for the transfer
583	        outcome to be near_end_disconnect? Wouldn't this terminate
584	        the script, so that there is no transfer outcome?

586	   If the transfer target sends a REFER (i.e., the caller is to be
587	   transferred elsewhere), the interpreter context responds with a 200
588	   OK. It creates a new REFER with the same Refer-To header (but its own
589	   value for Referred-By), and sends it to the caller. Upon receiving a
590	   200 OK to the REFER, the dialog server sends a NOTIFY to the transfer
591	   target, informing it of a successful REFER completion to the new
592	   target. If a BYE is received from the transfer target, the
593	   interpreter sends a BYE to the caller as well, and throws a
594	   telephone.disconnect.transfer event.
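
   The construction of the relayed REFER can be illustrated with a
   short, non-normative Python sketch. Representing headers as a
   dictionary, the function name, and the URI sip:target@example.com
   used in the usage note are assumptions made for illustration; only
   the rule (same Refer-To, own Referred-By) is taken from the text.

```python
from typing import Dict


def relay_refer(incoming: Dict[str, str],
                dialog_server_uri: str) -> Dict[str, str]:
    """Build the headers of the REFER that is sent on to the caller.

    Refer-To is copied unchanged; Referred-By is replaced with the
    dialog server's own value.
    """
    if "Refer-To" not in incoming:
        raise ValueError("incoming REFER has no Refer-To header")
    return {
        "Refer-To": incoming["Refer-To"],
        "Referred-By": dialog_server_uri,
    }


# Usage: a REFER arriving from the transfer target is relayed to the caller.
relayed = relay_refer(
    {"Refer-To": "sip:support@foo.com",
     "Referred-By": "sip:target@example.com"},
    "sip:dialog.vxml20@vxmlservers.com",
)
```

   Sending the REFER, awaiting its 200 OK, and issuing the NOTIFY to
   the transfer target are left out of the sketch.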

596	6 Additional Requirements

598	   In addition to the above behaviors, we also recommend that several
599	   optional SIP capabilities be implemented by dialog servers. This is
600	   to support their intended use cases as components in the application
601	   server component architecture [4]. The following list of requirements
602	   includes these recommended features, in addition to summarizing the
603	   ones scattered above:

605	        1.   The dialog server SHOULD support SIP over persistent TCP
606	             and TLS connections, and SHOULD support a configurable
607	             authorization list of the Distinguished Names that are
608	             allowed to connect. This is useful when authorization
609	             decisions are outsourced to an application server, as
610	             described above.

612	        2.   The dialog server SHOULD fully support RFC 1889 and RFC
613	             1890. Of particular importance is RTCP.

615	        3.   The dialog server SHOULD support G.711 and RFC 2833.

617	        4.   The dialog server SHOULD support the UA requirements
618	             outlined in the third party call control specification
619	             [11]. This is important for building more complex
620	             applications, a common usage for dialog servers.

622	        5.   The dialog server SHOULD support the SDP FID attribute
623	             [10], and SHOULD use it to allow processing to occur over a
624	             collection of alternate streams with the same FID group.

626	        6.   The dialog server SHOULD support the REFER method [12],
627	             needed for the blind transfer tag. It SHOULD also allow
628	             itself to be referred as a normal UAS.

630	        7.   The dialog server SHOULD allow any HTTP URL to be placed in
631	             the request-URI for specifying the script to execute.

633	7 Authors' Addresses

635	   Jonathan Rosenberg
636	   dynamicsoft
637	   72 Eagle Rock Avenue
638	   First Floor
639	   East Hanover, NJ 07936
640	   email: jdrosen@dynamicsoft.com

642	   Peter Mataga
643	   dynamicsoft
644	   72 Eagle Rock Avenue
645	   First Floor
646	   East Hanover, NJ 07936
647	   email: pmataga@dynamicsoft.com

649	   David Ladd
650	   dynamicsoft
651	   72 Eagle Rock Avenue
652	   First Floor
653	   East Hanover, NJ 07936
654	   email: dladd@dynamicsoft.com

656	8 Bibliography

658	   [1] VoiceXML Forum, "Voice extensible markup language (VoiceXML)
659	   version 1.00," VoiceXML forum specification, VoiceXML Forum, Mar.
660	   2000.

662	   [2] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP:
663	   session initiation protocol," Request for Comments 2543, Internet
664	   Engineering Task Force, Mar. 1999.

666	   [3] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a
667	   transport protocol for real-time applications," Request for Comments
668	   1889, Internet Engineering Task Force, Jan. 1996.

670	   [4] J. Rosenberg, P. Mataga, and H. Schulzrinne, "An application
671	   server component architecture for SIP," Internet Draft, Internet
672	   Engineering Task Force, Mar. 2001.  Work in progress.

674	   [5] B. Campbell and R. Sparks, "Control of service context using SIP
675	   Request-URI," Request for Comments 3087, Internet Engineering Task
676	   Force, Apr. 2001.

678	   [6] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP:
679	   Session initiation protocol," Internet Draft, Internet Engineering
680	   Task Force, Nov. 2000.  Work in progress.

682	   [7] H. Schulzrinne and J. Rosenberg, "SIP caller preferences and
683	   callee capabilities," Internet Draft, Internet Engineering Task
684	   Force, Nov. 2000.  Work in progress.

686	   [8] E. Zimmerer, J. Peterson, A. Vemuri, L. Ong, F. Audet, M. Watson,
687	   and M. Zonoun, "MIME media types for ISUP and QSIG objects," Internet
688	   Draft, Internet Engineering Task Force, Mar. 2001.  Work in progress.

690	   [9] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P.
691	   Leach, and T. Berners-Lee, "Hypertext transfer protocol -- HTTP/1.1,"
692	   Request for Comments 2616, Internet Engineering Task Force, June
693	   1999.

695	   [10] G. Camarillo, J. Holler, and G. Eriksson, "The SDP fid
696	   attribute," Internet Draft, Internet Engineering Task Force, Apr.
697	   2001.  Work in progress.

699	   [11] J. Rosenberg, J. Peterson, H. Schulzrinne, and G. Camarillo,
700	   "Third party call control in SIP," Internet Draft, Internet
701	   Engineering Task Force, Mar. 2001.  Work in progress.

703	   [12] R. Sparks, "SIP call control," Internet Draft, Internet
704	   Engineering Task Force, Feb. 2001.  Work in progress.