Network Working Group                                           D. Burke
Internet-Draft                                                    Google
Intended status: Informational                                  M. Scott
Expires: January 10, 2008                                        Genesys
                                                               J. Haynie
                                                              Hakano Inc
                                                               R. Auburn
                                                                   Voxeo
                                                            S. McGlashan
                                                         Hewlett-Packard
                                                            July 9, 2007


                SIP Interface to VoiceXML Media Services
                        draft-burke-vxml-03.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 10, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document describes a SIP interface to VoiceXML media services,
   which is commonly employed between application servers and media
   servers offering VoiceXML processing capabilities.

Comments

   Comments are solicited and should be addressed to the authors.

Table of Contents

   1.  Introduction
     1.1.  Use Cases
       1.1.1.  IVR Services with Application Servers
       1.1.2.  PSTN IVR Service Node
       1.1.3.  3GPP IMS Media Resource Function (MRF)
       1.1.4.  CCXML <-> VoiceXML Interaction
       1.1.5.  Other Use Cases
     1.2.  Terminology
   2.  VoiceXML Session Establishment and Termination
     2.1.  Service Identification
     2.2.  Initiating a VoiceXML Session
     2.3.  Preparing a VoiceXML Session
     2.4.  Session Variable Mappings
     2.5.  Terminating a VoiceXML Session
     2.6.  Examples
       2.6.1.  Basic Session Establishment
       2.6.2.  VoiceXML Session Preparation
       2.6.3.  MRCP Establishment
   3.  Media Support
     3.1.  Offer/Answer
     3.2.  Early Media
     3.3.  Modifying the Media Session
     3.4.  Audio and Video Codecs
     3.5.  DTMF
   4.  Returning Data to the Application Server
     4.1.  HTTP Mechanism
     4.2.  SIP Mechanism
   5.  Outbound Calling
     5.1.  Third Party Call Control Mechanism
     5.2.  REFER Mechanism
   6.  Call Transfer
     6.1.  Blind
     6.2.  Bridge
     6.3.  Consultation
   7.  Contributors
   8.  Security Considerations
   9.  IANA Considerations
   10. Changes since last version:
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   VoiceXML [VXML20], [VXML21] is a World Wide Web Consortium (W3C)
   standard for creating audio and video dialogs that feature
   synthesized speech, digitized audio, recognition of spoken and DTMF
   key input, recording of audio and video, telephony, and mixed
   initiative conversations.  VoiceXML allows Web-based development and
   content delivery paradigms to be used with interactive video and
   voice response applications.

   This document describes a SIP [RFC3261] interface to VoiceXML media
   services, which is commonly employed between Application Servers and
   media servers offering VoiceXML processing capabilities.  SIP is
   responsible for initiating a media session to the VoiceXML media
   server and simultaneously triggering the execution of a specified
   VoiceXML application.

   The interface described here owes its genesis to the draft [SIPVXML]
   and leverages a mechanism for identifying dialog media services
   described in [RFC4240].  The interface has been updated and extended
   to support the W3C Recommendation for VoiceXML 2.0 [VXML20] and
   VoiceXML 2.1 [VXML21].  A set of commonly implemented functions and
   extensions have been specified including VoiceXML dialog preparation,
   outbound calling, video media support, and transfers.
   VoiceXML session variable mappings have been defined for SIP with an
   extensible mechanism for passing application-specific values into the
   VoiceXML application.  Mechanisms for returning data to the
   Application Server have also been added.

1.1.  Use Cases

   The VoiceXML media service user in this document is generically
   referred to as an Application Server.  In practice, it is intended
   that the interface defined by this document is applicable across a
   wide range of use cases.  Several intended use cases are described
   below.

1.1.1.  IVR Services with Application Servers

   SIP Application Servers provide services to users of the network.
   Typically, there may be several Application Servers in the same
   network, each specialized in providing a particular service.
   Throughout this specification and without loss of generality, we
   posit the presence of an Application Server specialized in providing
   IVR services.  A typical configuration for this use case is
   illustrated below.

                        +-------------+
                        |             |
                        | Application |\
                        |   Server    | \
                        |             |  \ HTTP
                   SIP  +-------------+   \
                       /               \   \
                      /                 \   \
                     /                   \   \
      +-------------+                SIP  \   +--------------+
      |             |                      \  |              |
      |     SIP     |                       \ |   VoiceXML   |
      | User Agent  |       RTP/SRTP         \| Media Server |
      |             |=========================|              |
      +-------------+                         +--------------+

   Assuming the Application Server also supports HTTP, the VoiceXML
   application may be hosted on it and served up via HTTP [RFC2616].
   Note, however, that the Web model allows the VoiceXML application to
   be hosted on a separate (HTTP) Application Server from the (SIP)
   Application Server that interacts with the VoiceXML Media Server via
   this specification.  It is also possible for a static VoiceXML
   application to be stored locally on the VoiceXML Media Server,
   leveraging the VoiceXML 2.1 [VXML21] <data> mechanism to interact
   with a Web/Application Server when dynamic behavior is required.
   The viability of static VoiceXML applications is further enhanced by
   the mechanisms defined in section 2.4, through which the Application
   Server can make session-specific information available within the
   VoiceXML session context.

   The approach described in this document is sometimes termed the
   "delegation model" - the Application Server is essentially delegating
   programmatic control of the human-machine interactions to one or more
   VoiceXML documents running on the VoiceXML Media Server.  During the
   human-machine interactions, the Application Server remains in the
   signaling path and can respond to results returned from the VoiceXML
   Media Server or other external network events.

1.1.2.  PSTN IVR Service Node

   While this document is intended to enable enhanced use of VoiceXML as
   a component of larger systems and services, it is intended that
   devices that are completely unaware of this specification but that
   support [RFC4240] remain capable of invoking VoiceXML services
   offered by a VoiceXML Media Server compliant with this document.  A
   typical configuration for this use case is as follows:

     +-------------+         SIP         +--------------+
     |             |---------------------|              |
     |   IP/PSTN   |                     |   VoiceXML   |
     |   Gateway   |      RTP/SRTP       | Media Server |
     |             |=====================|              |
     +-------------+                     +--------------+

   Note also that beyond the invocation and termination of a VoiceXML
   dialog, the semantics defined for call transfers using REFER are
   intended to be compatible with standard, existing IP/PSTN gateways.

1.1.3.  3GPP IMS Media Resource Function (MRF)

   The 3GPP IP Multimedia Subsystem (IMS) [TS23002] defines a Media
   Resource Function (MRF) used to offer media processing services such
   as conferencing, transcoding, and prompt/collect.  The capabilities
   offered by VoiceXML are ideal for offering richer media processing
   services in the context of the MRF.
   In this architecture, the interface defined here corresponds to the
   "Mr" interface to the MRFC; the implementation of this interface
   might use separated MRFC and MRFP elements (as per the IMS
   architecture), or might be an integrated MRF (as is common practice).

                     +----------+
                     |   App    |
                     |  Server  |
                     +----------+
                          |
                          | SIP (ISC)
                          |
                     +----------+    SIP (Mr)   +--------------+
                     |  S-CSCF  |---------------|   VoiceXML   |
                     |          |               |     MRF      |
                     +----------+               +--------------+
                                                       ||
                                                       || RTP/SRTP (Mb)
                                                       ||

   The above diagram is highly simplified and shows a subset of nodes
   typically involved in MRF interactions.  It should be noted that
   while the MRF will primarily be used by the Application Server via
   the S-CSCF, it is also possible for calls to be routed directly to
   the MRF without the involvement of an Application Server.

   Although the above is described in terms of the 3GPP IMS
   architecture, it is intended that it is also applicable to 3GPP2,
   NGN, and PacketCable architectures that are converging with 3GPP IMS
   standards.

1.1.4.  CCXML <-> VoiceXML Interaction

   CCXML 1.0 [CCXML10] applications provide services mainly through
   controlling the interaction between Connections, Conferences, and
   Dialogs.  Although CCXML is capable of supporting arbitrary dialog
   environments, VoiceXML is commonly used as a dialog environment in
   conjunction with CCXML applications; CCXML is specifically designed
   to effectively support the use of VoiceXML.  CCXML 1.0 defines
   language elements that allow for Dialogs to be prepared, started, and
   terminated; it further allows for data to be returned by the dialog
   environment, for call transfers to be requested (by the dialog) and
   responded to by the CCXML application, and for arbitrary eventing
   between the CCXML application and running dialog application.

   The interface described in this document can be used by CCXML 1.0
   implementations to control VoiceXML Media Servers.  Note, however,
   that some CCXML language features require eventing facilities between
   CCXML and VoiceXML sessions that go beyond what is defined in this
   specification.  For example, VoiceXML-controlled call transfers and
   mid-dialog application-defined events cannot be fully realized using
   this specification alone.  A SIP event package [RFC3265] MAY be used
   in addition to this specification to provide extended eventing.

1.1.5.  Other Use Cases

   In addition to the use cases described in some detail above, there
   are a number of other intended use cases that are not described in
   detail, such as:

   1.  Use of a VoiceXML Media Server as an adjunct to an IP-based
       PBX/ACD, possibly to provide voicemail/messaging, automated
       attendant, or other capabilities.

   2.  Invocation and control of a VoiceXML session that provides the
       voice modality component in a multimodal system.

1.2.  Terminology

   Application Server:  A SIP Application Server hosts and executes
      services, in particular by terminating SIP sessions on a media
      server.  The Application Server MAY also act as an HTTP server
      [RFC2616] in interactions with media servers.

   VoiceXML Media Server:  A VoiceXML interpreter including a SIP-based
      interpreter context and the requisite media processing
      capabilities to support VoiceXML functionality.

   VoiceXML Session:  A VoiceXML Session is a multimedia session
      comprising at least a SIP user agent, a VoiceXML Media Server, the
      data streams between them, and an executing VoiceXML application.

   VoiceXML Dialog:  Equivalent to VoiceXML Session.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  VoiceXML Session Establishment and Termination

   This section describes how to establish a VoiceXML Session, with or
   without preparation, and how to terminate a session.  This section
   also addresses how session information is made available to VoiceXML
   applications.

2.1.  Service Identification

   The SIP Request-URI is used to identify the VoiceXML media service as
   defined in [RFC4240].  The user part of the SIP Request-URI is fixed
   to "dialog".  The initial VoiceXML document is specified with the
   "voicexml" parameter.  In addition, parameters are defined that
   control how the VoiceXML Media Server fetches the specified VoiceXML
   document.  The list of parameters defined by this specification is as
   follows:

   voicexml:  URI of the initial VoiceXML document to fetch.  This will
      typically contain an HTTP URI, but may use other URI schemes, for
      example to refer to local, static VoiceXML documents.  If the
      "voicexml" parameter is omitted, the VoiceXML Media Server may
      select the initial VoiceXML document by other means, such as by
      applying a default, or may reject the request.

   maxage:  Used to set the max-age value of the Cache-Control header in
      conjunction with VoiceXML documents fetched using HTTP, as per
      [RFC2616].  If omitted, the VoiceXML Media Server will use a
      default value.

   maxstale:  Used to set the max-stale value of the Cache-Control
      header in conjunction with VoiceXML documents fetched using HTTP,
      as per [RFC2616].  If omitted, the VoiceXML Media Server will use
      a default value.

   method:  Used to set the HTTP method applied in the fetch of the
      initial VoiceXML document.  Allowed values are "get" or "post"
      (case-insensitive).  Default is "get".

   postbody:  Used to set the application/x-www-form-urlencoded encoded
      [HTML4] HTTP body for "post" requests (or is otherwise ignored).
      The postbody value is the prepared
      application/x-www-form-urlencoded content, subsequently
      URL-encoded (see note below).

   Other application-specific parameters may be added to the Request-URI
   and are exposed in VoiceXML session variables (see section 2.4).

   The BNF for the Request-URI is given below:

      DIALOG-URL        = sip-ind dialog-ind "@" hostport
                          dialog-parameters

      sip-ind           = "sip:" / "sips:"
      dialog-ind        = "dialog"

      dialog-parameters = [ init-parameters ]
                          [ vxml-parameters ]
                          [ uri-parameters ]

      init-parameters   = init-param [ init-parameters ]

      init-param        = ";" (dialog-param /
                               maxage-param /
                               maxstale-param /
                               method-param /
                               postbody-param)

      dialog-param      = "voicexml=" vxml-url ; vxml-url follows the URI
                                               ; syntax defined in RFC3986
      maxage-param      = "maxage=" 1*DIGIT

      maxstale-param    = "maxstale=" 1*DIGIT

      method-param      = "method=" ("get" / "post")

      postbody-param    = "postbody=" token

      vxml-parameters   = vxml-param [ vxml-parameters ]

      vxml-param        = ";" vxml-keyword "=" vxml-value

      vxml-keyword      = token

      vxml-value        = false /
                          null /
                          true /
                          object /
                          array /
                          number /
                          string ; see RFC4627

   Parameters of the Request-URI in subsequent re-INVITEs are ignored.
   One consequence of this is that the VoiceXML Media Server cannot be
   instructed by the Application Server to change the executing VoiceXML
   Application after a VoiceXML Session has been started.

   Incorrectly formed requests MUST be rejected with the appropriate 4xx
   class response.  If one of the init-parameters is repeated, then the
   request MUST be rejected with a 400 Bad Request response.

   Note: Special characters in Request-URI parameter values need to be
   URL-encoded as required by the SIP URI syntax, for example '?' (%3f),
   '=' (%3d), and ';' (%3b).
   The VoiceXML Media Server MUST therefore unescape Request-URI
   parameter values before making use of them or exposing them to
   running VoiceXML applications.  It is important that the VoiceXML
   Media Server unescape the parameter values only once, since the
   desired VoiceXML URI value could itself be URL-encoded, for example.
   When a postbody is included, its entire content, including any line
   breaks (represented by a CR LF pair), is encoded as a single
   parameter value following the above rules (such that the line breaks
   would be replaced by '%0D%0A', for example).

   As an example, the following SIP Request-URI identifies the use of
   VoiceXML media services, with
   'http://appserver.example.com/promptcollect.vxml' as the initial
   VoiceXML document, to be fetched with max-age/max-stale values of
   3600s/0s respectively:

      sip:dialog@mediaserver.example.com; \
         voicexml=http://appserver.example.com/promptcollect.vxml; \
         maxage=3600;maxstale=0

2.2.  Initiating a VoiceXML Session

   A VoiceXML Session is initiated via the Application Server using a
   SIP INVITE or REFER (see section 5.2).  Typically, the Application
   Server will be specialized in providing VoiceXML services.  At a
   minimum, the Application Server may behave as a simple proxy by
   rewriting the Request-URI received from the User Agent to a
   Request-URI suitable for consumption by the VoiceXML Media Server (as
   specified in section 2.1).  For example, a User Agent might present a
   dialed number:

      tel:+1-201-555-0123

   which the Application Server maps to a directory assistance
   application on the VoiceXML Media Server with a Request-URI of:

      sip:dialog@ms1.example.com; \
         voicexml=http://as1.example.com/da.vxml

   The Application Server SHOULD insert its own URI in the Record-Route
   header so that it remains in the signaling path for subsequent
   signaling related to the session.
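   The Request-URI rules of section 2.1 (URL-encoding each parameter
   value once, unescaping it exactly once on receipt, and rejecting a
   repeated init-parameter) can be sketched in Python as below.  This is
   an illustrative sketch under stated assumptions, not part of this
   specification: the names build_dialog_uri, parse_dialog_uri and
   BadRequest are hypothetical, the parser does not validate hostport or
   vxml-value syntax, and a repeated non-init parameter simply keeps its
   last value.

```python
import re
from urllib.parse import quote, unquote

# Parameter names grouped under init-param in the section 2.1 BNF.
INIT_PARAMS = {"voicexml", "maxage", "maxstale", "method", "postbody"}

# Characters left unescaped in a SIP URI parameter value (RFC 3261
# param-unreserved and mark); quote() never escapes letters, digits or "_.-~".
SIP_PARAM_SAFE = "[]/:&+$!*'()"


class BadRequest(Exception):
    """Hypothetical stand-in for rejecting the request with 400 Bad Request."""


def build_dialog_uri(hostport, params):
    """Build a dialog Request-URI, URL-encoding each value exactly once."""
    parts = ["sip:dialog@" + hostport]
    for name, value in params.items():
        parts.append(name + "=" + quote(value, safe=SIP_PARAM_SAFE))
    return ";".join(parts)


def parse_dialog_uri(uri):
    """Return the Request-URI parameters with values unescaped once."""
    uri = uri.partition("?")[0]  # an unescaped '?' would start SIP headers
    head, *raw_params = uri.split(";")
    if not re.fullmatch(r"sips?:dialog@\S+", head):
        raise BadRequest("user part is not 'dialog'")
    params = {}
    for raw in raw_params:
        name, _, value = raw.partition("=")  # omitted value -> ""
        if name in INIT_PARAMS and name in params:
            raise BadRequest("repeated init-param: " + name)
        params[name] = unquote(value)  # unescape exactly once
    for name in ("maxage", "maxstale"):
        if name in params and not re.fullmatch(r"[0-9]+", params[name]):
            raise BadRequest(name + " must be 1*DIGIT")
    if params.get("method", "get").lower() not in ("get", "post"):
        raise BadRequest("method must be get or post")
    return params


uri = build_dialog_uri(
    "ms1.example.com",
    {"voicexml": "http://as1.example.com/da.vxml?lang=en%20US"})
print(uri)
# sip:dialog@ms1.example.com;voicexml=http://as1.example.com/da.vxml%3Flang%3Den%2520US
print(parse_dialog_uri(uri)["voicexml"])
# http://as1.example.com/da.vxml?lang=en%20US
```

   Because the '%20' already present in the VoiceXML URI is escaped to
   '%2520' when the Request-URI is built, a single unescape on receipt
   restores the intended VoiceXML URI; unescaping twice would corrupt it.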
   Remaining in the signaling path is of particular importance for call
   transfers, so that upstream Application Servers or proxy servers see
   signaling originating from the Application Server and not the
   VoiceXML Media Server itself.  Certain header values in the INVITE
   message to the VoiceXML Media Server are mapped into VoiceXML session
   variables, as specified in section 2.4.

   On receipt of the INVITE, the VoiceXML Media Server issues a
   provisional response, 100 Trying, and commences the fetch of the
   initial VoiceXML document.  The 200 OK response indicates that the
   VoiceXML document has been fetched and parsed correctly and is ready
   for execution.  Application execution commences on receipt of the ACK
   (except if the dialog is being prepared as specified in section 2.3).
   Note that the 100 Trying response will usually be sent on receipt of
   the INVITE in accordance with [RFC3261], since the VoiceXML Media
   Server cannot in general guarantee that the initial fetch will
   complete in less than 200 ms.  However, certain implementations may
   be able to guarantee response times to the initial INVITE, and thus
   may not need to send a 100 Trying response.

   As an optimization, prior to sending the 200 OK response, the
   VoiceXML Media Server MAY execute the application up to the point of
   the first VoiceXML waiting state or prompt flush.

   A VoiceXML Media Server, like any SIP User Agent, may be unable to
   accept the INVITE request for a variety of reasons.  For instance, an
   SDP offer contained in the INVITE might require the use of codecs
   that are not supported by the Media Server.  In such cases, the Media
   Server should respond as defined by [RFC3261].  However, there are
   error conditions specific to VoiceXML, as follows:

   1.  If the Request-URI does not conform to this specification, a 400
       Bad Request MUST be returned (unless it is used to select other
       services not defined by this specification).

   2.  If the Request-URI does not include a "voicexml" parameter, and
       the VoiceXML Media Server does not elect to use a default page,
       the VoiceXML Media Server MUST return a final response of 400 Bad
       Request, and SHOULD include a Warning header with a 3-digit code
       of 399 and a human readable error message.

   3.  If the VoiceXML document cannot be fetched or parsed, the
       VoiceXML Media Server MUST return a final response of 500 Server
       Internal Error and SHOULD include a Warning header with a 3-digit
       code of 399 and a human readable error message.

   Informational note: Certain applications may pass a significant
   amount of data to the VoiceXML dialog in the form of Request-URI
   parameters.  This may cause the total size of the INVITE request to
   exceed the MTU of the underlying network.  In such cases,
   applications/implementations must take care either to use a transport
   appropriate to these larger messages (such as TCP), or to use
   alternative means of passing the required information to the VoiceXML
   dialog (such as supplying a unique session identifier in the initial
   VoiceXML URI and later using that identifier as a key to retrieve
   data from the HTTP server).  This note also applies if the dialog is
   started using a REFER request as described in section 5.2.

2.3.  Preparing a VoiceXML Session

   In certain scenarios, it is beneficial to prepare a VoiceXML Session
   for execution prior to running it.  A previously prepared VoiceXML
   Session is expected to execute with minimal delay when instructed to
   do so.

   If a media-less SIP dialog is established with the initial INVITE to
   the VoiceXML Media Server, the VoiceXML Application will not execute
   after receipt of the ACK.  To run the VoiceXML Application, the
   Application Server must issue a re-INVITE to establish a media
   session.

   A media-less SIP dialog can be established by sending SDP containing
   no media lines in the initial INVITE.
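   A media-less offer of this kind can be sketched in Python as
   follows.  The sketch is illustrative only: the helper names
   media_less_offer and has_active_media are hypothetical, 192.0.2.10 is
   a documentation address, and the check models only the condition
   discussed in this section (no m-line with a non-zero port); it does
   not parse or validate the rest of the SDP.

```python
def media_less_offer(username, session_id, address):
    """Return an SDP body with session-level lines only (no m= lines)."""
    lines = [
        "v=0",
        # o=<username> <sess-id> <sess-version> IN IP4 <unicast-address>
        f"o={username} {session_id} {session_id} IN IP4 {address}",
        "s= ",  # single-space session name when there is no meaningful name
        f"c=IN IP4 {address}",
        "t=0 0",
    ]
    return "\r\n".join(lines) + "\r\n"


def has_active_media(sdp):
    """True if any m= line carries a non-zero port.

    SDP with no m= lines, or with only port-0 m= lines, describes a
    media-less dialog in the sense of this section.
    """
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port>[/<number of ports>] <proto> <fmt> ...
            port = line[2:].split()[1].split("/")[0]
            if int(port) != 0:
                return True
    return False


offer = media_less_offer("as", "2890844526", "192.0.2.10")
print(has_active_media(offer))  # False: the session is prepared, not run
```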
   Alternatively, if no SDP is sent in the initial INVITE, the VoiceXML
   Media Server will include an offer in the 200 OK message, which can
   be responded to with an answer in the ACK with the media port(s) set
   to 0.

   Once a VoiceXML Application is running, a re-INVITE which disables
   the media streams (i.e. sets the ports to 0) will not otherwise
   affect the executing application (except that recognition actions
   initiated while the media streams are disabled will result in noinput
   timeouts).

2.4.  Session Variable Mappings

   The standard VoiceXML session variables are assigned values according
   to:

   session.connection.local.uri:  Evaluates to the SIP URI specified in
      the To: header of the initial INVITE (or REFER).

   session.connection.remote.uri:  Evaluates to the SIP URI specified in
      the From: header of the initial INVITE (or REFER).

   session.connection.redirect:  This array is populated by information
      contained in the History-Info [RFC4244] header in the initial
      INVITE or is otherwise undefined.  Each entry (hi-entry) in the
      History-Info header is mapped, in reverse order, into an element
      of the session.connection.redirect array.  Properties of each
      element of the array are determined as follows:

      *  uri - Set to the hi-targeted-to-uri value of the History-Info
         entry

      *  pi - Set to 'true' if hi-targeted-to-uri contains a
         'Privacy=history' parameter, or if the INVITE Privacy header
         includes 'history'; 'false' otherwise

      *  si - Set to the value of the 'si' parameter if it exists,
         undefined otherwise

      *  reason - Set verbatim to the value of the 'Reason' parameter of
         hi-targeted-to-uri

   session.connection.protocol.name:  Evaluates to "sip".  Note that
      this is intended to reflect the use of SIP in general, and does
      not distinguish between whether the media server was accessed via
      SIP or SIPS procedures.

   session.connection.protocol.version:  Evaluates to "2.0".

   session.connection.protocol.sip.headers:  This is an associative
      array where each key in the array is the non-compact name of a SIP
      header in the initial INVITE converted to lower-case (note the
      case conversion does not apply to the header value).  If multiple
      header fields of the same field name are present, the values are
      combined into a single comma-separated value.  Implementations
      MUST at a minimum include the Call-ID header and MAY include other
      headers.  For example,
      session.connection.protocol.sip.headers["call-id"] evaluates to
      the Call-ID of the SIP dialog.

   session.connection.protocol.sip.requesturi:  This is an associative
      array where the array keys and values are formed from the URI
      parameters on the SIP Request-URI of the initial INVITE (or
      REFER).  The array key is the URI parameter name.  The
      corresponding array value is derived from the URI parameter value
      according to the following rules:

      *  If the URI parameter name is an init-param or dialog-param, the
         corresponding array value is obtained by evaluating the URI
         parameter value as a string.

      *  If the URI parameter name is a vxml-param, the corresponding
         array value is obtained by evaluating the URI parameter value
         as a "JSON value" [RFC4627].

      *  If the URI parameter name is present but its value is omitted,
         the value is an empty string.

      In addition, the array's toString() function returns the full SIP
      Request-URI.
      For example, assuming a Request-URI of
      sip:dialog@example.com;voicexml=http://ajax.com;obj={"x":1,"y":true}
      then session.connection.protocol.sip.requesturi["voicexml"]
      evaluates to "http://ajax.com",
      session.connection.protocol.sip.requesturi["obj"].x evaluates to 1
      (type Number), session.connection.protocol.sip.requesturi["obj"].y
      evaluates to true (type Boolean), and
      session.connection.protocol.sip.requesturi evaluates to the
      complete Request-URI.

   session.connection.aai:  Evaluates to
      session.connection.protocol.sip.requesturi["aai"]

   session.connection.ccxml:  Evaluates to
      session.connection.protocol.sip.requesturi["ccxml"]

   session.connection.protocol.sip.media:  This is an array where each
      array element is an object with the following properties:

      *  type: - This required property indicates the type of the media
         associated with the stream.  The value is a string.  It is
         strongly recommended that the following values are used for
         common types of media: "audio" for audio media, and "video" for
         video media.

      *  direction: - This required property indicates the
         directionality of the media relative to
         session.connection.originator.  Defined values are sendrecv,
         sendonly, recvonly, and inactive.

      *  format: - This property is optional.  If defined, the value of
         the property is an array.  Each array element is an object
         which specifies information about one format of the media
         (there is an array element for each payload type on the
         m-line).  The object contains at least one property called name
         whose value is the MIME subtype of the media format (MIME
         subtypes are registered in [RFC4855]).  Other properties may be
         defined with string values; these correspond to required and,
         if defined, optional parameters of the format.
      As a consequence of this definition, there is an array entry in
      session.connection.protocol.sip.media for each non-disabled m-line
      of the negotiated media session.  Note that this session variable
      is updated if the media session characteristics for the VoiceXML
      Session change (e.g., due to a re-INVITE).  For example, consider
      a connection with bi-directional G.711 mu-law audio sampled at
      8 kHz.  In this case,
      session.connection.protocol.sip.media[0].type evaluates to
      "audio", session.connection.protocol.sip.media[0].direction to
      "sendrecv", session.connection.protocol.sip.media[0].format[0].name
      to "audio/PCMU", and
      session.connection.protocol.sip.media[0].format[0].rate to "8000".

   Note that when accessing SIP headers and Request-URI parameters via
   the session.connection.protocol.sip.headers and
   session.connection.protocol.sip.requesturi associative arrays defined
   above, applications can choose between two semantically equivalent
   ways of accessing an array element.  For example, either of the
   following can be used to access a Request-URI parameter named 'foo':

      session.connection.protocol.sip.requesturi["foo"]
      session.connection.protocol.sip.requesturi.foo

   However, not all SIP header names or Request-URI parameter names are
   valid ECMAScript identifiers, and such names can only be accessed
   using the first form (array notation).  For example, the Call-ID
   header can only be accessed as
   session.connection.protocol.sip.headers["call-id"]; attempting to
   access the same value as
   session.connection.protocol.sip.headers.call-id would result in an
   error.

2.5.  Terminating a VoiceXML Session

   The Application Server can terminate a VoiceXML Session by issuing a
   BYE to the VoiceXML Media Server.
   Upon receipt of a BYE in the context of an existing VoiceXML Session,
   the VoiceXML Media Server MUST send a 200 OK response and MUST throw
   a 'connection.disconnect.hangup' event to the VoiceXML application.
   If the Reason header [RFC3326] is present on the BYE request, the
   value of the Reason header is provided verbatim via the '_message'
   variable within the catch element's anonymous variable scope.

   The VoiceXML Media Server may also initiate termination of the
   session by issuing a BYE request.  This will typically occur as a
   result of encountering a <disconnect> or <exit> in the VoiceXML
   application, due to the VoiceXML application running to completion,
   or due to unhandled errors within the VoiceXML application.

   See Section 4 for mechanisms to return data to the Application
   Server.

2.6.  Examples

2.6.1.  Basic Session Establishment

   This example illustrates an Application Server setting up a VoiceXML
   Session on behalf of a User Agent.

  SIP                                  VoiceXML              HTTP
 User             Application            Media            Application
 Agent              Server              Server              Server
   |                   |                   |                   |
   |(1) INVITE [offer] |                   |                   |
   |------------------>|(2) INVITE [offer] |                   |
   |(3) 100 Trying     |------------------>|                   |
   |<------------------|(4) 100 Trying     |                   |
   |                   |<------------------|                   |
   |                   |                   |                   |
   |                   |                   |(5) GET            |
   |                   |                   |------------------>|
   |                   |                   |(6) 200 OK [VXML]  |
   |                   |                   |<------------------|
   |                   |                   |                   |
   |                   |(7) 200 OK [answer]|                   |
   |(8) 200 OK [answer]|<------------------|                   |
   |<------------------|                   |                   |
   |(9) ACK            |                   |                   |
   |------------------>|(10) ACK           |                   |
   |                   |------------------>|     (execute      |
   |(11) RTP/SRTP      |                   |     VoiceXML      |
   |.......................................|    application)   |
   |                   |                   |                   |

2.6.2.  VoiceXML Session Preparation

   This example demonstrates the preparation of a VoiceXML Session.
   In this example, the VoiceXML session is prepared prior to placing an
   outbound call to a User Agent, and is started as soon as the User
   Agent answers.

   The [answer1:0] notation is used to indicate an SDP answer with the
   media ports set to 0.

  SIP                                      VoiceXML                HTTP
 User               Application              Media           Application
 Agent                Server                Server                Server
   |                     |                     |                     |
   |                     |(1) INVITE           |                     |
   |                     |-------------------->|                     |
   |                     |(2) 100 Trying       |                     |
   |                     |<--------------------|                     |
   |                     |                     |                     |
   |                     |                     |(3) GET              |
   |                     |                     |-------------------->|
   |                     |                     |(4) 200 OK [VXML]    |
   |                     |                     |<--------------------|
   |                     |                     |                     |
   |                     |(5) 200 OK [offer1]  |                     |
   |                     |<--------------------|                     |
   |                     |(6) ACK [answer1:0]  |                     |
   |(7) INVITE           |-------------------->|                     |
   |<--------------------|                     |                     |
   |(8) 200 OK [offer2]  |                     |                     |
   |-------------------->|(9) INVITE [offer2]  |                     |
   |                     |-------------------->|                     |
   |                     |(10) 100 Trying      |                     |
   |                     |<--------------------|                     |
   |                     |(11) 200 OK [answer2]|                     |
   |(12) ACK [answer2]   |<--------------------|                     |
   |<--------------------|(13) ACK             |                     |
   |                     |-------------------->|      (execute       |
   |(14) RTP/SRTP        |                     |      VoiceXML       |
   |...........................................|     application)    |
   |                     |                     |                     |

2.6.3.  MRCP Establishment

   MRCP [MRCPv2] is a protocol that enables clients such as a VoiceXML
   Media Server to control media service resources such as speech
   synthesizers, recognizers, verifiers, and identifiers residing in
   servers on the network.

   The example below illustrates how a VoiceXML Media Server may
   establish an MRCP session in response to an initial INVITE.
                     VoiceXML                                      HTTP
 User                  Media                MRCPv2           Application
 Agent                Server                Server                Server
   |                     |                     |                     |
   |(1) INVITE [offer1]  |                     |                     |
   |-------------------->|                     |                     |
   |(2) 100 Trying       |                     |                     |
   |<--------------------|(3) GET              |                     |
   |                     |------------------------------------------>|
   |                     |                     |                     |
   |                     |(4) 200 OK [VXML]    |                     |
   |                     |<------------------------------------------|
   |                     |                     |                     |
   |                     |(5) INVITE [offer2]  |                     |
   |                     |-------------------->|                     |
   |                     |                     |                     |
   |                     |(6) 200 OK [answer2] |                     |
   |                     |<--------------------|                     |
   |                     |                     |                     |
   |                     |(7) ACK              |                     |
   |                     |-------------------->|                     |
   |                     |                     |                     |
   |                     |(8) MRCP connection  |                     |
   |                     |<------------------->|                     |
   |(9) 200 OK [answer1] |                     |                     |
   |<--------------------|                     |                     |
   |                     |                     |                     |
   |(10) ACK             |                     |                     |
   |-------------------->|                     |                     |
   |                     |                     |                     |
   |(11) RTP/SRTP        |                     |                     |
   |...........................................|                     |
   |                     |                     |                     |

   In this example, the VoiceXML Media Server is responsible for
   establishing a session with the MRCPv2 Media Resource Server prior to
   sending the 200 OK response to the initial INVITE.  The VoiceXML
   Media Server will perform the appropriate offer/answer with the
   MRCPv2 Media Resource Server based on the SDP capabilities of the
   Application Server and the MRCPv2 Media Resource Server.  The
   VoiceXML Media Server will change the offer received in step (1) to
   establish an MRCPv2 session in step (5), re-writing the SDP to
   include an m-line for each MRCPv2 resource to be used and making any
   other SDP modifications required by MRCPv2.  Once the VoiceXML Media
   Server has performed the offer/answer with the MRCPv2 Media Resource
   Server, it will establish an MRCPv2 control channel in step (8).  The
   MRCPv2 resource is deallocated when the VoiceXML Media Server
   receives or sends a BYE (not shown).

3.  Media Support

   This section describes the mandatory and optional media support
   required by this interface.

3.1.  Offer/Answer

   The VoiceXML Media Server MUST support the standard offer/answer
   mechanism of [RFC3264].  In particular, if an SDP offer is not
   present in the INVITE, the VoiceXML Media Server will make an offer
   in the 200 OK response listing its supported codecs.

3.2.  Early Media

   The VoiceXML Media Server MAY support early establishment of media
   streams by sending a 183 Session Progress provisional response to the
   initial INVITE.  This allows the Application Server to establish
   media streams between a user agent and the VoiceXML Media Server
   while the initial VoiceXML document is being processed.  This is
   useful primarily for minimizing the delay in starting a VoiceXML
   Session, since media stream establishment and initial VoiceXML
   document processing can occur in parallel.  This can be particularly
   important in cases where the session with the user agent has already
   been established, since the user agent is already "connected".  The
   following flow demonstrates the use of early media:

  SIP                                      VoiceXML                HTTP
 User               Application              Media           Application
 Agent                Server                Server                Server
   |                     |                     |                     |
   |.(existing session)..|                     |                     |
   |                     |(1) INVITE           |                     |
   |                     |-------------------->|                     |
   |                     |(2) 183 [offer]      |                     |
   |(3) re-INVITE [offer]|<--------------------|                     |
   |<--------------------|                     |                     |
   |(4) 200 OK [answer]  |                     |                     |
   |-------------------->|                     |                     |
   |(5) ACK              |                     |                     |
   |<--------------------|                     |                     |
   |                     |(6) PRACK [answer]   |                     |
   |                     |-------------------->|                     |
   |                     |(7) PRACK 200 OK     |                     |
   |                     |<--------------------|                     |
   |(8) RTP/SRTP         |                     |                     |
   |...........................................|                     |
   |                     |                     |(9) HTTP GET         |
   |                     |                     |-------------------->|
   |                     |                     |(10) 200 OK [VXML]   |
   |                     |                     |<--------------------|
   |                     |                     |                     |
   |                     |(11) 200 OK          |                     |
   |                     |<--------------------|                     |
   |                     |(12) ACK             |                     |
   |                     |-------------------->|      (execute       |
   |                     |                     |      VoiceXML       |
   |                     |                     |     application)    |
   |                     |                     |                     |

   In the figure shown above,
   although step 9 (HTTP GET) is shown occurring after the early media
   offer/answer exchange (starting in step 2), the intent is that the
   fetching of the VoiceXML document happens concurrently with the
   negotiation of early media.

   Note that the offer of early media by a VoiceXML Media Server does
   not imply that the referenced VoiceXML application can always be
   fetched and executed successfully.  For instance, if the HTTP
   Application Server were to return a 4xx response in step 10 above, or
   if the provided VoiceXML content was not valid, the VoiceXML Media
   Server would still return a 500 response (as per Section 2.2).  At
   this point, it would be the responsibility of the Application Server
   to tear down any media streams established with the media server.

   The use of early media is substantially complicated if the SDP
   supplied in the 183 Session Progress differs from that supplied in
   the 200 OK.  Therefore, if a VoiceXML Media Server generates a 183
   Session Progress provisional response containing SDP, it MUST return
   identical SDP when generating the 200 OK final response (i.e., the
   "gateway model" in [RFC3960]).

   Early media is not optimal in all circumstances; for instance, when
   handling an incoming call, a 183 Session Progress propagated by the
   Application Server to the user agent will typically stop the
   "ringback tone" a user would otherwise hear.  Furthermore, a 183
   Session Progress provisional response does not guarantee that the
   VoiceXML application will be executed successfully: the subsequent
   fetching of the VoiceXML document could fail.

   Finally, the example above assumes that the User Agent supports
   re-INVITE.  If it did not (i.e., it returned a 488 Not Acceptable
   Here), the Application Server would issue a CANCEL to the VoiceXML
   Media Server.

3.3.  Modifying the Media Session

   The VoiceXML Media Server MUST allow the media session to be modified
   via a re-INVITE and SHOULD support the UPDATE method [RFC3311] for
   the same purpose.  In particular, it MUST be possible to change
   streams between sendrecv, sendonly, and recvonly as specified in
   [RFC3264].

   Unidirectional streams are useful for announcement-only or
   listening-only (hotword) applications.  The preferred mechanism for
   putting the media session on hold is specified in [RFC3264], i.e.,
   the UA modifies the stream to be sendonly and mutes its own stream.
   Modification of the media session does not affect VoiceXML
   application execution (except that recognition actions initiated
   while on hold will result in noinput timeouts).

3.4.  Audio and Video Codecs

   For the purposes of achieving a basic level of interoperability, this
   section specifies a minimal subset of codecs and RTP [RFC3550]
   payload formats that MUST be supported by the VoiceXML Media Server.

   For audio-only applications, G.711 mu-law and A-law MUST be supported
   using RTP payload types 0 and 8, respectively [RFC3551].  Other
   codecs and payload formats MAY be supported.

   Video telephony applications, which employ a video stream in addition
   to the audio stream, are possible in VoiceXML 2.0/2.1 through the use
   of multimedia file container formats such as the .3gp [TS26244] and
   .mp4 formats [IEC14496-14].  Video support is optional for this
   specification.  If video is supported, then:

   1.  H.263 Baseline [RFC4629] MUST be supported.  For legacy reasons,
       the 1996 version of H.263 MAY be supported using the RTP payload
       format defined in [RFC2190] (payload type 34 [RFC3551]).

   2.  AMR-NB audio [RFC4867] SHOULD be supported.

   3.  MPEG-4 video [RFC3016] SHOULD be supported.

   4.  MPEG-4 AAC audio [RFC3016] SHOULD be supported.

   5.  Other codecs and payload formats MAY be supported.
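   As an illustration of the mandatory audio baseline, the Python sketch
   below picks the static payload types 0 (PCMU) and 8 (PCMA) out of an
   offered audio m-line and builds the corresponding answer lines.  It
   is a deliberate simplification.  It assumes a single audio stream
   and ignores dynamic payload types, telephone-event, and all other
   SDP attributes.  The function name is hypothetical.

```python
# Static RTP/AVP payload types (RFC 3551) required by the audio baseline.
MANDATORY_AUDIO = {0: "PCMU/8000", 8: "PCMA/8000"}


def answer_audio(offer_m_line, local_port):
    """Return SDP answer lines for one offered audio m-line."""
    fields = offer_m_line.split()
    if len(fields) < 4 or fields[0] != "m=audio":
        raise ValueError("expected an audio m-line with at least one format")
    proto, formats = fields[2], fields[3:]
    # Keep the offerer's order, dropping formats this sketch does not handle.
    chosen = [int(f) for f in formats
              if f.isdigit() and int(f) in MANDATORY_AUDIO]
    if not chosen:
        # RFC 3264: an offered stream is rejected by answering with port 0.
        return [f"m=audio 0 {proto} {formats[0]}"]
    lines = [f"m=audio {local_port} {proto} " + " ".join(map(str, chosen))]
    lines += [f"a=rtpmap:{pt} {MANDATORY_AUDIO[pt]}" for pt in chosen]
    return lines
```

   The sketch honours the offerer's payload-type order, which RFC 3264
   treats as the offerer's preference.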
   Video record operations carried out by the VoiceXML Media Server
   typically require receipt of an intra-frame before the recording can
   commence.  The VoiceXML Media Server SHOULD use the mechanism
   described in [RFC4585] to request that a new intra-frame be sent.

3.5.  DTMF

   DTMF events [RFC4733] MUST be supported.  When the user agent does
   not indicate support for [RFC4733], the VoiceXML Media Server MAY
   perform DTMF detection using other means, such as detecting DTMF
   tones in the audio stream.  Implementation note: only [RFC4733]
   telephone-events must be used when the user agent indicates support
   for them, in order to avoid the risk of detecting the same DTMF
   digit twice if detection on the audio stream were applied at the
   same time.

4.  Returning Data to the Application Server

   This section discusses the mechanisms for returning data (e.g.,
   collected utterance or digit information) from the VoiceXML Media
   Server to the Application Server.

4.1.  HTTP Mechanism

   At any time during the execution of the VoiceXML application, data
   can be returned to the Application Server via an HTTP POST using
   standard VoiceXML elements such as <submit> or <subdialog>.  Notably,
   the <data> element in VoiceXML 2.1 [VXML21] allows data to be sent to
   the Application Server efficiently without requiring a VoiceXML page
   transition and is ideal for short VoiceXML applications such as
   "prompt and collect".

   For most applications, it is necessary to correlate the information
   being passed over HTTP with a particular VoiceXML Session.  One way
   this can be achieved is to include the SIP Call-ID (accessible in
   VoiceXML via the session.connection.protocol.sip.headers array)
   within the HTTP POST fields.  Alternatively, a unique "POST-back URI"
   can be specified as an application-specific URI parameter in the
   Request-URI of the initial INVITE (accessible in VoiceXML via the
   session.connection.protocol.sip.requesturi array).
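   The Call-ID correlation approach above might look as follows on the
   Application Server side, sketched in Python.  The field name "callid"
   is a hypothetical choice.  The VoiceXML application would copy
   session.connection.protocol.sip.headers["call-id"] into a variable
   of that name and list it in the namelist of its <submit> or <data>
   request.  The in-memory dictionary stands in for whatever session
   store a real server uses.

```python
from urllib.parse import parse_qs

# Hypothetical session store: SIP Call-ID -> data collected so far.
sessions = {}


def handle_post(body):
    """Attach the fields of a form-encoded POST to its VoiceXML Session."""
    fields = {name: values[0]
              for name, values in parse_qs(body, keep_blank_values=True).items()}
    call_id = fields.pop("callid", None)
    if call_id not in sessions:
        raise KeyError(f"no VoiceXML Session for Call-ID {call_id!r}")
    sessions[call_id].update(fields)
    return sessions[call_id]
```

   A server using the POST-back URI alternative would instead key its
   store on a token embedded in that URI rather than on a POST field.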
4.2.  SIP Mechanism

   Data can be returned to the Application Server via the expr or
   namelist attribute on <exit> or the namelist attribute on
   <disconnect>.  A VoiceXML Media Server MUST support encoding of the
   expr/namelist data in the message body of a BYE request sent from
   the VoiceXML Media Server as a result of encountering the <exit> or
   <disconnect> element.  A VoiceXML Media Server MAY support inclusion
   of the expr/namelist data in the message body of the 200 OK message
   in response to a received BYE request (i.e., when the VoiceXML
   Application responds to the connection.disconnect.hangup event and
   subsequently executes an <exit> element with the expr or namelist
   attribute specified).

   Note that sending expr/namelist data in the 200 OK response requires
   that the VoiceXML Media Server delay the final response to the
   received BYE request until the VoiceXML Application's post-disconnect
   final processing state terminates.  This mechanism is subject to the
   constraint that the VoiceXML Media Server must respond before the
   UAC's timer F expires (32 seconds by default).  Moreover, for
   unreliable transports, the UAC will retransmit the BYE request
   according to the rules of [RFC3261].  The VoiceXML Media Server
   SHOULD implement the recommendations of [RFC4320] regarding when to
   send the 100 Trying provisional response to the BYE request.

   If a VoiceXML Application executes a <disconnect> [VXML21] and then
   subsequently executes an <exit> with namelist information, the
   namelist information from the <exit> element is discarded.

   Namelist variables are first converted to their JSON value
   equivalent [RFC4627] and encoded in the message body using the
   application/x-www-form-urlencoded content type [HTML4].  The
   behavior resulting from specifying a recording variable in the
   namelist or an ECMAScript object with circular references is not
   defined.
   If the expr attribute is specified on the <exit> element instead of
   the namelist attribute, the reserved name __exit is used.

   To allow the Application Server to differentiate between a BYE
   resulting from a <disconnect> and one resulting from an <exit>, the
   reserved name __reason is used, with a value of "disconnect" (without
   quotes) to reflect the use of VoiceXML's <disconnect> element, and a
   value of "exit" (without quotes) to reflect an explicit <exit> in the
   VoiceXML document.  If the session terminates for other reasons (such
   as the media server encountering an error), this parameter may be
   omitted, or may take on platform-specific values prefixed with an
   underscore.

   This specification extends the application/x-www-form-urlencoded
   format by replacing non-ASCII characters with one or more octets of
   the UTF-8 representation of the character, with each octet in turn
   replaced by %HH, where HH represents the uppercase hexadecimal
   notation for the octet value and % is a literal character.  As a
   consequence, the Content-Type header field in a BYE message
   containing expr/namelist data MUST be set to
   application/x-www-form-urlencoded;charset=utf-8.

   The following table provides some examples of usage and the
   corresponding result content.

   +-------------------------------+---------------------------------+
   | Usage                         | Result Content                  |
   +-------------------------------+---------------------------------+
   | <exit/>                       | __reason=exit                   |
   | <exit expr="5"/>              | __exit=5&__reason=exit          |
   | <exit expr="'done'"/>         | __exit="done"&__reason=exit     |
   | <exit expr="userAuthorized"/> | __exit=true&__reason=exit       |
   | <exit namelist="pin errors"/> | pin=1234&errors=0&__reason=exit |
   +-------------------------------+---------------------------------+

   The table assumes the following VoiceXML variables and values:

      userAuthorized = true
      pin = 1234
      errors = 0

   For example, consider the VoiceXML snippet:

      ...
      <exit namelist="id pin"/>
      ...
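   The encoding rules in this section can be sketched in Python as
   follows.  The function name and its arguments are hypothetical, and
   the values are assumed to be already available as Python equivalents
   of the ECMAScript values (numbers, strings, booleans, lists, dicts).
   Python's urlencode() percent-encodes every character other than ASCII
   letters, digits, and "_.-~" using uppercase hexadecimal UTF-8 octets
   (a space becomes "+"), which matches the extension described above.
   It also escapes the double quotes produced by JSON string
   serialization, so with this sketch the 'done' case in the table is
   emitted as __exit=%22done%22&__reason=exit.

```python
import json
from urllib.parse import urlencode

CONTENT_TYPE = "application/x-www-form-urlencoded;charset=utf-8"


def bye_body(namelist=None, expr=None, reason="exit"):
    """Build the BYE message body for data returned by <exit>/<disconnect>.

    namelist: mapping of VoiceXML variable names to values, in order.
    expr:     value of the <exit> expr attribute; reported as __exit.
    reason:   "exit" or "disconnect"; reported as __reason.
    """
    if expr is not None:
        pairs = [("__exit", expr)]
    else:
        pairs = list((namelist or {}).items())
    # Step 1: serialize each value as a JSON value (non-ASCII kept as-is).
    fields = [(name, json.dumps(value, ensure_ascii=False, separators=(",", ":")))
              for name, value in pairs]
    fields.append(("__reason", reason))
    # Step 2: form-urlencode; non-ASCII text becomes %HH UTF-8 octets.
    return urlencode(fields, encoding="utf-8")
```

   For instance, bye_body({"id": 1234, "pin": 9999}) returns the
   30-octet body id=1234&pin=9999&__reason=exit, matching the
   Content-Length in the BYE example below.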
   For the <exit namelist="id pin"/> snippet, if id equals 1234 and pin
   equals 9999, the BYE message would look similar to:

      BYE sip:user@pc33.example.com SIP/2.0
      Via: SIP/2.0/UDP 192.0.2.4;branch=z9hG4bKnashds10
      Max-Forwards: 70
      From: sip:dialog@example.com;tag=a6c85cf
      To: sip:user@example.com;tag=1928301774
      Call-ID: a84b4c76e66710
      CSeq: 231 BYE
      Content-Type: application/x-www-form-urlencoded;charset=utf-8
      Content-Length: 30

      id=1234&pin=9999&__reason=exit

5.  Outbound Calling

   Outbound calls can be triggered via the Application Server using
   either third party call control [RFC3725] or the SIP REFER mechanism
   [RFC3515].

5.1.  Third Party Call Control Mechanism

   Flow IV from [RFC3725] is recommended in conjunction with the
   VoiceXML Session preparation mechanism.  This flow has several
   advantages over the others, namely:

   1.  Selection of a VoiceXML Media Server and preparation of the
       VoiceXML Application can occur before the call is placed, so the
       callee does not experience delays.

   2.  It avoids timing difficulties that could occur with other flows
       due to the time taken to fetch and parse the initial VoiceXML
       document.

   3.  The flow is IPv6 compatible.

   An example flow for an Application Server initiated outbound call is
   provided in Section 2.6.2.

5.2.  REFER Mechanism

   The Application Server can send a REFER request to the VoiceXML
   Media Server outside of a SIP dialog to initiate an outbound call.
   The Request-URI in the REFER is constructed identically to that of an
   INVITE to the VoiceXML Media Server and carries the same semantics.
   The Refer-To header contains the URI for the VoiceXML Media Server to
   place the call to.

   On receipt of the REFER request, the VoiceXML Media Server MUST issue
   a provisional response, 100 Trying.  The 202 Accepted response
   indicates that the VoiceXML document has been fetched and parsed
   correctly.
   The VoiceXML Media Server then places the outbound INVITE and
   executes the application after the ACK is sent.

   If the VoiceXML Session cannot be started, then the VoiceXML Media
   Server MUST respond to the REFER request using the procedure defined
   in Section 2.2 above.

   An example of a REFER-initiated outbound call is given below.  The
   NOTIFY messages, which contain message/sipfrag bodies [RFC3515],
   allow the Application Server to monitor the progress of the outbound
   call attempt.

   Note: An in-dialog REFER will result in a 403 Forbidden response.

 HTTP                VoiceXML                 SIP
Application            Media              Application              User
Server                Server                Server                 Agent
   |                     |                     |                     |
   |                     |(1) REFER            |                     |
   |                     |<--------------------|                     |
   |                     |(2) 100 Trying       |                     |
   |                     |-------------------->|                     |
   |                     |(3) NOTIFY           |                     |
   |                     |-------------------->|                     |
   |                     |(4) 200 OK           |                     |
   |                     |<--------------------|                     |
   |(5) GET              |                     |                     |
   |<--------------------|                     |                     |
   |(6) 200 OK [VXML]    |                     |                     |
   |-------------------->|                     |                     |
   |                     |(7) 202 Accepted     |                     |
   |                     |-------------------->|                     |
   |                     |(8) INVITE [offer]                         |
   |                     |------------------------------------------>|
   |                     |(9) 200 OK [answer]                        |
   |                     |<------------------------------------------|
   |                     |(10) NOTIFY          |                     |
   |                     |-------------------->|                     |
   |                     |(11) 200 OK          |                     |
   |                     |<--------------------|                     |
   |                     |(12) ACK                                   |
   |                     |------------------------------------------>|
   |                     |(13) RTP/SRTP                              |
   |                     |...........................................|
   |                     |                     |                     |

6.  Call Transfer

   While VoiceXML is at its core a dialog language, it also provides
   optional call transfer capability.  VoiceXML's transfer capability is
   particularly suited to the PSTN IVR Service Node use case described
   in Section 1.1.2.  It is NOT RECOMMENDED to use VoiceXML's call
   transfer capability in networks involving Application Servers.
   Rather, the Application Server itself can provide call routing
   functionality by taking signaling actions based on the data returned
   to it from the VoiceXML Media Server via HTTP or in the SIP BYE
   message.

   If VoiceXML transfer is supported, the mechanism described in this
   section MUST be employed.  The transfer flows specified here were
   selected because they provide the best interworking across a wide
   range of SIP devices.  CCXML<->VoiceXML implementations, which
   require tight coupling in the form of bi-directional eventing to
   support all transfer types defined in VoiceXML, may benefit from
   other approaches, such as the use of SIP event packages [RFC3265].

   In what follows, the provisional responses have been omitted for
   clarity.

6.1.  Blind

   The blind transfer sequence is initiated by the VoiceXML Media Server
   via a REFER message [RFC3515] on the original SIP dialog.  The
   Refer-To header contains the URI for the called party, as specified
   via the 'dest' or 'destexpr' attributes on the VoiceXML <transfer>
   tag.

   If the REFER request is accepted (i.e., the VoiceXML Media Server
   receives a 2xx response), the VoiceXML Media Server throws the
   connection.disconnect.transfer event and will terminate the VoiceXML
   Session with a BYE message.  For blind transfers, implementations MAY
   use [RFC4488] to suppress the implicit subscription associated with
   the REFER message.

   If the REFER request results in a non-2xx response, the transfer
   request has been rejected, and the <transfer>'s form item variable
   (or the event raised) depends on the SIP response as specified in the
   following table.
   +-------------------------+-----------------------------------+
   | SIP Response            | variable / event                  |
   +-------------------------+-----------------------------------+
   | 404 Not Found           | error.connection.baddestination   |
   | 405 Method Not Allowed  | error.unsupported.transfer.blind  |
   | 503 Service Unavailable | error.connection.noresource       |
   | (No response)           | network_busy                      |
   | (Other 3xx/4xx/5xx/6xx) | unknown                           |
   +-------------------------+-----------------------------------+

   An example is illustrated below (provisional responses and NOTIFY
   messages corresponding to provisional responses have been omitted for
   clarity).

   User Agent 1        VoiceXML        User Agent 2
     (Caller)        Media Server        (Callee)
         |                 |                 |
         |(0) RTP/SRTP     |                 |
         |.................|                 |
         |                 |                 |
         |(1) REFER        |                 |
         |<----------------|                 |
         |(2) 202 Accepted |                 |
         |---------------->|                 |
         |(3) BYE          |                 |
         |<----------------|                 |
         |(4) 200 OK       |                 |
         |---------------->|                 |
         |                 | Stop RTP (0)    |
         |(5) INVITE                         |
         |---------------------------------->|
         |(6) 200 OK                         |
         |<----------------------------------|
         |(7) NOTIFY       |                 |
         |---------------->|                 |
         |(8) 200 OK       |                 |
         |<----------------|                 |
         |(9) ACK                            |
         |---------------------------------->|
         |(10) RTP/SRTP                      |
         |...................................|
         |                 |                 |

   If the "aai" or "aaiexpr" attribute is present on <transfer>, its
   value is appended to the Refer-To URI in the REFER request as a
   parameter named "aai".  Reserved characters are URL-encoded as
   required for SIP/SIPS URIs [RFC3261].  The mapping of values outside
   of the ASCII range is platform specific.

6.2.  Bridge

   The bridge transfer function results in the creation of a small
   multi-party session involving the Caller, the VoiceXML Media Server,
   and the Callee.
   The VoiceXML Media Server invites the Callee to the session and will
   eject the Callee if the transfer is terminated.

   If the "aai" or "aaiexpr" attribute is present on <transfer>, its
   value is appended to the Request-URI in the INVITE as a URI parameter
   named "aai".  Reserved characters are URL-encoded as required for
   SIP/SIPS URIs [RFC3261].  The mapping of values outside of the ASCII
   range is platform specific.

   During the transfer attempt, audio specified in the transferaudio
   attribute of <transfer> is streamed to User Agent 1.  A VoiceXML
   Media Server MAY play early media received from the Callee to the
   Caller if the transferaudio attribute is omitted.

   The bridge transfer sequence is illustrated below.  The VoiceXML
   Media Server (acting as a UAC) makes a call to User Agent 2 with the
   same codecs used by User Agent 1.  When the call setup is complete,
   RTP flows between User Agent 2 and the VoiceXML Media Server.  This
   stream is mixed with User Agent 1's stream.

   User Agent 1          VoiceXML          User Agent 2
     (Caller)          Media Server          (Callee)
         |                   |                   |
         |(0) RTP/SRTP       |                   |
         |...................|                   |
         |                   |                   |
         |                   |(1) INVITE [offer] |
         |                   |------------------>|
         |                   |(2) 200 OK [answer]|
         |                   |<------------------|
         |                   |(3) ACK            |
         |                   |------------------>|
         |                   |(4) RTP/SRTP       |
         |                mix|...................|
         |            (0)+(4)|                   |

   If a final response to the INVITE is not received from User Agent 2
   before the connecttimeout (specified as an attribute of <transfer>)
   expires, the VoiceXML Media Server will issue a CANCEL to terminate
   the transaction, and the <transfer>'s form item variable is set to
   noanswer.

   If the INVITE results in a non-2xx response, the <transfer>'s form
   item variable (or the event raised) depends on the SIP response as
   specified in the following table.
   +-------------------------+-----------------------------------+
   | SIP Response            | variable / event                  |
   +-------------------------+-----------------------------------+
   | 404 Not Found           | error.connection.baddestination   |
   | 405 Method Not Allowed  | error.unsupported.transfer.bridge |
   | 408 Request Timeout     | noanswer                          |
   | 486 Busy Here           | busy                              |
   | 503 Service Unavailable | error.connection.noresource       |
   | (No response)           | network_busy                      |
   | (Other 3xx/4xx/5xx/6xx) | unknown                           |
   +-------------------------+-----------------------------------+

   The 405 Method Not Allowed response can be used by the Application
   Server to gracefully decline bridge transfers.

   Once the transfer is established, the VoiceXML Media Server can
   "listen" to the media stream from User Agent 1 to perform speech or
   DTMF hotword recognition.  A match results in a near-end disconnect,
   i.e., the VoiceXML Media Server issues a BYE to User Agent 2 and the
   VoiceXML Application continues with User Agent 1.  A BYE will also be
   issued to User Agent 2 if the call duration exceeds the maximum
   duration specified in the maxtime attribute on <transfer>.

   If User Agent 2 issues a BYE during the transfer, the transfer
   terminates and the VoiceXML <transfer>'s form item variable receives
   the value far_end_disconnect.  If User Agent 1 issues a BYE during
   the transfer, the transfer terminates and the VoiceXML event
   connection.disconnect.transfer is thrown.

6.3.  Consultation

   The consultation transfer (also called attended transfer [SIPEX]) is
   similar to a blind transfer except that the outcome of the transfer
   call setup is known and the Caller is not dropped as a result of an
   unsuccessful transfer attempt.
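   The response tables for blind and bridge transfers can be captured
   in a small lookup, sketched here in Python.  The function name and
   the ("event", ...) / ("variable", ...) result convention are
   hypothetical.  A status of None stands for "no response".  Final
   responses that neither table lists explicitly fall through to
   "unknown", as the tables specify.

```python
from typing import Optional, Tuple

_BOTH = {404: "error.connection.baddestination",
         503: "error.connection.noresource"}
_BRIDGE_ONLY = {408: "noanswer", 486: "busy"}


def transfer_outcome(transfer_type: str, status: Optional[int]) -> Tuple[str, str]:
    """Map a non-2xx result of a blind REFER or bridge INVITE to its outcome."""
    if transfer_type not in ("blind", "bridge"):
        raise ValueError("transfer_type must be 'blind' or 'bridge'")
    if status is None:
        name = "network_busy"
    elif status == 405:
        name = f"error.unsupported.transfer.{transfer_type}"
    elif status in _BOTH:
        name = _BOTH[status]
    elif transfer_type == "bridge" and status in _BRIDGE_ONLY:
        name = _BRIDGE_ONLY[status]
    elif 300 <= status <= 699:
        name = "unknown"
    else:
        raise ValueError("only missing or non-2xx final responses are mapped")
    # Names beginning with "error." are thrown as events; the others are
    # assigned to the <transfer> form item variable.
    return ("event" if name.startswith("error.") else "variable", name)
```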
   Consultation transfer commences with the same flow as for bridge
   transfer, except that the RTP streams are not mixed at step (4) and
   error.unsupported.transfer.consultation replaces
   error.unsupported.transfer.bridge.  Assuming a new SIP dialog with
   User Agent 2 is created, the remainder of the sequence follows as
   illustrated below (provisional responses and NOTIFY messages
   corresponding to provisional responses have been omitted for
   clarity).  Consultation transfer makes use of the Replaces header
   [RFC3891]: User Agent 1 calls User Agent 2 and replaces the latter's
   SIP dialog with the VoiceXML Media Server with a new SIP dialog
   between the Caller and the Callee.

   User Agent 1        VoiceXML        User Agent 2
     (Caller)        Media Server        (Callee)
         |                 |                 |
         |(0) RTP/SRTP     |                 |
         |.................|(4) RTP/SRTP     |
         |                 |.................|
         |(5) REFER        |                 |
         |<----------------|                 |
         |(6) 202 Accepted |                 |
         |---------------->|                 |
         |(7) INVITE Replaces:ms1.example.com|
         |---------------------------------->|
         |(8) 200 OK                         |
         |<----------------------------------|
         |(9) ACK                            |
         |---------------------------------->|
         |(10) RTP/SRTP                      |
         |...................................|
         |                 |(11) BYE         |
         |                 |<----------------|
         |                 |(12) 200 OK      |
         |                 |---------------->| Stop
         |(13) NOTIFY      |                 | RTP (4)
         |---------------->|                 |
         |(14) 200 OK      |                 |
         |<----------------|                 |
         |(15) BYE         |                 |
         |<----------------|                 |
         |(16) 200 OK      |                 |
         |---------------->| Stop            |
         |                 | RTP (0)         |

   If a response other than 202 Accepted is received in response to the
   REFER request sent to User Agent 1, the transfer terminates, and an
   error.unsupported.transfer.consultation event is raised.  In
   addition, a BYE is sent to User Agent 2 to terminate the established
   outbound leg.
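   A Python sketch of the consultation transfer decisions follows.  It
   covers the REFER rule above and the NOTIFY rule in the next
   paragraph.  The function names and action strings are illustrative
   only.  The text names 200 OK as the success indication, so treating
   every 2xx sipfrag status as success is an assumption of this sketch.

```python
import re

# A sipfrag body starts with a SIP status line, e.g. "SIP/2.0 200 OK".
_STATUS_LINE = re.compile(r"SIP/2\.0 (\d{3}) ")


def on_refer_response(status):
    """Actions after the REFER sent to User Agent 1 is answered."""
    if status == 202:
        return []                       # accepted: wait for NOTIFY
    return ["throw error.unsupported.transfer.consultation",
            "send BYE to User Agent 2"]


def on_notify(sipfrag):
    """Actions for a NOTIFY carrying a message/sipfrag body."""
    match = _STATUS_LINE.match(sipfrag)
    if match is None:
        raise ValueError("sipfrag body does not begin with a status line")
    code = int(match.group(1))
    if code < 200:
        return []                       # provisional: keep waiting
    if code < 300:                      # assumption: any 2xx means success
        return ["throw connection.disconnect.transfer",
                "send BYE to User Agent 1"]
    return ["set <transfer> form item variable to 'unknown'"]
```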
   The VoiceXML Media Server uses receipt of a NOTIFY request whose
   message/sipfrag body contains a 200 OK status line to determine that
   the consultation transfer has succeeded.  When this occurs, the
   connection.disconnect.transfer event is thrown to the VoiceXML
   application, and a BYE is sent to User Agent 1 to terminate the
   session.  A NOTIFY request whose sipfrag body contains a non-2xx
   final response terminates the transfer and sets the associated
   VoiceXML input item variable to 'unknown'.  Because the transfer
   outcome is signalled only through these NOTIFY requests,
   implementations MUST NOT use [RFC4488] to suppress the implicit
   subscription associated with the REFER request for consultation
   transfers.

7.  Contributors

   The editors gratefully acknowledge the following individuals and
   their companies who contributed to this specification:

   R. J. Auburn (Voxeo)

   Hans Bjurstrom (Hewlett-Packard)

   Dave Burke (Google)

   Emily Candell (Comverse)

   Peter Danielsen (Lucent)

   Brian Frasca (Tellme)

   Jeff Haynie (Hakano)

   Scott McGlashan (Hewlett-Packard)

   Matt Oshry (Tellme)

   Mark Scott (Genesys Telecommunications Laboratories, Inc)

   Rao Surapaneni (Tellme)

8.  Security Considerations

   Exposing network services with well-known addresses may not be
   desirable.  The VoiceXML Media Server SHOULD authenticate and
   authorize requesting endpoints per local policy.  This is
   particularly important for REFER-initiated outbound calls.

   Some applications may choose to transfer confidential information to
   or from the VoiceXML Media Server.  The VoiceXML Media Server SHOULD
   implement the sips: and https: schemes to provide confidentiality
   and data integrity.

   The VoiceXML Media Server SHOULD use authentication and TLS when
   establishing MRCP control sessions with an MRCPv2 Media Resource
   Server.
   To mitigate the risk of denial-of-service attacks, the VoiceXML Media
   Server SHOULD apply local policies such as limiting the execution
   time of VoiceXML applications.

   The VoiceXML Media Server SHOULD support Secure RTP (SRTP) [RFC3711]
   to provide confidentiality, authentication, and replay protection for
   RTP media streams (including RTCP control traffic).

9.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

10.  Changes since last version

   o  JSON used for serialization/deserialization of ECMAScript objects

   o  Added description of "delegation model"

   o  Clarified transfer not suitable for use in AS/MS architectures

   o  Added inactive as a permissible value for
      session.connection.protocol.sip.media[x].direction

   o  Clarified that some header / Request-URI parameters can only be
      accessed using the array access mechanism

   o  Minor typographic corrections

   o  Updated references

11.  References

11.1.  Normative References

   [HTML4]    Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01
              Specification", W3C Recommendation, December 1999.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2190]  Zhu, C., "RTP Payload Format for H.263 Video Streams",
              RFC 2190, September 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3016]  Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y., and H.
              Kimata, "RTP Payload Format for MPEG-4 Audio/Visual
              Streams", RFC 3016, November 2000.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3265]  Roach, A., "Session Initiation Protocol (SIP)-Specific
              Event Notification", RFC 3265, June 2002.

   [RFC3311]  Rosenberg, J., "The Session Initiation Protocol (SIP)
              UPDATE Method", RFC 3311, October 2002.

   [RFC3326]  Schulzrinne, H., Oran, D., and G. Camarillo, "The Reason
              Header Field for the Session Initiation Protocol (SIP)",
              RFC 3326, December 2002.

   [RFC3515]  Sparks, R., "The Session Initiation Protocol (SIP) Refer
              Method", RFC 3515, April 2003.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC3725]  Rosenberg, J., Peterson, J., Schulzrinne, H., and G.
              Camarillo, "Best Current Practices for Third Party Call
              Control (3pcc) in the Session Initiation Protocol (SIP)",
              BCP 85, RFC 3725, April 2004.

   [RFC3891]  Mahy, R., Biggs, B., and R. Dean, "The Session Initiation
              Protocol (SIP) "Replaces" Header", RFC 3891,
              September 2004.

   [RFC3960]  Camarillo, G. and H. Schulzrinne, "Early Media and Ringing
              Tone Generation in the Session Initiation Protocol (SIP)",
              RFC 3960, December 2004.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC4240]  Burger, E., Van Dyke, J., and A.
              Spitzer, "Basic Network Media Services with SIP",
              RFC 4240, December 2005.

   [RFC4244]  Barnes, M., "An Extension to the Session Initiation
              Protocol (SIP) for Request History Information", RFC 4244,
              November 2005.

   [RFC4320]  Sparks, R., "Actions Addressing Identified Issues with the
              Session Initiation Protocol's (SIP) Non-INVITE
              Transaction", RFC 4320, January 2006.

   [RFC4488]  Levin, O., "Suppression of Session Initiation Protocol
              (SIP) REFER Method Implicit Subscription", RFC 4488,
              May 2006.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              July 2006.

   [RFC4627]  Crockford, D., "The application/json Media Type for
              JavaScript Object Notation (JSON)", RFC 4627, July 2006.

   [RFC4629]  Ott, H., Bormann, C., Sullivan, G., Wenger, S., and R.
              Even, "RTP Payload Format for ITU-T Rec", RFC 4629,
              January 2007.

   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
              Digits, Telephony Tones, and Telephony Signals", RFC 4733,
              December 2006.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
              Formats", RFC 4855, February 2007.

   [RFC4867]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie,
              "RTP Payload Format and File Storage Format for the
              Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
              (AMR-WB) Audio Codecs", RFC 4867, April 2007.

   [VXML20]   McGlashan, S., Burnett, D., Carter, J., Danielsen, P.,
              Ferrans, J., Hunt, A., Lucas, B., Porter, B., Rehor, K.,
              and S. Tryphonas, "Voice Extensible Markup Language
              (VoiceXML) Version 2.0", W3C Recommendation, March 2004.

   [VXML21]   Oshry, M., Auburn, R J., Baggia, P., Bodell, M., Burke,
              D., Burnett, D., Candell, E., Kilic, H., McGlashan, S.,
              Lee, A., Porter, B., and K.
              Rehor, "Voice Extensible Markup Language (VoiceXML)
              Version 2.1", W3C Candidate Recommendation, June 2005.

11.2.  Informative References

   [CCXML10]  Auburn, R J., "Voice Browser Call Control: CCXML Version
              1.0", W3C Working Draft (work in progress), June 2005.

   [IEC14496-14]
              "Information technology. Coding of audio-visual objects.
              MP4 file format", ISO/IEC 14496-14:2003, October 2003.

   [MRCPv2]   Shanmugham, S. and D. Burnett, "Media Resource Control
              Protocol Version 2", draft-ietf-speechsc-mrcpv2-12 (work
              in progress), March 2007.

   [SIPEX]    Johnston, A., Sparks, R., Cunningham, C., Donovan, S., and
              K. Summers, "Session Initiation Protocol Examples",
              draft-ietf-sipping-service-examples (work in progress),
              July 2005.

   [SIPVXML]  Rosenberg, J., Mataga, P., and D. Ladd, "A SIP Interface
              to VoiceXML Dialog Servers", draft-rosenberg-sip-vxml-00
              (work in progress), July 2001.

   [TS23002]  "3rd Generation Partnership Project: Network architecture
              (Release 6)", 3GPP TS 23.002 v6.6.0, December 2004.

   [TS26244]  "Transparent end-to-end packet switched streaming service
              (PSS); 3GPP file format (3GP)", 3GPP TS 26.244 v6.4.0,
              December 2004.

Authors' Addresses

   Dave Burke
   Google
   Belgrave House, 76 Buckingham Palace Road
   London  SW1W 9TQ
   United Kingdom

   Email: daveburke@google.com

   Mark Scott
   Genesys
   1120 Finch Avenue West, 8th floor
   Toronto, Ontario  M3J 3H7
   Canada

   Email: Mark.Scott@genesyslab.com

   Jeff Haynie
   Hakano Inc
   1840 North Creek Circle
   Alpharetta, GA  30004
   USA

   Email: jhaynie@hakano.com

   R.J. Auburn
   Voxeo
   100 East Pine Street #600
   Orlando, FL  32801
   USA

   Email: rj@voxeo.com

   Scott McGlashan
   Hewlett-Packard
   Gustav III:s boulevard 36
   SE-16985 Stockholm
   Sweden

   Email: Scott.McGlashan@hp.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.
   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).