Internet Engineering Task Force                                 SIP WG
Internet Draft                                   Rosenberg,Mataga,Ladd
draft-rosenberg-sip-vxml-00.txt                            dynamicsoft
July 13, 2001
Expires: February 2001

                A SIP Interface to VoiceXML Dialog Servers

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

To view the list of Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html.

Abstract

VoiceXML is an XML-based scripting language for describing voice dialogs. VoiceXML interpreters run within an interpreter context that, among other tasks, provides a call control interface for accessing the interpreter. It is natural to provide a VoIP-based interpreter context that uses SIP and RTP to communicate with the outside world. In this document, we provide a detailed specification for a SIP/RTP-based interpreter context.

1 Introduction

VoiceXML [1] is an XML-based scripting language for describing voice dialogs. It supports user input through speech recognition and DTMF, and can communicate with the user through text-to-speech or recorded audio files. VoiceXML scripts are interpreted by a VoiceXML interpreter. This interpreter, in turn, runs within an interpreter context.
The interpreter context is the interface between the outside world and the interpreter. It typically handles the mechanisms by which script execution begins and by which the interpreter is fed the media that drives it. It also provides the means for fetching documents from some form of document server.

It is very natural to provide a VoiceXML interpreter context based purely on IP: specifically, one based on VoIP using SIP [2] and RTP [3], along with HTTP for document access. An incoming VoIP call triggers the execution of the script, which is fetched from a server using HTTP. The incoming RTP stream for the call is passed to the interpreter for processing, and speech generated by the interpreter is sent over RTP to the caller. We call a pure IP-based VoiceXML system an "IP dialog server", or just "dialog server".

Dialog servers are a key part of the application story for SIP-based networks, as described in the SIP application component architecture [4]. That document describes SIP-based dialog servers and provides a high-level overview of how the SIP interface works. This document provides a stand-alone, self-contained, and more thorough description of a SIP-based VoIP VoiceXML interpreter context.

2 Script Initiation

Script execution begins when a session is established using an INVITE request.

2.1 Script Naming

In SIP, the Request-URI identifies the user or service for which the call is destined. In the case of a dialog server, the dialog itself is the target of the call. As such, the Request-URI should contain an identifier for this dialog. This is consistent with the Request-URI service invocation model of RFC 3087 [5]. This URI can take one of two forms. In the first, the VoiceXML script is identified directly by an HTTP URL. In the second, the script is not specified; rather, the dialog server uses its configuration to map the incoming request to a specific script.
The format for the Request-URI in either case is:

   Request-URI     = "sip:" service-ID "." dialog-type
                     ["." dialog-specific] "@" hostport
                     url-parameters [headers]
   service-ID      = "dialog" | extension-token
   dialog-type     = "vxml" | service-token
   dialog-specific = vxml-specific | service-token
   service-token   = 1*(alphanum | "-" | "!" | "%" | "*"
                     | "_" | "+" | "`" | "'" | "~" )
   vxml-specific   = 1*(user-unreserved | unreserved | escaped)

Since the Request-URI can indicate a request for a variety of different services, of which a dialog server is only one type, its user part begins with a service identifier that indicates the basic service required. This document specifies that dialog servers are addressed by having the first part of the user part of the Request-URI contain the service identifier "dialog", indicating that a dialog service is requested. This is followed by a period and then by an identifier that indicates the means by which the dialog is specified. Currently, one mechanism is defined: a VoiceXML script, identified by the token "vxml". Other tokens can be used to indicate different mechanisms (note that service-token is identical to the BNF for token from RFC 2543, except that the "." character is disallowed). After that comes an optional period followed by mechanism-specific identification. For VoiceXML scripts, this identification, when present, is always a URL-encoded version of the URL that references the script to execute. When it is not present, the dialog server uses server-specific configuration to determine which script to execute.
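To make the naming rules above concrete, here is a minimal Python sketch of how a dialog server might split such a Request-URI into its parts. It is illustrative only and not part of this specification: the function and class names are hypothetical, url-parameters and headers after the hostport are simply discarded, and the individual tokens are not validated against the BNF. It assumes that extension tokens, like service-token, exclude the period, so only the first two periods of the user part act as separators.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass
class DialogRequest:
    service_id: str            # "dialog" for dialog servers
    dialog_type: str           # "vxml" for VoiceXML scripts
    script_url: Optional[str]  # None: server picks the script from its configuration
    host: str


def parse_dialog_request_uri(request_uri: str) -> DialogRequest:
    """Split sip:service-ID.dialog-type[.dialog-specific]@hostport... into parts.

    Hypothetical helper: url-parameters and headers are dropped, not parsed.
    """
    if not request_uri.lower().startswith("sip:"):
        raise ValueError("not a SIP URI")
    # The user part cannot contain an unescaped '@', so the first one ends it.
    user, at, rest = request_uri[4:].partition("@")
    if not at:
        raise ValueError("missing user part")
    # The hostport ends at the first ';' (url-parameters) or '?' (headers).
    host = rest.split(";", 1)[0].split("?", 1)[0]
    # Only the first two periods separate fields; the escaped script URL
    # (e.g. dialogs.server.com) keeps its own periods intact.
    parts = user.split(".", 2)
    if len(parts) < 2:
        raise ValueError("user part must contain service-ID.dialog-type")
    service_id, dialog_type = parts[0], parts[1]
    # Undo the URL encoding, e.g. %3a back to ':'.
    script_url = unquote(parts[2]) if len(parts) == 3 else None
    return DialogRequest(service_id, dialog_type, script_url, host)
```

Applied to the first example URL given below, this yields the script URL http://dialogs.server.com/script32.vxml; for the second, script_url is None and the server would fall back to its own configuration.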
Examples of URLs that invoke VoiceXML dialogs are:

   sip:dialog.vxml.http%3a//dialogs.server.com/script32.vxml@vxmlservers.com
   sip:dialog.vxml@vxmlservers.com

The first of these indicates that the dialog server (located at vxmlservers.com) should invoke a VoiceXML script fetched from http://dialogs.server.com/script32.vxml. Since the user part of the SIP URL cannot contain the ":" character, it must be escaped as %3a. The second leaves the choice of script to the configuration of the dialog server.

2.2 Responding to the INVITE

If the server receiving the INVITE does not support the specifics of the service request (for example, the requested VoiceXML version is not supported), the server SHOULD generate a 501 response. It MAY include a Warning header providing details on why the request could not be serviced.

The server SHOULD authenticate the caller and verify that the caller is authorized to access the requested service. It is anticipated that dialog servers will generally be used in conjunction with an application server, which makes the actual decision about whether the call is to be processed. As a result, the dialog server's authorization decision is simple: if the request came from an authorized upstream server, it is allowed. It is RECOMMENDED that a persistent TLS connection between the application server and the dialog server be used to provide the authentication credentials in this kind of scenario.

The server then validates that the SDP in the INVITE, if present, is acceptable, using the procedures of Section 2.3.

If it has gotten this far, the server SHOULD fetch the script identified by the Request-URI before generating a final response to the request. If the script cannot be fetched, or is invalid, the server generates a 502 Bad Gateway response, since the server is effectively acting as a gateway to HTTP. It MAY include a Warning header providing details on the reason for failure.
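The order of checks described in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the SDP offer is assumed to be pre-parsed into (media type, codec names) pairs, fetch_script is a hypothetical stand-in for the HTTP fetch that returns None when the script cannot be fetched or is invalid, and the supported codec set is illustrative. Because this document does not name response codes for authorization or SDP failures, the 403 and 488 used below are assumptions. The SDP check follows the acceptability rule of Section 2.3.

```python
from typing import Callable, List, Optional, Tuple

# Illustrative codec set: G.711 mu-law and A-law by their RTP encoding names.
SUPPORTED_CODECS = {"PCMU", "PCMA"}

MediaLine = Tuple[str, List[str]]  # (media type, codec encoding names)


def offer_acceptable(offer: Optional[List[MediaLine]]) -> bool:
    """Section 2.3: reject offers with no audio line, or with no supported
    codec on any audio line. No SDP at all is fine; the server then makes
    its own offer in the 2xx."""
    if offer is None:
        return True
    audio = [codecs for media, codecs in offer if media == "audio"]
    return any(c in SUPPORTED_CODECS for codecs in audio for c in codecs)


def final_response(service_supported: bool,
                   from_authorized_peer: bool,
                   offer: Optional[List[MediaLine]],
                   fetch_script: Callable[[], Optional[str]]) -> Tuple[int, str]:
    """Return (status code, reason phrase) for the initial INVITE."""
    if not service_supported:
        return 501, "Not Implemented"
    if not from_authorized_peer:
        return 403, "Forbidden"            # assumed code, not specified above
    if not offer_acceptable(offer):
        return 488, "Not Acceptable Here"  # assumed code, not specified above
    # Fetch the script before answering; a failure means acting as a failed
    # gateway to HTTP.
    if fetch_script() is None:
        return 502, "Bad Gateway"
    return 200, "OK"
```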
Once the script has been fetched and validated, and the offered SDP has been deemed acceptable, the server SHOULD generate a 200 OK response. Generation of the response, and ACK processing, follow standard SIP semantics.

2.3 SDP Processing

If the INVITE contains an SDP offer, the dialog server generates an answer as per SIP-bis [6]. The offer is deemed unacceptable if it contains no media lines of type audio, or if the dialog server supports none of the codecs listed for the audio streams. Otherwise, it is deemed acceptable.

The answer generated by the dialog server SHOULD refuse all media streams except the first offered audio stream. The choice of codecs used by the dialog server is at the discretion of the implementor. However, it is STRONGLY RECOMMENDED that all dialog servers support G.711 and RFC 2833. If an offered audio stream does not indicate support for RFC 2833 tones, the dialog server SHOULD add the RFC 2833 payload format to the answer anyway. As described in RFC2543-bis, this allows the dialog server to inform the caller that it can receive RFC 2833 media, even if the caller cannot receive it.

The server SHOULD allow sendonly, recvonly, and sendrecv media streams, as well as streams on hold. The meaning of these for script interpretation is discussed in Section 4.

If the INVITE from the caller did not contain SDP, the dialog server SHOULD generate an offer in the 2xx response with a single audio media line, listing all codecs supported by the dialog server.

2.4 Script Variables

In VoiceXML 1.0, the interpreter context provides the script with several variables that carry information from the call control interface. These variables are set as follows:

   session.telephone.ani: This variable is set to the URL in the From header field of the INVITE that triggered the script.
   session.telephone.dnis: This variable is set to the URL in the To header field of the INVITE that triggered the script.

   session.telephone.iidigits: If the Contact header in the INVITE request uses the SIP caller preferences contact parameters [7] to provide additional information on the initiating device, the interpreter context SHOULD map these parameters to the closest II-digits value, if possible.

   session.telephone.uui: This variable is set only if the INVITE request contained an embedded ISUP IAM message [8]. In that case, the user-to-user information elements from that IAM are extracted and mapped to this variable. Support for this is optional, but RECOMMENDED.

3 Document Acquisition

The interpreter context fetches the script using normal HTTP GET and POST requests [9]. It MUST follow the caching behaviors specified in VoiceXML 1.0. It MAY support other document acquisition protocols, such as FTP.

4 Audio Input and Output

Audio input and output are provided through RTP. The implementation platform SHOULD provide DTMF recognition on the incoming media stream, independent of its codec type. This is greatly facilitated by RFC 2833, which pushes the DTMF detection operation to the originator. The implementation platform SHOULD provide speech recognition on the incoming media stream as well.

To be explicit, this means that the dialog server SHOULD support recognition of both DTMF and speech by processing a single incoming media stream. Furthermore, the caller can send this stream using either of at least two codecs, G.711 and RFC 2833, and can switch codecs on the fly when it detects DTMF. For example, RTP packets 1, 2 and 3 might be G.711, followed by RTP packet 4, which is RFC 2833.
Even though the sender can send RFC 2833, the dialog server SHOULD still perform DTMF detection on the media stream, in case the sender does not support RFC 2833, or supports it but misses a digit.

   OPEN ISSUE: This is a strong statement; if the probability of missed DTMF is small, the dialog server shouldn't have to do detection if it knows the caller has done it. The problem, though, is that SDP has no way to indicate codec-specific directionality in a sendrecv stream, so a UA that can only send RFC 2833 says nothing about it in the SDP in the INVITE. As a result, there is no way to know for sure that the sender can do it until the first RFC 2833 packet shows up. The SDP FID [10] specification resolves this. Should we make support for the FID specification mandatory for dialog servers?

Some implementations we are aware of use a separate stream for the DTMF and for the speech. This approach is NOT RECOMMENDED, since it makes synchronization of the speech and DTMF difficult.

SDP allows media streams to be unidirectional. If a stream is one-way from the caller to the dialog server, script processing SHOULD proceed normally, except that any audio that would normally be output by the implementation platform is discarded. Similarly, if a stream is one-way from the dialog server to the caller, script processing SHOULD proceed normally, except that the implementation platform never delivers characters (i.e., DTMF digits) or utterances to the interpreter. In other words, behavior is identical to the case where the caller is simply not talking.

Unidirectional streams are very useful for applications that require a "listener" on an existing media stream to look for a particular utterance or DTMF digit, and deliver it to an application server for event processing.
Therefore, it is RECOMMENDED that they be supported in dialog servers as described above.

SIP allows media streams to be placed on hold. This happens when the interpreter context receives a re-INVITE whose SDP has a 0.0.0.0 connection line. This is handled identically to the case of a media stream that is unidirectional from the dialog server to the caller: the media is "just" disconnected, and the interpreter is not frozen.

SIP allows media streams to be disabled by setting the port to zero. This has a very specific meaning in the case of a dialog server: it requests a freeze of the interpreter state. When the interpreter context returns a 200 OK response, this indicates that the interpreter has been frozen. The interpreter is truly frozen; the behavior should be as if time were literally suspended as far as the interpreter is concerned. To unfreeze the interpreter state, a re-INVITE is needed to establish a new audio media stream. This causes processing of the script to continue at exactly the place it left off, using the media input and output from the new media stream to drive the interpreter. It is critical that, as far as the script is concerned, the freeze never took place.

This capability is essential for supporting feature composition of voice-based applications. Consider application A, which lets the user hear an announcement when a friend comes online and asks whether to call that friend. If the user says yes, a call is placed to the friend. Another application, B, lets the user hear stock quotes. We would like to compose these so that both are active for the same user at the same time. For that to happen in a reasonable fashion, one of these applications has the "focus", meaning that it is the one processing the input and output from the user. Consider the case where the stock quote application has the focus.
The stock quote application runs on dialog server X, and the presence application on dialog server Y. Application server Z is the central point for all system events related to all applications. The flow to consider is shown in Figure 1. At the beginning of the flow, the caller has a call leg to the AS, and the AS has used third party call control [11] to connect the caller to dialog server X. This means there is an RTP connection between the caller and this dialog server, as shown.

An external event (such as a friend coming online) causes the application server to decide that the other voice application needs to receive the focus. However, we don't want to terminate the stock quote application; we merely wish to suspend it so that the user can resume it after hearing that the friend came online. So, the application server sends a re-INVITE (1) that disables the media stream to the dialog server running the stock quote application, requesting that the interpreter be frozen. When the interpreter completes the current prompt block, the context freezes the interpreter and returns a 200 OK (2). The AS then connects the user to the dialog server running the presence application (4-9). Dialog server Y fetches the VoiceXML script from the AS (since the AS knows the identity of the buddy that came online, it needs to be the one that generates the VoiceXML script), but this is not shown. This dialog runs and, assuming the user does not call the friend, the script terminates, causing server Y to send a BYE (10). The AS decides to resume the stock quote application, so, using 3pcc, it reconnects the caller with server X (12-17). The re-INVITE to server X (14) has the effect of unfreezing the context, so processing continues where it left off.

The resulting user experience is the following:

   network: Please enter the stock to check.
   user:    Lucent
   network: Lucent Technologies is at six dollars.
   network: Friend alert: Bob is online. Would you like to call him?
   user:    no
   network: Please enter the name of the stock to check.

Note that the issue of when the interpreter can be suspended is being worked on in the W3C.

The key idea of this mechanism is that in NO CASE should the VoiceXML script for the stock quote application need to know that this external event (the buddy coming online) has occurred so that it can play the buddy announcement. Requiring that would defeat the purpose: managing feature interaction is an intractable problem if every application and feature needs to know about every other. In the approach proposed here, each voice application remains independent. The application server composes them by activating and deactivating the contexts as needed. This still requires the AS to know the set of applications that are running, but it needs to know nothing beyond the relative precedence of the various applications and the events that trigger them. Logic for that can, in principle, be constructed in a generic way, independent of the specific applications.

This approach isn't perfect for all cases, but it's simple enough to get things started.

4.1 Processing Further SIP Messages

The interpreter context processes subsequent SIP messages as follows.

4.2 BYE

If a BYE request is received from the caller, it terminates the call. The interpreter context SHOULD throw the telephone.disconnect.hangup event to the interpreter.

4.3 re-INVITE

If a re-INVITE is received, it has the effect of changing some aspect of the media input and output.
Codec changes, port changes, and IP address changes are handled normally, as per SIP-bis [6]. Specific processing is required for changes in stream direction, placing the call on hold, disabling a media stream, and adding a new audio stream after a previous re-INVITE disabled it; these cases are described in Section 4.

Caller            AS (Z)            DS (X)            DS (Y)
   |RTP              |                 |                 |
   |...................................|                 |
   |                 |friend online    |                 |
   |                 |<--------        |                 |
   |                 |(1) INV disable  |                 |
   |                 |---------------->|request freeze   |
   |                 |(2) 200 OK       |                 |
   |                 |<----------------|frozen           |
   |                 |(3) ACK          |                 |
   |                 |---------------->|                 |
   |                 |(4) INV no SDP   |                 |
   |                 |---------------------------------->|
   |                 |(5) 200 SDP 1    |                 |
   |(6) INV SDP 1    |<----------------------------------|
   |<----------------|                 |                 |
   |(7) 200 SDP 2    |                 |                 |
   |---------------->|(8) ACK SDP 2    |                 |
   |(9) ACK          |---------------------------------->|
   |<----------------|                 |                 |
   |                 |                 |                 |
   |RTP              |                 |                 |
   |.....................................................|
   |                 |                 |                 |
   |                 |(10) BYE         |                 |
   |                 |<----------------------------------|
   |                 |(11) 200 OK      |                 |
   |                 |---------------------------------->|
   |(12) INV no SDP  |                 |                 |
   |<----------------|                 |                 |
   |(13) 200 SDP 3   |                 |                 |
   |---------------->|(14) INV SDP 3   |                 |
   |                 |---------------->|unfreeze         |
   |                 |(15) 200 SDP 4   |                 |
   |(16) ACK SDP 4   |<----------------|                 |
   |<----------------|(17) ACK         |                 |
   |                 |---------------->|                 |
   |RTP              |                 |                 |
   |.................|.................|                 |
   |                 |                 |                 |

Figure 1: Voice Application Composition

4.4 INFO, MESSAGE

These messages are ignored by the interpreter context.

5 Tag Processing

Certain tags within the VoiceXML script have call control implications. The following subsections describe how the interpreter context handles them.

5.1 Exit

VoiceXML 1.0 states that the processing of the exit tag is entirely specific to the interpreter context.
For SIP, the interpreter context SHOULD send a BYE to terminate the call.

Ideally, the exit element would also post the given namelist to a URI specified in the original call setup. For example, the URI of an HTTP servlet running directly in the AS, or in an associated web application server, would be an appropriate choice. This would allow voice interactions to be completely independent of the calling context, and therefore reusable across providers and applications. However, the VoiceXML specification is silent on exactly what should happen with the namelist, so we do not specify any processing for it at this time.

   OPEN ISSUE: Should we specify something? We could provide an additional URL at script initiation which is used to post the namelist upon exit.

5.2 Disconnect

The interpreter context SHOULD send a BYE to terminate the call. As per the VoiceXML specification, a telephone.disconnect.hangup event is also thrown.

5.3 Transfer

VoiceXML 1.0 supports two styles of transfer: bridged and blind.

5.3.1 Blind

When the interpreter context needs to perform a blind transfer, it SHOULD generate a REFER [12] request. The REFER request is sent to the caller. It contains a Refer-To header whose value is the URI given in the "dest" attribute of the transfer tag. If the transfer tag contains a connecttimeout attribute, an Expires header is appended to the URI in the Refer-To as a URI header component, containing the duration from the attribute.
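A minimal sketch of building the Refer-To value just described, together with the mapping from the final response reported in the NOTIFY to the transfer outcome, which this section specifies further on. Assumptions: connecttimeout carries a VoiceXML time designation such as "10s" or "500ms", the Expires value is rounded up to whole seconds, and the function names are hypothetical.

```python
import math
from typing import Optional


def parse_time_designation(value: str) -> float:
    """Convert a VoiceXML time value such as '10s' or '500ms' to seconds."""
    value = value.strip()
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unsupported time designation: {value!r}")


def refer_to_value(dest: str, connecttimeout: Optional[str] = None) -> str:
    """Refer-To URI for a blind transfer: the dest URI, plus an embedded
    Expires header (whole seconds, rounded up) when connecttimeout is given."""
    if connecttimeout is None:
        return dest
    seconds = math.ceil(parse_time_designation(connecttimeout))
    # SIP URI headers start with '?' and are separated by '&'.
    separator = "&" if "?" in dest else "?"
    return f"{dest}{separator}Expires={seconds}"


def transfer_outcome(final_status: int) -> Optional[str]:
    """Map the final response for the triggered INVITE to a transfer outcome.

    Returns None for a 2xx: the transfer succeeded, so the context throws
    telephone.disconnect.transfer and sends a BYE instead of setting an outcome.
    """
    if 200 <= final_status < 300:
        return None
    if final_status == 486:
        return "busy"
    if final_status == 408:
        return "noanswer"
    return "network_busy"
```

With dest sip:support@foo.com and connecttimeout "10s", refer_to_value yields sip:support@foo.com?Expires=10, matching the Refer-To in the REFER example that follows.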
For example, if a transfer tag with a dest of sip:support@foo.com and a connecttimeout of ten seconds was encountered, the REFER would look like:

   REFER sip:caller@pc13.company.com SIP/2.0
   Via: SIP/2.0/UDP server3.vxmlservers.com
   From: sip:dialog.vxml20@vxmlservers.com;tag=8aa6s
   CSeq: 3487 REFER
   Call-ID: 9a8s9809s@102.3.4.4
   To: sip:caller@company.com;tag=99as7
   Refer-To: sip:support@foo.com?Expires=10
   Referred-By: sip:dialog.vxml20@vxmlservers.com

If the REFER is rejected, the interpreter context sets the outcome of the transfer attempt to network_busy. Otherwise, the interpreter context remains suspended until a NOTIFY is received.

At some point before the expiration, the interpreter context receives a NOTIFY request containing the final response received for the triggered INVITE. If this response is a 2xx, the interpreter context throws a telephone.disconnect.transfer event and sends a BYE request to terminate the call.

If the final response was a non-2xx response, the transfer attempt failed. If the final response was a 486, the outcome of the transfer attempt is set to busy, and form processing continues. If the final response was a 408, the outcome is set to noanswer, and form processing continues. For any other response, the outcome is set to network_busy, and form processing continues.

5.3.2 Bridged

In a bridged transfer, the interpreter context resumes after the transferred call completes. VoiceXML 1.0 also allows the script to specify a grammar within the transfer tag, allowing the interpreter to listen for DTMF input that matches that grammar. When a match is found, the transfer is terminated and control returns to the interpreter.

This function requires the dialog server to act as a UAC and make the outbound call to the transfer target. The flow is shown in Figure 2. The caller connects to the dialog server with messages 1-3.
RTP flows between the caller and the dialog server. When the transfer tag is encountered, the dialog server sends an outbound INVITE (4). The outbound INVITE contains the same SDP, SDP 1, that was offered by the caller. If the final response (5) is a 200 OK, it contains SDP 3. The dialog server continues to receive media from the caller and passes it on to the transfer target, using SDP 3. However, since the INVITE carried the caller's SDP, media from the transfer target to the caller flows directly, bypassing the dialog server.

If the final response to the INVITE was a non-2xx response, the transfer attempt failed. If the final response was a 486, the outcome of the transfer attempt is set to busy, and form processing continues. If the final response was a 408, the outcome is set to noanswer, and form processing continues. For any other response, the outcome is set to network_busy, and form processing continues.

The INVITE should not be left pending for longer than the time given in the connecttimeout attribute, if specified. After that time has passed, the INVITE request is cancelled, the outcome of the transfer is set to noanswer, and form processing continues.

If the final response to the INVITE was a 2xx response, the transfer attempt succeeded. In addition to passing the media on to the transfer target, the interpreter matches the media received from the caller against the grammar within the transfer tag, if one is present. If the grammar is matched, the interpreter context sends a BYE to the transfer target, and processing continues within the interpreter.

If the transfer target sends a BYE, a 200 OK is returned, the outcome of the transfer is set to far_end_disconnect, and form interpretation continues. If the caller sends a BYE, a 200 OK is returned, and the dialog server sends a BYE to the transfer target. A
A 552 |(1) INVITE SDP1 | | 553 |-------------------->| | 554 |(2) 200 SDP2 | | 555 |<--------------------| | 556 |(3) ACK | | 557 |-------------------->| | 558 |RTP | | 559 |<...................>| | 560 | |(4) INVITE SDP1 | 561 | |------------------->| 562 | |(5) 200 SDP3 | 563 | |<-------------------| 564 | |(6) ACK | 565 | |------------------->| 566 | RTP from caller | | 567 |....................>| RTP from caller | 568 | |...................>| 569 | RTP to caller | 570 |<.........................................| 571 | | | 572 | | | 574 Caller DS Transfer 575 target 577 Figure 2: Bridged Transfer flow 579 telephone.disconnect.hangup event is thrown, and form processing 580 continues to allow cleanup. 582 OPEN ISSUE: When would it even be possible for the transfer 583 outcome to be near_end_disconnect? Wouldn't this terminate 584 the script, so that there is no transfer outcome? 586 If the transfer target sends a REFER (ie., the caller is to be 587 transferred elsewhere), the interpreter context responds with a 200 588 OK. It creates a new REFER with the same Refer-To header (but its own 589 value for Referred-By), and sends it to the caller. Upon receiving a 590 200 OK to the REFER, the dialog server sends a NOTIFY to the transfer 591 target, informing it of a successful REFER completion to the new 592 target. If a BYE is received from the transfer target, the 593 interpreter sends a BYE to the caller as well, and throws a 594 telephone.disconnect.transfer event. 596 6 Additional Requirements 598 In addition to the above behaviors, we also recommend that several 599 optional SIP capabilities be implemented by dialog servers. This is 600 to support their intended use cases as components in the application 601 server component architecture [4]. The following list of requirements 602 includes these recommended features, in addition to summarizing the 603 ones scattered above: 605 1. 
The dialog server SHOULD support SIP over persistent TCP 606 and TLS connections, and SHOULD support a configurable 607 authorization listing of allowed Distinguished Names which 608 can connect. This is useful when authorization decisions 609 are outsourced to an application server, as described 610 above. 612 2. The dialog server SHOULD fully support RFC 1889 and RFC 613 1890. Of particular importance is RTCP. 615 3. The dialog server SHOULD support G.711 and RFC 2833. 617 4. The dialog server SHOULD support the UA requirements 618 outlined in the third party call control specification 619 [11]. This is important for building more complex 620 applications, a common usage for dialog servers. 622 5. The dialog server SHOULD support the SDP FID attribute 623 [10], and SHOULD use it to allow processing to occur over a 624 collection of alternate streams with the same FID group. 626 6. The dialog server SHOULD support the REFER method [12], 627 needed for the blind transfer tag. It SHOULD also allow 628 itself to be referrred as a normal UAS. 630 7. The dialog server SHOULD allow any HTTP URL to be placed in 631 the request-URI for specifying the script to execute. 633 7 Authors Addresses 635 Jonathan Rosenberg 636 dynamicsoft 637 72 Eagle Rock Avenue 638 First Floor 639 East Hanover, NJ 07936 640 email: jdrosen@dynamicsoft.com 642 Peter Mataga 643 dynamicsoft 644 72 Eagle Rock Avenue 645 First Floor 646 East Hanover, NJ 07936 647 email: pmataga@dynamicsoft.com 649 David Ladd 650 dynamicsoft 651 72 Eagle Rock Avenue 652 First Floor 653 East Hanover, NJ 07936 654 email: dladd@dynamicsoft.com 656 8 Bibliography 658 [1] VoiceXML Forum, "Voice extensible markup language (VoiceXML) 659 version 1.00," VoiceXML forum specification, VoiceXML Forum, Mar. 660 2000. 662 [2] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP: 663 session initiation protocol," Request for Comments 2543, Internet 664 Engineering Task Force, Mar. 1999. 666 [3] H. Schulzrinne, S. 
Casner, R. Frederick, and V. Jacobson, "RTP: a 667 transport protocol for real-time applications," Request for Comments 668 1889, Internet Engineering Task Force, Jan. 1996. 670 [4] J. Rosenberg, P. Mataga, and H. Schulzrinne, "An application 671 server component architecture for SIP," Internet Draft, Internet 672 Engineering Task Force, Mar. 2001. Work in progress. 674 [5] B. Campbell and R. Sparks, "Control of service context using SIP 675 Request-URI," Request for Comments 3087, Internet Engineering Task 676 Force, Apr. 2001. 678 [6] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP: 679 Session initiation protocol," Internet Draft, Internet Engineering 680 Task Force, Nov. 2000. Work in progress. 682 [7] H. Schulzrinne and J. Rosenberg, "SIP caller preferences and 683 callee capabilities," Internet Draft, Internet Engineering Task 684 Force, Nov. 2000. Work in progress. 686 [8] E. Zimmerer, J. Peterson, A. Vemuri, L. Ong, F. Audet, M. Watson, 687 and M.Zonoun, "MIME media types for ISUP and QSIG objects," Internet 688 Draft, Internet Engineering Task Force, Mar. 2001. Work in progress. 690 [9] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. 691 Leach, and T. Berners-Lee, "Hypertext transfer protocol -- HTTP/1.1," 692 Request for Comments 2616, Internet Engineering Task Force, June 693 1999. 695 [10] G. Camarillo, J. Holler, and G. Eriksson, "The SDP fid 696 attribute," Internet Draft, Internet Engineering Task Force, Apr. 697 2001. Work in progress. 699 [11] J. Rosenberg, J. Peterson, H. Schulzrinne, and G. Camarillo, 700 "Third party call control in SIP," Internet Draft, Internet 701 Engineering Task Force, Mar. 2001. Work in progress. 703 [12] R. Sparks, "SIP call control," Internet Draft, Internet 704 Engineering Task Force, Feb. 2001. Work in progress.