idnits 2.17.1 draft-camarillo-sip-deaf-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard == It seems as if not all pages are separated by form feeds - found 0 form feeds but 25 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 17, 2003) is 7740 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 949 looks like a reference -- Missing reference section? '2' on line 954 looks like a reference -- Missing reference section? '3' on line 1004 looks like a reference -- Missing reference section? '4' on line 1007 looks like a reference -- Missing reference section? '5' on line 1011 looks like a reference -- Missing reference section? '6' on line 1015 looks like a reference -- Missing reference section? '7' on line 1020 looks like a reference -- Missing reference section? '8' on line 1024 looks like a reference -- Missing reference section? '9' on line 1027 looks like a reference -- Missing reference section? '10' on line 1031 looks like a reference -- Missing reference section? '11' on line 1036 looks like a reference Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 13 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force SIP WG 3 Internet Draft G. Camarillo 4 Ericsson 5 E. Burger 6 SnowShore Networks 7 H. Schulzrinne 8 Columbia University 9 A. van Wijk 10 Viataal 11 draft-camarillo-sip-deaf-02.txt 12 February 17, 2003 13 Expires: August, 2003 15 Transcoding Services Invocation in the Session Initiation Protocol 17 STATUS OF THIS MEMO 19 This document is an Internet-Draft and is in full conformance with 20 all provisions of Section 10 of RFC2026. 22 Internet-Drafts are working documents of the Internet Engineering 23 Task Force (IETF), its areas, and its working groups. Note that 24 other groups may also distribute working documents as Internet- 25 Drafts. 
27     Internet-Drafts are draft documents valid for a maximum of six months
28     and may be updated, replaced, or obsoleted by other documents at any
29     time. It is inappropriate to use Internet-Drafts as reference
30     material or to cite them other than as "work in progress".

32     The list of current Internet-Drafts can be accessed at
33     http://www.ietf.org/ietf/1id-abstracts.txt

35     To view the list of Internet-Draft Shadow Directories, see
36     http://www.ietf.org/shadow.html.

38  Abstract

40     This document describes how to discover the need for transcoding
41     services in a session established with SIP and how to invoke those
42     transcoding services. Two models for transcoding services invocation
43     are introduced: the conference bridge model and the third party call
44     control model. Both models meet the requirements for SIP regarding
45     transcoding services invocation to support deaf, hard of hearing and
46     speech-impaired individuals.

48  Table of Contents

50     1       Introduction ........................................ 3
51     2       Discovery of the Need for Transcoding Services ...... 3
52     3       Transcoding Services Invocation ..................... 4
53     3.1     Terminology ......................................... 5
54     3.2     Conference Bridge Transcoding Model ................. 5
55     3.2.1   Caller's Invocation ................................. 6
56     3.2.2   Callee's Invocation ................................. 6
57     3.3     Third Party Call Control Transcoding Model .......... 8
58     3.3.1   Callee's Invocation ................................. 8
59     3.3.2   Caller's Invocation ................................. 14
60     3.3.3   Receiving the Original Stream ....................... 16
61     3.3.4   Transcoding Services in Parallel .................... 17
62     3.3.5   Transcoding Services in Serial ...................... 21
63     4       Security Considerations ............................. 21
64     5       TODO List ........................................... 22
65     6       Authors' Addresses .................................. 22
66     7       Bibliography ........................................ 22

68  1 Introduction

70     Two user agents involved in a SIP [1] dialog may find it impossible
71     to establish a media session due to a variety of incompatibilities.
72     Assuming that both user agents understand the same session
73     description format (e.g., SDP), incompatibilities can be found at the
74     user agent level and at the user level. At the user agent level, both
75     terminals may not support any common codec or may not support common
76     media types (e.g., a text-only terminal and an audio-only terminal).
77     At the user level, a deaf person will not be able to understand what
78     is said over an audio stream.

80     In order to make communications possible in the presence of
81     incompatibilities, user agents need to introduce intermediaries that
82     provide transcoding services to a session. From the SIP point of
83     view, the introduction of a transcoder is done in the same way to
84     resolve both user level and user agent level incompatibilities.
85     Therefore, the invocation mechanisms described in this document are
86     generally applicable to any type of incompatibility related to how
87     the information that needs to be communicated is encoded.

89     This document does not describe media server discovery. That is an
90     orthogonal problem that one can address using user agent provisioning
91     or other methods.

93     All the examples provided in this document use the Session
94     Description Protocol (SDP) [2]. However, other session description
95     formats can be used with the same call flows.

97     The remainder of this document is organized as follows. Section 2
98     deals with the discovery of the need for transcoding services for a
99     particular session. Section 3.2 introduces the conference bridge
100    transcoding invocation model, and Section 3.3 introduces the third
101    party call control model.
Both models meet the requirements regarding 102 transcoding services invocation in RFC3351 [3] to support deaf, hard 103 of hearing and speech-impaired individuals. 105 2 Discovery of the Need for Transcoding Services 107 Following the one-party consent model defined in RFC 3238 [4], 108 transcoding invocation is best performed by one of the end-points 109 involved in the communication. Following the same principle, one of 110 the end-points should be the one detecting that transcoding is needed 111 for a particular session. 113 In order to decide whether or not transcoding is needed, a user agent 114 needs to know the capabilities of the remote user agent. A user agent 115 acting as an offerer typically obtains this knowledge by downloading 116 a presence document that includes media capabilities (e.g., Bob is 117 available on a terminal that only supports audio) or by getting an 118 SDP description of media capabilities as defined in RFC 3264 [5]. 119 Presence documents are typically received in a NOTIFY request and SDP 120 media capabilities descriptions are typically received in a 200 (OK) 121 response to an OPTIONS request or in a 488 (Not Acceptable Here) 122 response to an INVITE. 124 A user agent client acting as an answerer typically gets an offer 125 that it cannot accept. The user agent can send back a media 126 capabilities description hoping that the offerer will invoke some 127 type of transcoding services or it can invoke transcoding services 128 itself. 130 It is recommended that an offerer does not invoke transcoding 131 services before making sure that the answerer does not support the 132 capabilities needed for the session. Making wrong assumptions about 133 the answerer's capabilities can lead to situations where two 134 transcoders are introduced (one by the offerer and one by the 135 answerer) in a session that would not need any transcoding services 136 at all. 
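As a rough illustration of the capability check discussed in this section, the following Python sketch compares two partial session descriptions and reports whether they share any codec for a common media type. The function names (codecs_by_media, needs_transcoding) are hypothetical and not part of any SIP library. The sketch only looks at m= and a=rtpmap lines, so it can detect user agent level incompatibilities only; a user level incompatibility (e.g., a deaf user receiving an audio-only offer) has to be signalled by other means.

```python
def codecs_by_media(sdp):
    """Map each media type (e.g. 'audio') to the set of codecs it lists.

    Payload types that have an a=rtpmap line are identified by encoding
    name; the others (static payload types) are identified by number.
    """
    sections = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split()
            sections.append({"media": fields[0], "pts": fields[3:], "rtpmap": {}})
        elif line.startswith("a=rtpmap:") and sections:
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            sections[-1]["rtpmap"][pt] = encoding.lower()
    codecs = {}
    for section in sections:
        names = {section["rtpmap"].get(pt, "pt:" + pt) for pt in section["pts"]}
        codecs.setdefault(section["media"], set()).update(names)
    return codecs


def needs_transcoding(local_sdp, remote_sdp):
    """Return True when no media type has a codec common to both sides."""
    local = codecs_by_media(local_sdp)
    remote = codecs_by_media(remote_sdp)
    return not any(local[m] & remote.get(m, set()) for m in local)


# Partial descriptions modelled on the examples used in this document.
SDP_A = """m=audio 20000 RTP/AVP 0
c=IN IP4 A.domain.com"""

SDP_B = """m=text 40000 RTP/AVP 96
c=IN IP4 B.domain.com
a=rtpmap:96 t140/1000"""

print(needs_transcoding(SDP_A, SDP_B))  # audio-only vs. text-only
print(needs_transcoding(SDP_A, SDP_A))  # identical capabilities
```

A real user agent would obtain the remote description from a presence document, a 200 (OK) response to OPTIONS, or a 488 response, as described above, before running a check of this kind.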
138 An example of the situation above is a call between two GSM 139 phones (without using transcoding-free operation). Both 140 phones use a GSM codec, but the speech is converted from 141 GSM to PCM by the originating MSC and from PCM back to GSM 142 by the terminating MSC. 144 Note that transcoding services can be symmetric (e.g., speech-to-text 145 plus text-to-speech) or asymmetric (e.g., a one-way speech-to-text 146 transcoding for a hearing impaired user that can talk). 148 3 Transcoding Services Invocation 150 Once the need for transcoding for a particular session has been 151 identified as described in Section 2, one of the user agents needs to 152 invoke transcoding services. 154 Invoking transcoding services from a server (T) for a session between 155 two user agents (A and B) involves establishing two media sessions; 156 one between A and T and another between T and B. How to invoke T's 157 services (i.e., how to establish both A-T and T-B sessions) depends 158 on how we model the transcoding service. We have considered two 159 models for invoking a transcoding service. The first is to use a 160 (dial-in and/or dial-out) conference bridge that negotiates the 161 appropriate media parameters on each individual leg (i.e., A-T and 162 T-B). The second is to use third party call control [6], also 163 referred to as 3pcc, to invoke the transcoding service. Section 3.2 164 describes the conference bridge transcoding invocation model, and 165 Section 3.3 describes the third party call control model. 167 3.1 Terminology 169 All the figures in this document follow the naming convention below: 171 SDP A: A session description generated by A. It contains, among 172 other things, the transport address/es (IP address and port 173 number) where A wants to receive media for each particular 174 stream. 176 SDP B: A session description generated by B. 
It contains, among 177 other things, the transport address/es where B wants to 178 receive media for each particular stream. 180 SDP A+B: A session description that contains, among other 181 things, the transport address/es where A wants to receive 182 media and the transport address/es where B wants to receive 183 media. 185 SDP TA: A session description generated by T and intended for A. 186 It contains, among other things, the transport address/es 187 where T wants to receive media from A. 189 SDP TB: A session description generated by T and intended for B. 190 It contains, among other things, the transport address/es 191 where T wants to receive media from B. 193 SDP TA+TB: A session description generated by T that contains, 194 among other things, the transport address/es where T wants 195 to receive media from A and the transport address/es where 196 T wants to receive media from B. 198 3.2 Conference Bridge Transcoding Model 200 A conference server typically establishes an audio stream with each 201 participant of a conference. The server sends over each individual 202 stream the media received over the rest of the streams, typically 203 performing some mixing. The conference server may have to send audio 204 to different participants using different audio codecs. We can think 205 of a transcoding service as a two-party conference server that may 206 change not only the codec in use, but also the format of the media 207 (e.g., audio to text). Using this model, the whole A-T-B session is 208 established in the same way as a conference [7]. Typically, the user 209 agent invoking the transcoding service sets up the media policy at 210 the bridge (possibly using a media policy control protocol) and sends 211 an INVITE to join the conference. The media policy for the session 212 determines the type of transcoding the bridge will perform. 214 Once the conference is set up and the invoker has joined it, the 215 remote user has to be added as a participant as well. 
Users have two 216 options to join a conference. A user can dial-in (i.e., send an 217 INVITE request to the conference bridge) to join a conference, or the 218 conference bridge can dial-out (i.e., send an INVITE request to the 219 user) to add the user to the conference. Both dial-in and dial-out 220 approaches are discussed in the following sections. Section 3.2.1 221 deals with caller's invocation and Section 3.2.2 deals with callee's 222 invocation of the service. 224 3.2.1 Caller's Invocation 226 Once the caller has set up the conference bridge and joined the 227 conference by sending an INVITE to the bridge, it has two options to 228 add the callee to the session; sending a REFER [8] to the bridge 229 (that will instruct the bridge to dial-out) or sending a REFER to the 230 callee (that will instruct the callee to dial-in). 232 We recommend the first option (i.e., REFER sent to the bridge). The 233 bridge, upon reception of the REFER, generates an INVITE towards the 234 callee. The session description of the INVITE is generated according 235 to the media policy set up by the caller. Figure 1 shows this 236 scenario's message flow. 238 Note that if the caller chooses to send the REFER directly to the 239 callee (rather than to the bridge) the callee may generate an INVITE 240 with a session description that contained media types the bridge was 241 not configured to handle. In addition to that, some user agents may 242 not support REFER or may not be able to handle out-of-the-blue REFER 243 requests. 245 3.2.2 Callee's Invocation 247 Similarly to the situation above, once the callee has set up the 248 conference bridge and joined the conference by sending an INVITE to 249 the bridge, it has two options to add the caller to the session; 250 sending a REFER to the bridge (that will instruct the bridge to 251 dial-out) or sending a REFER to the caller (that will instruct the 252 caller to dial-in). 
254    We recommend the first option (i.e., REFER sent to the bridge). The
255    bridge, upon reception of the REFER, generates an INVITE with a
256    Replaces header field [9] towards the caller. The
257    session description of the INVITE is generated according to the media
258    policy set up by the callee. Figure 2 shows this scenario's message

259       A                            T                            B

261       |                            |                            |
262       |------(1) INVITE SDP A----->|                            |
263       |                            |                            |
264       |<----(2) 200 OK SDP TA------|                            |
265       |                            |                            |
266       |----------(3) ACK---------->|                            |
267       |                            |                            |
268       | ************************** |                            |
269       |*   Media Policy Set-up    *|                            |
270       | ************************** |                            |
271       |                            |                            |
272       |---------(4) REFER--------->|                            |
273       |                            |                            |
274       |<--------(5) 200 OK---------|                            |
275       |                            |                            |
276       |                            |-----(6) INVITE SDP TB----->|
277       |                            |                            |
278       |                            |<-----(7) 200 OK SDP B------|
279       |                            |                            |
280       |                            |----------(8) ACK---------->|
281       |                            |                            |
282       |<--------(9) NOTIFY---------|                            |
283       |                            |                            |
284       |---------(10) 200 OK------->|                            |
285       |                            |                            |
286       | ************************** | ************************** |
287       |*          MEDIA           *|*          MEDIA           *|
288       | ************************** | ************************** |
289       |

291       Figure 1: Caller's invocation of a conference bridge

293    flow.

295    The flow in Figure 2 requires that the caller supports the Replaces
296    header field. If the caller does not support it, the callee can send
297    a 488 (Not Acceptable Here) for the original INVITE and attempt to
298    establish the session acting as a caller (i.e., sending a new
299    INVITE).

301    Sending the REFER to the caller (instead of to the bridge) introduces
302    a number of issues, since there is currently no way for the callee to
303    inform the caller that the newly established session will substitute
304    the original session.

306  3.3 Third Party Call Control Transcoding Model

308    If we model T as a transcoding service rather than a special case of
309    a conferencing server, a single INVITE transaction from the invoker
310    of the service provides T with both A's and B's session descriptions.
311    In order to provide in a single session description information about
312    media streams that belong to different entities (A and B), the
313    session description format in use should provide a means to define
314    how these streams should be mapped. For instance, in a session
315    description with two audio streams and one text stream, a possible
316    mapping would be the following: the information received over the
317    first audio stream should be sent over the text stream and over the
318    second audio stream, and the incoming text should be sent only over
319    the first audio stream. SDP [2] can convey this information using the
320    source and sink attributes [10].

322    As stated previously, the invocation of a transcoding service
323    consists of establishing two sessions: A-T and T-B. How these
324    sessions are established depends on which party, the caller (A) or
325    the callee (B), invokes the transcoding services. However, we have
326    followed a general principle to design our 3pcc flows: a 200 (OK)
327    response from the transcoding service has to be received before
328    contacting the callee. This tries to ensure that the transcoding
329    service will be available when the callee accepts the session.

331    However, note that the transcoding service does not know the exact
332    type of transcoding it will be performing until the callee accepts
333    the session. Therefore, there is always a chance of failing to
334    provide transcoding services after the callee has accepted the
335    session. A system with tough requirements could use preconditions to
336    avoid this situation. When preconditions are used, the callee is not
337    alerted until everything is ready for the session.

339  3.3.1 Callee's Invocation

341    In this scenario, B receives an INVITE from A, and B decides to
342    introduce T in the session. Figure 3 shows the call flow for this
343    scenario.

345    In Figure 3, A can both hear and speak and B is a deaf user with a
346    speech impairment.
A proposes to establish a session that consists of 347 an audio stream (1). B wants to send and receive only text, so it 348 invokes a transcoding service T that will perform both speech-to-text 349 A T B 351 | | | 352 |-------------------(1) INVITE SDP A--------------------->| 353 | | | 354 | |<-----(2) INVITE SDP B------| 355 | | | 356 | |------(3) 200 OK SDP TB---->| 357 | | | 358 | | ************************** | 359 | |* Media Policy Set-up *| 360 | | ************************** | 361 | | | 362 | |<--------(5) REFER----------| 363 | | | 364 | |---------(6) 200 OK-------->| 365 | | | 366 |<-----(7) INVITE SDP TA-----| | 367 | | | 368 |------(8) 200 OK SDP A----->| | 369 | | | 370 |<----------(9) ACK----------| | 371 | | | 372 | |---------(10) NOTIFY------->| 373 | | | 374 | |<--------(11) 200 OK--------| 375 | | | 376 |---------------------(12) CANCEL------------------------>| 377 | | | 378 |<--------------------(13) 200 OK-------------------------| 379 | | | 380 |<-------------(14) 487 Request Terminated----------------| 381 | | | 382 |-----------------------(15) ACK------------------------->| 383 | | | 384 | ************************** | ************************** | 385 |* MEDIA *|* MEDIA *| 386 | ************************** | ************************** | 387 | | | 389 Figure 2: Conference bridge transcoding model 390 A T B 392 | | | 393 |--------------------(1) INVITE SDP A-------------------->| 394 | | | 395 | |<---(2) INVITE SDP A+B------| 396 | | | 397 | |---(3) 200 OK SDP TA+TB---->| 398 | | | 399 | |<---------(4) ACK-----------| 400 | | | 401 |<-------------------(5) 200 OK SDP TA--------------------| 402 | | | 403 |------------------------(6) ACK------------------------->| 404 | | | 405 | ************************** | ************************** | 406 |* MEDIA *|* MEDIA *| 407 | ************************** | ************************** | 408 | | | 410 Figure 3: Callee's invocation of a transcoding service 412 and text-to-speech conversions (2). 
The session descriptions of 413 Figure 3 are partially shown below. 415 (1) INVITE SDP A 417 m=audio 20000 RTP/AVP 0 418 c=IN IP4 A.domain.com 420 (2) INVITE SDP A+B 422 m=audio 20000 RTP/AVP 0 423 c=IN IP4 A.domain.com 424 a=source:1 425 a=sink:2 426 m=text 40000 RTP/AVP 96 427 c=IN IP4 B.domain.com 428 a=rtpmap:96 t140/1000 429 a=source:2 430 a=sink:1 432 (3) 200 OK SDP TA+TB 434 m=audio 30000 RTP/AVP 0 435 c=IN IP4 T.domain.com 436 a=source:1 437 a=sink:2 438 m=text 30002 RTP/AVP 96 439 c=IN IP4 T.domain.com 440 a=rtpmap:96 t140/1000 441 a=source:2 442 a=sink:1 444 (5) 200 OK SDP TA 446 m=audio 30000 RTP/AVP 0 447 c=IN IP4 T.domain.com 449 Four media streams (i.e., two bi-directional streams) have been 450 established at this point: 452 1. Audio from A to T.domain.com:30000 454 2. Text from T to B.domain.com:40000 456 3. Text from B to T.domain.com:30002 458 4. Audio from T to A.domain.com:20000 460 When either A or B decide to terminate the session, B will send a BYE 461 to T indicating that the session is over. 463 If the first INVITE (1) received by B is empty (no session 464 description), the call flow is slightly different. Figure 4 shows the 465 messages involved. 467 B may have different reasons for invoking T before knowing A's 468 session description. B may want to hide its capabilities, and 469 therefore it wants to return a session description with all the 470 codecs B supports plus all the codecs T supports. Or T may provide 471 recording services (besides transcoding), and B wants T to record the 472 conversation, regardless of whether or not transcoding is needed. 474 This scenario (Figure 4) is a bit more complex than the previous one. 
476 A T B 478 | | | 479 |----------------------(1) INVITE------------------------>| 480 | | | 481 | |<-----(2) INVITE SDP B------| 482 | | | 483 | |---(3) 200 OK SDP TA+TB---->| 484 | | | 485 | |<---------(4) ACK-----------| 486 | | | 487 |<-------------------(5) 200 OK SDP TA--------------------| 488 | | | 489 |-----------------------(6) ACK SDP A-------------------->| 490 | | | 491 | |<-------(7) INVITE----------| 492 | | | 493 | |---(8) 200 OK SDP TA+TB---->| 494 | | | 495 |<-----------------(9) INVITE SDP TA----------------------| 496 | | | 497 |------------------(10) 200 OK SDP A--------------------->| 498 | | | 499 |<-----------------------(11) ACK-------------------------| 500 | | | 501 | |<-----(12) ACK SDP A+B------| 502 | | | 503 | ************************** | ************************** | 504 |* MEDIA *|* MEDIA *| 505 | ************************** | ************************** | 507 Figure 4: Callee's invocation after initial INVITE without SDP 509 In INVITE (2), B still does not have SDP A, so it cannot provide T 510 with that information. When B finally receives SDP A in (6), it has 511 to send it to T. B sends an empty INVITE to T (7) and gets a 200 OK 512 with SDP TA+TB (8). In general, this SDP TA+TB can be different than 513 the one that was sent in (3). That is why B needs to send the updated 514 SDP TA to A in (9). A then sends a possibly updated SDP A (10) and B 515 sends it to T in (12). However, if T happens to return the same SDP 516 TA+TB in (8) as in (3), B can skip messages (9), (10) and (11). 517 Therefore, implementors of transcoding services are encouraged to 518 return the same session description in (8) as in (3) in this type of 519 scenario. 
The session descriptions of this flow are shown below:

521    (2) INVITE SDP A+B

523         m=audio 20000 RTP/AVP 0
524         c=IN IP4 0.0.0.0
525         a=source:1
526         a=sink:2
527         m=text 40000 RTP/AVP 96
528         c=IN IP4 B.domain.com
529         a=rtpmap:96 t140/1000
530         a=source:2
531         a=sink:1

533    (3) 200 OK SDP TA+TB

535         m=audio 30000 RTP/AVP 0
536         c=IN IP4 T.domain.com
537         a=source:1
538         a=sink:2
539         m=text 30002 RTP/AVP 96
540         c=IN IP4 T.domain.com
541         a=rtpmap:96 t140/1000
542         a=source:2
543         a=sink:1

545    (5) 200 OK SDP TA

547         m=audio 30000 RTP/AVP 0
548         c=IN IP4 T.domain.com

550    (6) ACK SDP A

552         m=audio 20000 RTP/AVP 0
553         c=IN IP4 A.domain.com

555    (8) 200 OK SDP TA+TB

557         m=audio 30004 RTP/AVP 0
558         c=IN IP4 T.domain.com
559         a=source:1
560         a=sink:2
561         m=text 30006 RTP/AVP 96
562         c=IN IP4 T.domain.com
563         a=rtpmap:96 t140/1000
564         a=source:2
565         a=sink:1

567    (9) INVITE SDP TA

569         m=audio 30004 RTP/AVP 0
570         c=IN IP4 T.domain.com

572    (10) 200 OK SDP A

574         m=audio 20002 RTP/AVP 0
575         c=IN IP4 A.domain.com

577    (12) ACK SDP A+B

579         m=audio 20002 RTP/AVP 0
580         c=IN IP4 A.domain.com
581         a=source:1
582         a=sink:2
583         m=text 40000 RTP/AVP 96
584         c=IN IP4 B.domain.com
585         a=rtpmap:96 t140/1000
586         a=source:2
587         a=sink:1

589    Four media streams (i.e., two bi-directional streams) have been
590    established at this point:

592         1.   Audio from A to T.domain.com:30004

594         2.   Text from T to B.domain.com:40000

596         3.   Text from B to T.domain.com:30006

598         4.   Audio from T to A.domain.com:20002

600  3.3.2 Caller's Invocation
601    In this scenario, A wishes to establish a session with B using a
602    transcoding service. A uses 3pcc to set up the session between T and
603    B. The call flow we provide here is slightly different from the ones
604    in [6]. In [6], the controller establishes a session between two user
605    agents, and the user agents themselves decide the characteristics
606    of the streams.
Here, A wants to establish a session between T and B, 607 but A wants to decide how many and which types of streams are 608 established. That is why A sends its session description in the first 609 INVITE (1) to T, as opposed to the media-less initial INVITE 610 recommended by [6]. Figure 5 shows the call flow for this scenario. 612 A T B 614 | | | 615 |-------(1) INVITE SDP A---->| | 616 | | | 617 |<----(2) 200 OK SDP TA+TB---| | 618 | | | 619 |----------(3) ACK---------->| | 620 | | | 621 |--------------------(4) INVITE SDP TA------------------->| 622 | | | 623 |<--------------------(5) 200 OK SDP B--------------------| 624 | | | 625 |-------------------------(6) ACK------------------------>| 626 | | | 627 |--------(7) INVITE--------->| | 628 | | | 629 |<---(8) 200 OK SDP TA+TB --| | 630 | | | 631 |--------------------(9) INVITE SDP TA------------------->| 632 | | | 633 |<-------------------(10) 200 OK SDP B--------------------| 634 | | | 635 |-------------------------(11) ACK----------------------->| 636 | | | 637 |------(12) ACK SDP A+B----->| | 638 | | | 639 | ************************** | ************************** | 640 |* MEDIA *|* MEDIA *| 641 | ************************** | ************************** | 642 | | | 644 Figure 5: Caller's invocation of a transcoding service 645 We do not include the session descriptions of this flow, since they 646 are very similar to the ones in Figure 4. In this flow, if T returns 647 the same SDP TA+TB in (8) as in (2), messages (9), (10) and (11) can 648 be skipped. 650 3.3.3 Receiving the Original Stream 652 Sometimes, as pointed out in the requirements for SIP in support of 653 deaf, hard of hearing and speech-impaired individuals [3], a user 654 wants to receive both the original stream (e.g., audio) and the 655 transcoded stream (e.g., the output of the speech-to-text 656 conversion). There are various possible solutions for this problem. 
657    One solution consists of using the SDP group attribute with FID
658    semantics [11]. FID allows requesting that a stream is sent to two
659    different transport addresses in parallel, as shown below:

661         a=group:FID 1 2
662         m=audio 20000 RTP/AVP 0
663         c=IN IP4 A.domain.com
664         a=mid:1
665         m=audio 30000 RTP/AVP 0
666         c=IN IP4 T.domain.com
667         a=mid:2

669    The problem with this solution is that the majority of the SIP user
670    agents do not support FID. And even if FID is supported, many user
671    agents do not support sending simultaneous copies of the same media
672    stream. In addition to that, both copies of the
673    stream need to use the same codec.

675    Therefore, we recommend that T (instead of a user agent) replicates
676    the media stream. The following session description requests T to
677    perform speech-to-text and text-to-speech conversions between the
678    first audio stream and the text stream. In addition, it requests T to
679    copy the first audio stream to the second audio stream and send it
680    to A.

682         m=audio 40000 RTP/AVP 0
683         c=IN IP4 B.domain.com
684         a=source:1
685         a=sink:2
686         m=audio 20000 RTP/AVP 0
687         c=IN IP4 A.domain.com
688         a=recvonly
689         a=sink:1
690         m=text 20002 RTP/AVP 96
691         c=IN IP4 A.domain.com
692         a=rtpmap:96 t140/1000
693         a=source:2
694         a=sink:1

696  3.3.4 Transcoding Services in Parallel

698    Transcoding services sometimes consist of human relays (e.g., a
699    person performing speech-to-text and text-to-speech conversions for a
700    session). If the same person is involved in both conversions (i.e.,
701    from A to B and from B to A), he or she has access to the whole
702    conversation. In order to provide some degree of privacy, sometimes
703    two different persons are allocated to do the job (i.e., one person
704    handles A->B and the other B->A).
   This type of arrangement is also
   useful for automated transcoding services, where one machine
   converts text to synthetic speech (text-to-speech) and a different
   machine performs voice recognition (speech-to-text).

   The scenario just described involves four different sessions: A-T1,
   T1-B, B-T2 and T2-A. Figure 6 shows the call flow where A invokes T1
   and T2.

   (1) INVITE SDP AT1

      m=text 20000 RTP/AVP 96
      c=IN IP4 A.domain.com
      a=rtpmap:96 t140/1000
      a=sendonly
      a=source:1
      m=audio 20000 RTP/AVP 0
      c=IN IP4 0.0.0.0
      a=recvonly
      a=sink:1

   (2) INVITE SDP AT2

      m=text 20002 RTP/AVP 96
      c=IN IP4 A.domain.com
      a=rtpmap:96 t140/1000
      a=recvonly
      a=sink:1
      m=audio 20000 RTP/AVP 0
      c=IN IP4 0.0.0.0
      a=sendonly
      a=source:1

   (3) 200 OK SDP T1A+T1B

      m=text 30000 RTP/AVP 96
      c=IN IP4 T1.domain.com
      a=rtpmap:96 t140/1000
      a=recvonly
      a=source:1
      m=audio 30002 RTP/AVP 0
      c=IN IP4 T1.domain.com
      a=sendonly
      a=sink:1

   (5) 200 OK SDP T2A+T2B

      m=text 40000 RTP/AVP 96
      c=IN IP4 T2.domain.com
      a=rtpmap:96 t140/1000
      a=sendonly
      a=sink:1
      m=audio 40002 RTP/AVP 0
      c=IN IP4 T2.domain.com
      a=recvonly
      a=source:1

   (7) INVITE SDP T1B+T2B

      m=audio 30002 RTP/AVP 0
      c=IN IP4 T1.domain.com
      a=sendonly
      m=audio 40002 RTP/AVP 0
      c=IN IP4 T2.domain.com
      a=recvonly

   (8) 200 OK SDP BT1+BT2

      m=audio 50000 RTP/AVP 0
      c=IN IP4 B.domain.com
      a=recvonly
      m=audio 50002 RTP/AVP 0
      c=IN IP4 B.domain.com
      a=sendonly

   A                          T1                     T2            B
   |                          |                      |             |
   |----(1) INVITE SDP AT1--->|                      |             |
   |                          |                      |             |
   |----------------(2) INVITE SDP AT2-------------->|             |
   |                          |                      |             |
   |<-(3) 200 OK SDP T1A+T1B--|                      |             |
   |                          |                      |             |
   |---------(4) ACK--------->|                      |             |
   |                          |                      |             |
   |<---------------(5) 200 OK SDP T2A+T2B-----------|             |
   |                          |                      |             |
   |----------------------(6) ACK------------------->|             |
   |                          |                      |             |
   |-----------------------(7) INVITE SDP T1B+T2B----------------->|
   |                          |                      |             |
   |<----------------------(8) 200 OK SDP BT1+BT2------------------|
   |                          |                      |             |
   |------(9) INVITE--------->|                      |             |
   |                          |                      |             |
   |-------------------(10) INVITE------------------>|             |
   |                          |                      |             |
   |<-(11) 200 OK SDP T1A+T1B-|                      |             |
   |                          |                      |             |
   |<------------(12) 200 OK SDP T2A+T2B-------------|             |
   |                          |                      |             |
   |------------------(13) INVITE SDP T1B+T2B--------------------->|
   |                          |                      |             |
   |<-----------------(14) 200 OK SDP BT1+BT2----------------------|
   |                          |                      |             |
   |--------------------------(15) ACK---------------------------->|
   |                          |                      |             |
   |---(16) ACK SDP AT1+BT1-->|                      |             |
   |                          |                      |             |
   |------------(17) ACK SDP AT2+BT2---------------->|             |
   |                          |                      |             |
   | ************************ | ********************************** |
   |*         MEDIA          *|*              MEDIA               *|
   | ************************ | ********************************** |
   |                          |                      |             |
   | *********************************************** | *********** |
   |*                     MEDIA                     *|*   MEDIA   *|
   | *********************************************** | *********** |
   |                          |                      |             |

   Figure 6: Transcoding services in parallel

   (11) 200 OK SDP T1A+T1B

      m=text 30000 RTP/AVP 96
      c=IN IP4 T1.domain.com
      a=rtpmap:96 t140/1000
      a=recvonly
      a=source:1
      m=audio 30002 RTP/AVP 0
      c=IN IP4 T1.domain.com
      a=sendonly
      a=sink:1

   (12) 200 OK SDP T2A+T2B

      m=text 40000 RTP/AVP 96
      c=IN IP4 T2.domain.com
      a=rtpmap:96 t140/1000
      a=sendonly
      a=sink:1
      m=audio 40002 RTP/AVP 0
      c=IN IP4 T2.domain.com
      a=recvonly
      a=source:1

   Since T1 has returned the same SDP in (11) as in (3) and T2 has
   returned the same SDP in (12) as in (5), messages (13), (14) and
   (15) can be skipped.
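
   The comparison that lets A skip messages (13), (14) and (15) can be
   sketched as follows. This is only an illustration and not part of
   the mechanism described in this document: the function names
   normalize_sdp and can_skip_update_to_b are hypothetical, and a plain
   line-by-line textual comparison is assumed to be a sufficient test
   of whether two session descriptions are the same.

```python
def normalize_sdp(sdp):
    """Return the non-empty lines of an SDP body, stripped of whitespace."""
    return [line.strip() for line in sdp.splitlines() if line.strip()]


def can_skip_update_to_b(first_round, second_round):
    """Decide whether messages (13), (14) and (15) can be skipped.

    Both arguments map a transcoder name (e.g. "T1") to the SDP that
    transcoder returned in that round. The update towards B is only
    unnecessary if every transcoder returned the same SDP both times.
    This textual comparison is an assumption made for illustration.
    """
    if set(first_round) != set(second_round):
        return False
    return all(
        normalize_sdp(first_round[name]) == normalize_sdp(second_round[name])
        for name in first_round
    )


# SDP returned by T1 in message (3), copied from the example in the text.
T1_ANSWER = """\
m=text 30000 RTP/AVP 96
c=IN IP4 T1.domain.com
a=rtpmap:96 t140/1000
a=recvonly
a=source:1
m=audio 30002 RTP/AVP 0
c=IN IP4 T1.domain.com
a=sendonly
a=sink:1
"""

if __name__ == "__main__":
    # Message (11) carries the same SDP as (3), so B need not be updated.
    print(can_skip_update_to_b({"T1": T1_ANSWER}, {"T1": T1_ANSWER}))
```

   If either transcoder had returned a different port or address in
   the second round, the function would return False and A would send
   messages (13), (14) and (15) to update B.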

   (16) ACK SDP AT1+BT1

      m=text 20000 RTP/AVP 96
      c=IN IP4 A.domain.com
      a=rtpmap:96 t140/1000
      a=sendonly
      a=source:1
      m=audio 50000 RTP/AVP 0
      c=IN IP4 B.domain.com
      a=recvonly
      a=sink:1

   (17) ACK SDP AT2+BT2

      m=text 20002 RTP/AVP 96
      c=IN IP4 A.domain.com
      a=rtpmap:96 t140/1000
      a=recvonly
      a=sink:1
      m=audio 50002 RTP/AVP 0
      c=IN IP4 B.domain.com
      a=sendonly
      a=source:1

   Four media streams have been established at this point:

      1.   Text from A to T1.domain.com:30000

      2.   Audio from T1 to B.domain.com:50000

      3.   Audio from B to T2.domain.com:40002

      4.   Text from T2 to A.domain.com:20002

   Note that B, the user agent server, needs to support two media
   streams: one sendonly and the other recvonly. At present, some user
   agents support a single sendrecv media stream but do not support a
   separate media line per direction. Implementers are encouraged to
   build support for this feature.

3.3.5 Transcoding Services in Serial

   In a distributed environment, a complex transcoding service (e.g.,
   English text to Spanish speech) is often provided by several
   servers. For example, one server performs English text to Spanish
   text translation, and its output is fed into a server that performs
   text-to-speech conversion. The flow in Figure 7 shows how A invokes
   T1 and T2.

   A                           T1                    T2            B
   |                           |                     |             |
   |----(1) INVITE SDP A-----> |                     |             |
   |                           |                     |             |
   |<-(2) 200 OK SDP T1A+T1T2- |                     |             |
   |                           |                     |             |
   |----------(3) ACK--------> |                     |             |
   |                           |                     |             |
   |-----------(4) INVITE SDP T1T2------------------>|             |
   |                           |                     |             |
   |<-----------(5) 200 OK SDP T2T1+T2B--------------|             |
   |                           |                     |             |
   |---------------------(6) ACK-------------------->|             |
   |                           |                     |             |
   |---------------------------(7) INVITE SDP T2B----------------->|
   |                           |                     |             |
   |<--------------------------(8) 200 OK SDP B--------------------|
   |                           |                     |             |
   |--------------------------------(9) ACK----------------------->|
   |                           |                     |             |
   |---(10) INVITE-----------> |                     |             |
   |                           |                     |             |
   |------------------(11) INVITE------------------->|             |
   |                           |                     |             |
   |<-(12) 200 OK SDP T1A+T1T2-|                     |             |
   |                           |                     |             |
   |<-------------(13) 200 OK SDP T2T1+T2B-----------|             |
   |                           |                     |             |
   |---(14) ACK SDP T1T2+B---> |                     |             |
   |                           |                     |             |
   |-----------------------(15) INVITE SDP T2B-------------------->|
   |                           |                     |             |
   |<----------------------(16) 200 OK SDP B-----------------------|
   |                           |                     |             |
   |----------------(17) ACK SDP T1T2+B------------->|             |
   |                           |                     |             |
   |----------------------------(18) ACK-------------------------->|
   |                           |                     |             |
   | ************************* | ******************* | *********** |
   |*          MEDIA          *|*       MEDIA       *|*   MEDIA   *|
   | ************************* | ******************* | *********** |
   |                           |                     |             |

   Figure 7: Transcoding services in serial

4 Security Considerations

   This document describes how to use the REFER method and third party
   call control to invoke transcoding services. It does not introduce
   new security considerations beyond those discussed in [8] and [6].

5 TODO List

   We need to see whether or not it is possible to use the media policy
   work in the 3pcc model as well (instead of source/sink).

6 Authors' Addresses

   Gonzalo Camarillo
   Ericsson
   Advanced Signalling Research Lab.
   FIN-02420 Jorvas
   Finland
   electronic mail: Gonzalo.Camarillo@ericsson.com

   Eric W. Burger
   SnowShore Networks, Inc.
   Chelmsford, MA
   USA
   electronic mail: eburger@snowshore.com

   Henning Schulzrinne
   Dept. of Computer Science
   Columbia University
   1214 Amsterdam Avenue, MC 0401
   New York, NY 10027
   USA
   electronic mail: schulzrinne@cs.columbia.edu

   Arnoud van Wijk
   Viataal
   Research & Development
   Afdeling RDS
   Theerestraat 42
   5271 GD Sint-Michielsgestel
   The Netherlands
   electronic mail: a.vwijk@viataal.nl

7 Bibliography

   [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. R. Johnston, J.
   Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session
   initiation protocol," RFC 3261, Internet Engineering Task Force,
   June 2002.

   [2] M. Handley and V. Jacobson, "SDP: session description protocol,"
   RFC 2327, Internet Engineering Task Force, Apr. 1998.

   [3] N. Charlton, M. Gasson, G. Gybels, M. Spanner, and A. van Wijk,
   "User requirements for the session initiation protocol (SIP) in
   support of deaf, hard of hearing and speech-impaired individuals,"
   RFC 3351, Internet Engineering Task Force, Aug. 2002.

   [4] S. Floyd and L. Daigle, "IAB architectural and policy
   considerations for open pluggable edge services," RFC 3238, Internet
   Engineering Task Force, Jan. 2002.

   [5] J. Rosenberg and H. Schulzrinne, "An offer/answer model with
   session description protocol (SDP)," RFC 3264, Internet Engineering
   Task Force, June 2002.

   [6] J. Rosenberg, J. Peterson, H. Schulzrinne, and G. Camarillo,
   "Best current practices for third party call control in the session
   initiation protocol," internet draft, Internet Engineering Task
   Force, June 2002. Work in progress.

   [7] J. Rosenberg, "A framework for conferencing with the session
   initiation protocol," internet draft, Internet Engineering Task
   Force, Nov. 2002. Work in progress.

   [8] R. Sparks, "The SIP refer method," internet draft, Internet
   Engineering Task Force, Dec. 2002. Work in progress.

   [9] B. Biggs, R. Dean, and R. Mahy, "The session initiation protocol
   (SIP)," internet draft, Internet Engineering Task Force, May 2002.
   Work in progress.

   [10] G. Camarillo, H. Schulzrinne, and E. Burger, "The source and
   sink attributes for the session description protocol," internet
   draft, Internet Engineering Task Force, Sept. 2002. Work in
   progress.

   [11] G. Camarillo, J. Holler, G. Eriksson, and H.
   Schulzrinne, "Grouping of m lines in SDP," internet draft, Internet
   Engineering Task Force, Feb. 2002. Work in progress.

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (c) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.