idnits 2.17.1

draft-camarillo-sipping-early-media-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 13 pages

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There is 1 instance of too long lines in the document, the longest one
     being 3 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 215: '... OPEN ISSUE: SHOULD THIS ATTRIBUTE B...'

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 29, 2002) is 7818 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 404 looks like a reference

  -- Missing reference section? '2' on line 409 looks like a reference

  -- Missing reference section? '3' on line 413 looks like a reference

  -- Missing reference section? '4' on line 416 looks like a reference

  -- Missing reference section? '5' on line 420 looks like a reference

     Summary: 6 errors (**), 0 flaws (~~), 2 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2  Internet Engineering Task Force                                   SIP WG
3  Internet Draft                                              G. Camarillo
4                                                                  Ericsson
5                                                            H. Schulzrinne
6                                                       Columbia University
7  draft-camarillo-sipping-early-media-00.txt
8  November 29, 2002
9  Expires: May, 2003

11  Early Media and Ringback Tone Generation in the Session Initiation Protocol

13  STATUS OF THIS MEMO

15  This document is an Internet-Draft and is in full conformance with
16  all provisions of Section 10 of RFC2026.

18  Internet-Drafts are working documents of the Internet Engineering
19  Task Force (IETF), its areas, and its working groups. Note that
20  other groups may also distribute working documents as
21  Internet-Drafts.

23  Internet-Drafts are draft documents valid for a maximum of six months
24  and may be updated, replaced, or obsoleted by other documents at any
25  time. It is inappropriate to use Internet-Drafts as reference
26  material or to cite them other than as "work in progress".
28  The list of current Internet-Drafts can be accessed at
29  http://www.ietf.org/ietf/1id-abstracts.txt

31  To view the list Internet-Draft Shadow Directories, see
32  http://www.ietf.org/shadow.html.

34  Abstract

36  This document describes how to manage early media in SIP. It also
37  describes which inputs need to be taken into consideration to define
38  local policies for ringback tone generation.

40  Table of Contents

42  1     Introduction
43  2     Early Media in SIP
44  2.1   Status Codes
45  2.2   Direction Attributes and Media Clipping
46  2.3   Intention to Send Media
47  2.4   The SDP Intention Parameter
48  2.5   Applicability of the SDP Intention Parameter
49  3     Forking
50  4     Ringback Tone Generation
51  5     Interactions with Preconditions
52  6     Examples
53  6.1   Remotely Generated Ringback Tone
54  6.2   Locally Generated Ringback Tone
55  7     Acknowledgments
56  8     Authors' Addresses
57  9     Bibliography

59  1 Introduction

61  Early media refers to media (e.g., audio and/or video) that is
62  exchanged before a particular session is accepted by the called user.
63  Early media within a particular SIP dialog takes place from the
64  moment the initial INVITE is sent until the UAS generates a final
65  response. Early media can be unidirectional or bi-directional and can
66  be generated by the caller and/or the callee. Typical examples of
67  early media generated by the callee are ringback tone and
68  announcements (e.g., queuing status). Early media generated by the
69  caller typically consists of voice commands or DTMF tones to drive
70  IVRs.

72  The basic SIP spec [1] supports very simple early media, but UAs that
73  implement fully-featured early media need to support the PRACK [2]
74  and the UPDATE [3] methods.

76  The remainder of this document is organized as follows. Section 2
77  describes early media establishment in SIP, Section 3 discusses
78  forking and Section 4 describes ringback tone generation. Section 5
79  analyzes interactions between early media, ringback tone generation
80  and preconditions, and Section 6 provides examples of common
81  scenarios that use the mechanisms described in Sections 2 and 4.

83  2 Early Media in SIP

85  SIP [1] uses the offer/answer model [4] to negotiate session
86  parameters. One of the user agents - the offerer - prepares a session
87  description that is called the offer. The other user agent - the
88  answerer - responds with another session description called the
89  answer. This two-way handshake allows both user agents to agree upon
90  the session parameters to be used to exchange media.

92  The idea behind the offer/answer model is to decouple the
93  offer/answer exchange from the mechanism used to transport the
94  session descriptions. For example, the offer can be sent in an INVITE
95  request and the answer can arrive in the 200 (OK) response for that
96  INVITE. Or, alternatively, the offer can be sent in the 200 (OK) for
97  an empty INVITE and the answer be sent in the ACK. When reliable
98  provisional responses [2] and/or UPDATE requests [3] are used, there
99  are many more possible ways to exchange offers and answers.

101  The offer/answer model is not even coupled to SIP. Other transport
102  mechanisms such as email attachments or instant messages can be used
103  to perform an offer/answer exchange.
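
The decoupling described above can be illustrated with a minimal Python sketch. It is not part of the draft: the names Description, negotiate, ALLOWED_ANSWERS and MIRROR are hypothetical, the session is reduced to a single stream, and the table of permitted answers is a simplification of the direction rules of the offer/answer model [4]. The point it tries to show is that the function never inspects which message (or non-SIP transport) carried the offer or the answer.

```python
from dataclasses import dataclass

# Directions an answerer may return for each offered direction
# (single-stream simplification of the offer/answer rules in [4]).
ALLOWED_ANSWERS = {
    "sendrecv": {"sendrecv", "sendonly", "recvonly", "inactive"},
    "sendonly": {"recvonly", "inactive"},
    "recvonly": {"sendonly", "inactive"},
    "inactive": {"inactive"},
}

# The offerer sees the mirror image of the direction chosen by the answerer.
MIRROR = {
    "sendrecv": "sendrecv",
    "sendonly": "recvonly",
    "recvonly": "sendonly",
    "inactive": "inactive",
}


@dataclass(frozen=True)
class Description:
    """A session description plus a (hypothetical) label for its carrier."""
    carrier: str    # e.g. "INVITE", "183", "200", "ACK", "email"
    direction: str  # direction attribute of the single stream


def negotiate(offer: Description, answer: Description) -> str:
    """Return the resulting stream direction as seen by the offerer.

    The carrier fields are deliberately never inspected.
    """
    if answer.direction not in ALLOWED_ANSWERS[offer.direction]:
        raise ValueError(
            f"{answer.direction!r} is not a valid answer to {offer.direction!r}"
        )
    return MIRROR[answer.direction]


# The same answer carried by different messages, or outside SIP altogether.
offer = Description("INVITE", "sendrecv")
results = {
    carrier: negotiate(offer, Description(carrier, "sendonly"))
    for carrier in ("183", "180", "200", "email")
}
print(results)
```

Under these assumptions every entry of results holds the same direction, whatever the carrier was.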
105  This decoupling between the offer/answer model and the particular
106  messages used for a particular offer/answer exchange implies that the
107  negotiation of media parameters is not affected by the status of the
108  session. If an INVITE contains an offer, it does not matter whether
109  the answer is received in a 183 (Session Progress), a 180 (Ringing)
110  or a 200 (OK) response. The resulting media session will be the same
111  in the three scenarios.

113        Note that in the past, some people wrongly believed that a
114        UAC receiving a particular answer had to set up different
115        early media sessions if the answer was received in a 180
116        response (all the media streams were "magically" considered
117        inactive) or in a 183 response (the media streams were
118        established following the normal offer/answer model).

120  2.1 Status Codes

122  As a consequence of the previously mentioned decoupling, the status
123  code of a particular 1xx or 2xx SIP response is independent of the
124  offer/answer model. For example, if a UAS is alerting the user, it
125  will send a 180 (Ringing) response, regardless of the presence (or
126  absence) of early media. Early media is driven by the offer/answer
127  model, NOT by the status codes.

129  2.2 Direction Attributes and Media Clipping

131  The direction attribute (i.e., sendrecv, sendonly, recvonly or
132  inactive) for a particular stream contains the status of the media
133  tools handling that stream at the end-points. Therefore, the
134  direction attribute indicates whether or not the media tools at the
135  end-point are ready to receive/send media over a particular media
136  stream.

138  The problem is that the offer/answer model does not distinguish
139  between a sender that does not intend to send media and a receiver
140  that does not accept incoming media. This distinction is useful to
141  avoid media clipping in certain situations. We have the following
142  alternatives for a particular direction of a stream:

144     1. Sender intends to send; receiver accepts media

146     2. Sender intends to send; receiver does NOT accept media

148     3. Sender does NOT intend to send; receiver accepts media

150     4. Sender does NOT intend to send; receiver does NOT accept
151        media

153  We have to analyze in which of these 4 scenarios there is a chance of
154  having media clipping when the media resumes being sent over the
155  stream. It is obvious that in scenario 1 there is already media
156  flowing from sender to receiver, so we do not need to analyze it.

158  If in scenario 2 the receiver decides to start accepting media, it
159  will configure its media tool so that it is ready to receive media,
160  and it will send an offer to the sender indicating so. Since the
161  receiver configures its media tool before sending the offer, there is
162  no media clipping.

164  In scenario 3, if the sender decides to start sending media, it will
165  have to send an offer to the receiver indicating so. However, since
166  SIP signalling typically traverses a different path than the media
167  packets, the first media packets may arrive at the receiver before
168  the offer. This is not a problem, since the receiver was accepting
169  media anyway. There is no media clipping.

171  In scenario 4, if the sender decides to start sending media, it will
172  have to send an offer to the receiver indicating so. However, the
173  sender cannot start sending media to the receiver until it gets the
174  answer back. Otherwise, all the media would be discarded by the
175  receiver, since it was not accepting any media at that point in time.
176  This leads to media clipping. The sender will not typically be able
177  to send the first "hello" pronounced by the user.

179  The problem with the offer/answer model is that it can establish
180  scenario 4, but it cannot establish scenario 3. Therefore, when a
181  sender that was quiet resumes sending media, there can be media
182  clipping. The solution to this problem consists of using the SDP
183  direction attribute to indicate media acceptance by the receiver and
184  a new SDP parameter to indicate intention to send media by the
185  sender. Such a parameter is defined in Section 2.4.

187  2.3 Intention to Send Media

189  To resolve the problem above, some have proposed keeping the sender
190  from signalling that it did not intend to send media. That would
191  transform scenario 3 into scenario 1, eliminating media clipping.
192  However, knowing whether or not the sender intends to send media may
193  be important to drive GUIs in certain situations, as shown in the
194  following example.

196  Two users, A and B, are involved in a videoconference using a
197  sendrecv video stream. B wants to have a moment of privacy, so he
198  switches off his camera for a minute. B issues an offer indicating
199  that he does not intend to send video. However, the offer indicates
200  that A and B should still keep their video tools configured as
201  sendrecv, so that when B switches on his camera again, they can
202  perform a "soft" media resume (i.e., without media clipping).

204  B's intention of not sending video is now used to drive A's GUI
205  (e.g., minimizing the window where A was watching B's face). If B's
206  intention had not been signalled, A's GUI would have probably
207  continued showing the last video frame that was received over the
208  stream. A would not have been able to distinguish this situation from
209  a massive packet loss in the network (RTCP timers are usually too
210  long for this purpose). Therefore, signalling the intention of
211  sending or not sending media is important to drive GUIs.

213  2.4 The SDP Intention Parameter

215  OPEN ISSUE: SHOULD THIS ATTRIBUTE BE DEFINED IN AN MMUSIC DRAFT OR IS
216  IT OK TO DEFINE IT IN THIS SECTION? IT PROBABLY BELONGS TO MMUSIC,
217  BECAUSE IT IS NOT EARLY MEDIA SPECIFIC.

219  A new "intention" SDP media level attribute is defined. It is used to
220  indicate whether or not the entity generating the session description
221  intends to send media at a particular point in time over a particular
222  stream. Its formatting in SDP is described by the following BNF:

224     intention-attribute = "a=intention:" intention-value
225     intention-value     = "send" | "nosend"

227  2.5 Applicability of the SDP Intention Parameter

229  The SDP intention parameter should be used by systems that want to
230  provide information to drive GUIs and that want to avoid media
231  clipping. Systems whose requirements regarding media clipping are not
232  strict can signal scenario 4 instead. Systems that do not wish to
233  provide information to drive GUIs can signal scenario 1 instead.

235  3 Forking

237  If an INVITE forks, the UAC can receive multiple provisional
238  responses that establish different early media streams. It is up to
239  the UAC's local policy how to render the media received over those
240  streams. When a UAC has to deal with several video streams, it seems
241  natural, if the GUI supports it, to use a different window to show
242  each individual stream. However, a UAC receiving several audio
243  streams will probably have to choose one to be played, because mixing
244  them all may not be useful.

246  Note that if the INVITE that forked contained an offer, all the UASs
247  will send their early media to the same transport address of the UAC.

249  The UAC should be ready to temporarily demultiplex them based on the
250  RTP SSRCs and send a new offer within the early dialog as soon as the
251  offer/answer rules allow it.

253  4 Ringback Tone Generation

255  In the PSTN, telephone switches typically play ringback tones to the
256  caller to indicate that the called user is being alerted. When, where
257  and how these ringback tones are generated has been standardized
258  (i.e., the local exchange of the callee generates a standardized
259  ringback tone while the callee is being alerted). A standardized
260  approach to provide this type of feedback for the user makes sense in
261  a homogeneous environment such as the PSTN, where all the terminals
262  have a similar user interface.

264  This homogeneity is not found among SIP user agents. SIP user agents
265  have different capabilities, different user interfaces and may be
266  used to establish sessions that do not involve audio at all. Because
267  of this, the way a SIP UA provides the user with information about
268  the progress of session establishment is a matter of local policy.
269  This local policy in a given SIP UA has two main inputs: the status
270  of the INVITE transaction and the availability of incoming early
271  media.

273  The status of the INVITE transaction is given by the status code of
274  the latest response (e.g., 180 Ringing). The availability of incoming
275  early media is given by the offer/answer model and its direction
276  attributes and the intention attribute.

278  For example, a POTS-like SIP UA could implement the following local
279  policy:

281     1. If there is at least one audio stream in sendrecv or
282        recvonly mode, play out the audio received over that
283        stream.

285     2. If the callee is being alerted and there are no audio
286        streams in sendrecv or recvonly mode, play a
287        locally-generated ringback tone to the user.

289  And a SIP UA with a graphical user interface could follow the local
290  policy below:

292     1. If there are audio and/or video streams in sendrecv or
293        recvonly mode, play out whatever is received over those
294        streams.

296     2. If the callee is being alerted, display the message "The
297        callee is being alerted" for the user.

299     3. If a provisional response other than alerting is received,
300        display its reason phrase to the user (e.g., Trying, Call
301        is Being Forwarded, Queued).

303  Note that while it is not desirable to standardize a common local
304  policy to be followed by every SIP UA, a particular subset of more or
305  less homogeneous SIP UAs could use the same local policy by
306  convention. Examples of such subsets of SIP UAs may be "all the
307  PSTN/SIP gateways" or "every 3G IMS terminal". However, defining the
308  particular common policy that such groups of SIP devices may use is
309  outside the scope of this document.

311  5 Interactions with Preconditions

313  RFC 3312 [5] defines a framework for preconditions for SIP. The
314  negotiation of preconditions does not interact with the negotiation
315  of early media. Every precondition has a direction attribute (e.g.,
316  QoS in the sendonly direction) that may differ from the direction
317  attribute of the media stream. Since the presence of early media is
318  signalled with the latter attribute, there are no interactions
319  between preconditions and early media.

321  For example, a UA can request sendrecv QoS for a media stream that
322  will be in recvonly mode for early media and will be set to sendrecv
323  when the session is accepted.

325  6 Examples

327  The following examples assume SIP UAs following the local policy
328  below:

330     1. If there is at least one audio stream in sendrecv or
331        recvonly mode, play out the audio received over that
332        stream.

334     2. If the callee is being alerted and there are no audio
335        streams in sendrecv or recvonly mode, play a
336        locally-generated ringback tone to the user.

338  6.1 Remotely Generated Ringback Tone

340  The UAS of Figure 1 receives an initial INVITE (1) with an offer that
341  contains an audio stream in sendrecv mode. The UAS will play an
342  announcement, but it will not accept incoming (early) media until
343  user B accepts the session. The UAS sends a 183 (Session Progress)
344  response with an answer that sets the audio stream to sendonly (2).

346  After playing the announcement, the UAS starts alerting user B (5).
347  The UAS will be generating a special ringback tone on the media
348  stream, but since the audio stream was already in sendonly mode,
349  there is no need for a new offer/answer exchange.

351  When user B accepts the session, the UAS sends a 200 (OK) response
352  (8) for the INVITE and an UPDATE (9) to set the audio stream to
353  sendrecv in parallel. The UAC sends the ACK (10) and the 200 (OK)
354  response (11) for the UPDATE in parallel.

356  Since the audio stream is in sendrecv or recvonly mode (from the
357  UAC's perspective) all the time, the UAC applies the first bullet of
358  its local policy. It plays out whatever is received over the audio
359  stream (i.e., first the announcement and then the remotely generated
360  ringback tone).

362  6.2 Locally Generated Ringback Tone

364  The UAS of Figure 2 receives an initial INVITE (1) with an offer that
365  contains an audio stream in sendrecv mode. The UAS will play an
366  announcement, but it will not accept incoming (early) media until
367  user B accepts the session. The UAS sends a 183 (Session Progress)
368  response with an answer that sets the audio stream to sendonly (2).
369  After playing the announcement, the UAS starts alerting user B, but
370  it will not be generating any ringback tone on the media stream.
371  Therefore, it sends a 180 (Ringing) response (5) and sets the audio
372  stream to inactive with an UPDATE (6). At this point in time, the UAC
373  uses the second bullet of its local policy and generates ringback
374  tone locally.

376  When user B accepts the session, the UAS sends a 200 (OK) response
377  (10) for the INVITE and an UPDATE (11) to set the audio stream to
378  sendrecv in parallel. The UAC sends the ACK (12) and the 200 (OK)
379  response (13) for the UPDATE in parallel.
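
The two-bullet local policy assumed by these examples can be sketched in Python as follows. This is an illustration only, not part of the draft: local_policy and replay are hypothetical names, the event tuples are an ad hoc encoding of the message flows, directions are expressed from the UAC's perspective, and the "silence" outcome (before any stream is receivable and before alerting starts) is an assumption, since the policy above does not say what to do in that case.

```python
from typing import Iterable, List, Tuple

# Directions, from the UAC's perspective, in which it can receive audio.
RECEIVING = {"sendrecv", "recvonly"}


def local_policy(alerting: bool, audio_directions: Iterable[str]) -> str:
    """POTS-like policy: bullet 1 takes precedence over bullet 2."""
    if any(d in RECEIVING for d in audio_directions):
        return "play early media"       # bullet 1
    if alerting:
        return "play local ringback"    # bullet 2
    return "silence"                    # assumption: not covered by the policy


def replay(events: Iterable[Tuple[str, object]]) -> List[str]:
    """Apply events in order and record the policy decision after each one."""
    alerting = False
    direction = "inactive"  # placeholder until the first answer arrives
    decisions = []
    for kind, value in events:
        if kind == "status":
            # Only the latest response counts; 180 means the callee is alerted.
            alerting = (value == 180)
        elif kind == "direction":
            direction = str(value)
        decisions.append(local_policy(alerting, [direction]))
    return decisions


# Section 6.1: the answer in the 183 sets the stream to sendonly at the UAS,
# i.e. recvonly for the UAC, and the stream stays that way during alerting.
remote_ringback = replay([
    ("status", 183), ("direction", "recvonly"), ("status", 180),
])

# Section 6.2: same start, but the UAS then sets the stream to inactive.
local_ringback = replay([
    ("status", 183), ("direction", "recvonly"),
    ("status", 180), ("direction", "inactive"),
])

print(remote_ringback[-1])
print(local_ringback[-1])
```

Keeping the decision in a pure function of the two inputs named in Section 4 (transaction status and availability of incoming early media) makes it easy to swap in a different policy, such as the GUI-oriented one, without touching the signalling code.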
381  7 Acknowledgments

383  Paul Kyzivat, Christer Holmberg, Jon Peterson and William Marshall
384  provided useful comments and suggestions.

386  8 Authors' Addresses

388  Gonzalo Camarillo
389  Ericsson
390  Advanced Signalling Research Lab.
391  FIN-02420 Jorvas
392  Finland
393  electronic mail: Gonzalo.Camarillo@ericsson.com

395  Henning Schulzrinne
396  Dept. of Computer Science
397  Columbia University 1214 Amsterdam Avenue, MC 0401
398  New York, NY 10027
399  USA
400  electronic mail: schulzrinne@cs.columbia.edu

402  9 Bibliography

404  [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J.
405  Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session
406  initiation protocol," RFC 3261, Internet Engineering Task Force, June
407  2002.

409  [2] J. Rosenberg and H. Schulzrinne, "Reliability of provisional
410  responses in session initiation protocol (SIP)," RFC 3262, Internet
411  Engineering Task Force, June 2002.

413  [3] J. Rosenberg, "The session initiation protocol (SIP) UPDATE
414  method," RFC 3311, Internet Engineering Task Force, Oct. 2002.

416  [4] J. Rosenberg and H. Schulzrinne, "An offer/answer model with
417  session description protocol (SDP)," RFC 3264, Internet Engineering
418  Task Force, June 2002.

420  [5] "Integration of resource management and session initiation
421  protocol (SIP)," RFC 3312, Internet Engineering Task Force, Oct.
422  2002.

424  Full Copyright Statement

426  Copyright (c) The Internet Society (2002). All Rights Reserved.

428  This document and translations of it may be copied and furnished to
429  others, and derivative works that comment on or otherwise explain it
430  or assist in its implementation may be prepared, copied, published
431  and distributed, in whole or in part, without restriction of any
432  kind, provided that the above copyright notice and this paragraph are
433  included on all such copies and derivative works. However, this
434  document itself may not be modified in any way, such as by removing
435  the copyright notice or references to the Internet Society or other
436  Internet organizations, except as needed for the purpose of
437  developing Internet standards in which case the procedures for
438  copyrights defined in the Internet Standards process must be

439  A                                            B

441  |                                            |
442  |---------------(1) INVITE------------------>|
443  |                 a=sendrecv                 |
444  |<------(2) 183 Session Progress-------------|
445  |                 a=sendonly                 |
446  |-----------------(3) PRACK----------------->|
447  |                                            |
448  |<-----------(4) 200 OK (PRACK)--------------|
449  |  *                                         |
450  | ****************************************** |
451  |*     User B will be with you shortly     * |
452  | ****************************************** |
453  |  *                                         |
454  |<------------(5) 180 Ringing----------------|
455  |                                            |
456  |-----------------(6) PRACK----------------->|
457  |                                            |
458  |<-----------(7) 200 OK (PRACK)--------------|
459  |  *                                         |
460  | ****************************************** |
461  |*              Ringback tone              * |
462  | ****************************************** |
463  |  *                                         |
464  |<-----------(8) 200 OK (INVITE)-------------|
465  |                                            |
466  |<--------------(9) UPDATE-------------------|
467  |                 a=sendrecv                 |
468  |                                            |
469  |-----------------(10) ACK------------------>|
470  |                                            |
471  |------------(11) 200 OK (UPDATE)----------->|
472  |                 a=sendrecv                 |
473  |  *                                      *  |
474  | ****************************************** |
475  |*       Bi-directional conversation        *|
476  | ****************************************** |
477  |  *                                      *  |
478  |                                            |

480  Figure 1: Remotely generated ringback tone

481  A                                            B

483  |                                            |
484  |---------------(1) INVITE------------------>|
485  |                 a=sendrecv                 |
486  |<------(2) 183 Session Progress-------------|
487  |                 a=sendonly                 |
488  |-----------------(3) PRACK----------------->|
489  |                                            |
490  |<-----------(4) 200 OK (PRACK)--------------|
491  |  *                                         |
492  | ****************************************** |
493  |*     User B will be with you shortly     * |
494  | ****************************************** |
495  |  *                                         |
496  |<------------(5) 180 Ringing----------------|
497  |                 a=inactive                 |
498  |<--------------(6) UPDATE-------------------|
499  |                 a=inactive                 |
500  |                                            |
501  |-------------(7) PRACK--------------------->|
502  |-------------(8) 200 OK (UPDATE)----------->|
503  |                 a=inactive                 |
504  |<-----------(9) 200 OK (PRACK)--------------|
505  |                                            |
506  |                                            |
507  |                                            |
508  |<----------(10) 200 OK (INVITE)-------------|
509  |                                            |
510  |<-------------(11) UPDATE-------------------|
511  |                 a=sendrecv                 |
512  |                                            |
513  |-----------------(12) ACK------------------>|
514  |                                            |
515  |------------(13) 200 OK (UPDATE)----------->|
516  |                 a=sendrecv                 |
517  |  *                                      *  |
518  | ****************************************** |
519  |*       Bi-directional conversation        *|
520  | ****************************************** |
521  |  *                                      *  |
522  |                                            |

524  Figure 2: Locally generated ringback tone

525  followed, or as required to translate it into languages other than
526  English.

528  The limited permissions granted above are perpetual and will not be
529  revoked by the Internet Society or its successors or assigns.

531  This document and the information contained herein is provided on an
532  "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
533  TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
534  BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
535  HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
536  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.