Internet Engineering Task Force                                   SIP WG
Internet Draft                                              G. Camarillo
                                                                Ericsson
                                                          H. Schulzrinne
                                                     Columbia University
draft-camarillo-sipping-early-media-02.txt
June 29, 2003
Expires: December, 2003

                Early Media and Ringing Tone Generation
                   in the Session Initiation Protocol

STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   To view the list of Internet-Draft Shadow Directories, see
   http://www.ietf.org/shadow.html.

Abstract

   This document describes how to manage early media in SIP using two
   models: the gateway model and the application server model. It also
   describes which inputs need to be taken into consideration to define
   local policies for ringing tone generation.

Table of Contents

   1     Introduction
   2     Session Establishment in SIP
   3     The Gateway Model
   3.1   Forking
   3.2   Ringing Tone Generation
   3.3   Absence of an Early Media Indicator
   3.4   Applicability of the Gateway Model
   4     The Application Server Model
   4.1   In-Band Versus Out-of-Band Session Progress Information
   5     Alert-Info Header Field
   6     Acknowledgments
   7     Authors' Addresses
   8     Bibliography

1 Introduction

   Early media refers to media (e.g., audio and/or video) that is
   exchanged before a particular session is accepted by the called user.
   Early media within a particular SIP dialog takes place from the
   moment the initial INVITE is sent until the UAS generates a final
   response. Early media can be unidirectional or bi-directional and can
   be generated by the caller and/or the callee. Typical examples of
   early media generated by the callee are ringing tones and
   announcements (e.g., queuing status). Early media generated by the
   caller typically consists of voice commands or DTMF tones to drive
   IVRs.

   The basic SIP spec [1] only supports very simple early media. In
   order to support fully-featured early media, UAs need to implement
   some extensions in addition to the basic SIP spec. This document
   describes two models to implement early media and the extensions
   needed in each model.

   Section 2 describes the offer/answer model in the absence of early
   media. Section 3 introduces the gateway model. In this model, the
   early media session is established using the early dialog established
   by the original INVITE. Section 3.1, Section 3.2 and Section 3.4
   describe the limitations of the gateway model and the scenarios where
   it is appropriate to use this model. Section 4 introduces the
   application server model, which resolves some of the issues present
   in the gateway model. Section 5 discusses the interaction between
   the Alert-Info header field and both early media models.

2 Session Establishment in SIP

   Before presenting both early media models, we will briefly summarize
   how session establishment works in SIP. This will let us separate
   features that are intrinsic to SIP (e.g., media being played
   before the 200 (OK) to avoid media clipping) from early media
   operations.

   SIP [1] uses the offer/answer model [2] to negotiate session
   parameters. One of the user agents - the offerer - prepares a session
   description that is called the offer. The other user agent - the
   answerer - responds with another session description called the
   answer. This two-way handshake allows both user agents to agree upon
   the session parameters to be used to exchange media.

   The idea behind the offer/answer model is to decouple the
   offer/answer exchange from the messages used to transport the session
   descriptions. For example, the offer can be sent in an INVITE request
   and the answer can arrive in the 200 (OK) response for that INVITE.
   Alternatively, the offer can be sent in the 200 (OK) for an empty
   INVITE and the answer sent in the ACK. When reliable provisional
   responses [3] and UPDATE requests [4] are used, there are many more
   possible ways to exchange offers and answers.

   Media clipping occurs when the user (or the machine generating media)
   believes that the media session is already established but the
   establishment process has not finished yet. The user starts speaking
   (i.e., generating media) and the first few syllables or even the
   first few words are lost.

   When the offer/answer exchange takes place in the 200 (OK) response
   and in the ACK, media clipping is unavoidable. The called user starts
   speaking at the same time as the 200 (OK) is sent, but the UAS cannot
   send any media until the answer from the UAC arrives in the ACK.

   However, SIP provides a solution to avoid media clipping in the most
   common offer/answer exchange: an INVITE with an offer and a 200 (OK)
   with an answer. SIP signalling and media packets typically traverse
   different paths. Therefore, the UAC cannot count on the reception of
   the 200 (OK) to start playing out media for the caller; media packets
   could arrive before the 200 (OK) response. The UAC needs to be ready
   to play incoming media packets as soon as it sends its offer.

   Another form of media clipping (not related to early media either)
   occurs in the caller->callee direction. If the callee picks up and
   starts speaking, the UAS will send a 200 (OK) response with an answer
   and the first media packets in parallel. If the first media packets
   arrive at the UAC before the answer, and the caller starts speaking
   as well, the UAC will not be able to send media until the 2xx
   response from the UAS arrives.

3 The Gateway Model

   As described in Section 2, SIP uses the offer/answer model to
   negotiate session parameters.
   An offer/answer exchange that takes place before a final response
   for the INVITE is sent establishes an "early" media session. Early
   media sessions terminate when a final response for the INVITE is
   sent. If the final response is a 2xx, the early media session
   transitions to a regular media session. If the final response is a
   non-2xx final response, the early media session is simply
   terminated.

   Media exchanged within an early media session is, not surprisingly,
   referred to as early media. The gateway model consists of managing
   early media sessions using offer/answer exchanges in reliable
   provisional responses, PRACKs and UPDATEs.

   The gateway model presents serious limitations in the presence of
   forking, as described in Section 3.1. Therefore, its use is only
   acceptable where the UA cannot distinguish between early and regular
   media, as described in Section 3.4. In any other situation (the
   majority of UAs), it is strongly recommended that the application
   server model described in Section 4 be used instead.

3.1 Forking

   In the absence of forking, assuming that the initial INVITE contains
   an offer, the gateway model does not introduce media clipping.
   Following normal SIP procedures, the UAC is ready to play any
   incoming media as soon as it sends the initial offer in the INVITE.
   The UAS sends the answer in a reliable provisional response and
   starts sending media right away. Even if the first media packets
   arrive at the UAC before the 1xx response, the UAC will play them.

        Note that, in some situations, the UAC does need to receive
        the answer before being able to play any media. UAs in such
        a situation (e.g., QoS, media authorization or media
        encryption is required) use preconditions to avoid media
        clipping.

   However, if the INVITE forks, the gateway model may introduce media
   clipping.
   This happens when the UAC receives different answers to its
   offer in several provisional responses from different UASs. The UAC
   has to deal with bandwidth limitations and early media session
   selection.

   If the UAC receives early media from different UASs, it needs to
   present it to the user. If the early media consists of audio, playing
   several audio streams to the user at the same time can be confusing.
   Other media types (e.g., video), on the other hand, can be presented
   to the user at the same time. The UAC can, for example, build a
   mosaic with the different inputs.

   However, even with media types that can be played at the same time to
   the user, if the UAC has limited bandwidth, it will not be able to
   receive early media from all the different UASs at the same time.
   Therefore, the UAC often needs to choose a single early media
   session and "mute" the rest of them by sending UPDATE requests.

        It is difficult to decide which early media session carries
        more important information from the caller's perspective.
        In fact, in some scenarios, the UA cannot even correlate
        media packets with their particular SIP early dialog.
        Therefore, UACs typically pick one early dialog at random
        and mute the rest.

   If one of the early media sessions that was muted transitions to a
   regular media session (i.e., the UAS sends a 2xx response), media
   clipping is likely to appear. The UAC typically sends an UPDATE with
   a new offer (upon reception of the 200 OK for the INVITE) to unmute
   the media session. The UAS cannot send any media until it receives
   the offer from the UAC. Therefore, if the callee starts speaking
   before the offer from the UAC is received, his words will get lost.

        Having the UAS send the UPDATE to unmute the media session
        (instead of the UAC) does not avoid media clipping in the
        backward direction and it causes possible race conditions.
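
   The selection problem described in this section can be made concrete
   with a small model. The following Python sketch is illustrative only
   and is not part of any SIP stack: the class ForkingUac, its method
   names and the tuples recorded in "sent" are hypothetical stand-ins
   for real dialog state and real UPDATE requests. It shows a UAC that
   keeps one early media session chosen at random, mutes the others,
   and reports when a 2xx arrives on a muted dialog, which is the case
   where clipping is likely.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ForkingUac:
    """Hypothetical model of a UAC tracking early dialogs after forking."""
    # to-tag of each early dialog -> True if its media is currently muted
    muted: Dict[str, bool] = field(default_factory=dict)
    active: Optional[str] = None
    # (method, to-tag, action) tuples standing in for real SIP requests
    sent: List[Tuple[str, str, str]] = field(default_factory=list)

    def on_early_answer(self, to_tag: str) -> None:
        """An answer arrived in a reliable provisional response."""
        self.muted.setdefault(to_tag, False)

    def select_one(self, rng: random.Random) -> str:
        """Keep one early media session, chosen at random; mute the rest."""
        self.active = rng.choice(sorted(self.muted))
        for tag in self.muted:
            if tag != self.active and not self.muted[tag]:
                self.muted[tag] = True
                self.sent.append(("UPDATE", tag, "mute"))
        return self.active

    def on_2xx(self, to_tag: str) -> bool:
        """Final 2xx on one dialog. Returns True if clipping is likely."""
        was_muted = self.muted.get(to_tag, False)
        if was_muted:
            # The UAS cannot send media until this new offer reaches it.
            self.muted[to_tag] = False
            self.sent.append(("UPDATE", to_tag, "unmute"))
        self.active = to_tag
        return was_muted
```

   In this sketch, on_2xx() returning True corresponds to the situation
   above in which the UAC must send an UPDATE to unmute the session
   before the UAS is able to send media.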

3.2 Ringing Tone Generation

   In the PSTN, telephone switches typically play ringing tones to the
   caller to indicate that the callee is being alerted. When, where and
   how these ringing tones are generated has been standardized (i.e.,
   the local exchange of the callee generates a standardized ringing
   tone while the callee is being alerted). A standardized approach to
   provide this type of feedback for the user makes sense in a
   homogeneous environment such as the PSTN, where all the terminals
   have a similar user interface.

   This homogeneity is not found among SIP user agents. SIP user agents
   have different capabilities, different user interfaces and may be
   used to establish sessions that do not involve audio at all. Because
   of this, the way a SIP UA provides the user with information about
   the progress of session establishment is a matter of local policy.
   For example, a UA with a GUI may choose to display a message on the
   screen when the callee is being alerted while another UA may choose
   to show a picture of a phone ringing instead. Many SIP UAs choose to
   imitate the user interface of the PSTN phones. They provide a ringing
   tone to the caller when the callee is being alerted. Such a UAC is
   supposed to generate ringing tones locally for its user as long as no
   early media is received from the UAS. If the UAS generates early
   media (e.g., an announcement or a special ringing tone), the UAC is
   supposed to play it rather than generating the ringing tone locally.

   The problem is that, sometimes, it is not an easy task for a UAC to
   know whether it should generate local ringing or it will be receiving
   early media. A UAS can send early media without using reliable
   provisional responses (very simple UASs do that) or it can send an
   answer in a reliable provisional response without any intention of
   sending early media (this is the case when preconditions are used).
   Therefore, by only looking at the SIP signalling, a UAC cannot be
   sure whether or not there will be early media for a particular
   session. The UAC needs to check if media packets are arriving at a
   given moment.

        An implementation could even choose to look at the contents
        of the media packets, since they could carry only silence
        or comfort noise.

   With this in mind, a UAC should develop its local policy regarding
   local ringing generation. For example, a POTS-like SIP UA could
   implement the following local policy:

        1.   Unless a 180 (Ringing) response is received, never generate
             local ringing.

        2.   If a 180 (Ringing) has been received but there are no
             incoming media packets, generate local ringing.

        3.   If a 180 (Ringing) has been received and there are incoming
             media packets, play them and do not generate local ringing.

        Note that a 180 (Ringing) response means that the callee is
        being alerted, and a UAS should send such a response if the
        callee is being alerted, regardless of the status of the
        early media session.

   At first sight, such a policy may look difficult to implement in
   decomposed UAs (i.e., media gateway controller and media gateway).
   However, this policy is the same as the one described in Section 2,
   which must be implemented by any UA; any UA should play incoming
   media packets (and stop local ringing tone generation if it was being
   performed) in order to avoid media clipping, even if the 200 (OK)
   response has not arrived. Therefore, the tools to implement this
   early media policy are available already to any UA that uses SIP.

   Note that, while it is not desirable to standardize a common local
   policy to be followed by every SIP UA, a particular subset of more or
   less homogeneous SIP UAs could use the same local policy by
   convention. Examples of such subsets of SIP UAs may be "all the
   PSTN/SIP gateways" or "every 3G IMS terminal".
   However, defining the particular common policy that such groups of
   SIP devices may use is outside the scope of this document.

3.3 Absence of an Early Media Indicator

   SIP, as opposed to other signalling protocols, does not provide an
   early media indicator. That is, there is no information about the
   presence or absence of early media in SIP. Such an indicator could
   potentially be used to avoid generation of local ringing tone by the
   UAC when the UAS intends to provide in-band ringing tone or some type
   of announcement. However, due to the way SIP works, such an indicator
   would, in the majority of the cases, be of little use.

   One important reason that would limit the benefit of a potential
   early media indicator is the loose coupling between SIP signalling
   and the media path. SIP signalling traverses a different path than
   the media. The media path is typically optimized to reduce the
   end-to-end delay (e.g., minimum number of intermediaries) whereas the
   SIP signalling path typically traverses a number of proxies providing
   different services for the session. For that reason, it is very
   likely that the media packets with early media reach the UAC before
   any SIP message which could contain an early media indicator.

   However, sometimes, SIP responses arrive at the UAC before any media
   packet. There are situations when the UAS intends to send early media
   but cannot do it straight away. For example, UAs using ICE [5] and
   ALT [6] may need to exchange several STUN messages before being able
   to exchange media. In these situations, an early media indicator
   would keep the UAC from generating local ringing tone during this
   time. However, while the early media is not arriving at the UAC, the
   user would not be aware of the fact that the remote user is being
   alerted, even though a 180 (Ringing) had been received.
   Therefore, a better solution would be to apply local ringing tone
   until the early media packets could be sent from the UAS to the UAC.
   This solution does not require any early media indicator.

        Note that migrations from local ringing tone to early media
        at the UAC happen in the presence of forking as well; one
        UAS sends a 180 (Ringing) response, and later, another UAS
        starts sending early media.

3.4 Applicability of the Gateway Model

   Section 3 described some of the limitations of the gateway model. It
   produces media clipping in forking scenarios and requires media
   detection to generate local ringing properly. These issues are
   addressed by the application server model, described in Section 4,
   which is the recommended way of generating early media that is not
   continuous with the regular media that will be generated during the
   session.

   The gateway model is, therefore, acceptable in situations where the
   UA cannot distinguish between early media and regular media. A PSTN
   gateway is an example of this type of situation. The PSTN gateway
   receives media from the PSTN over a circuit, and sends it to the IP
   network. The gateway is not aware of the contents of the media, and
   it does not exactly know when the transition from early to regular
   media takes place. From the PSTN perspective, the circuit is a
   continuous source of media.

4 The Application Server Model

   The application server model consists of having the UAS behave as any
   other application server in the session [7]. The UAC includes a Join
   header field in the initial INVITE. In order to send early media, the
   UAS establishes a new dialog by sending a new INVITE to the URI in
   the Join header field.

   Sending early media using a different dialog than the one used for
   sending regular media helps avoid media clipping in case of forking.
   The UAC can reject or mute new invitations for early media without
   muting the sessions that will carry media when the original INVITE is
   accepted. The UAC can give priority to media received over the latter
   sessions. This way, the application server model achieves a smooth
   transition from early to regular media.

   Having a separate dialog for early media also helps UAs decide
   whether or not local ringing should be generated. If a new dialog to
   send early media is established, and that dialog contains at least an
   audio stream, the UAC can assume that there will be incoming early
   media and it can then avoid generating local ringing.

        An alternative model would consist of adding a new stream
        labeled as "early media" to the original session between
        the UAC and the UAS using an UPDATE, instead of
        establishing a new session. We have chosen to establish a
        new session to be coherent with the mechanism used by
        application servers that are NOT co-located with the UAS.
        This way, the UAS uses the same mechanism as any other
        application server in the network to interact with the UAC.

4.1 In-Band Versus Out-of-Band Session Progress Information

   Note that, even when the application server model is used, a UA will
   have to choose which early media sessions are muted and which ones
   are rendered to the user. In order to make this choice easier for
   UAs, it is strongly recommended that information that is not
   essential for the session not be transmitted using early media. For
   instance, UAs should not use early media to send special ringing
   tones. SIP already provides a means to inform the remote user about
   session establishment progress which does not cause any of the
   problems associated with early media: the status code and the reason
   phrase in provisional responses.
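
   The ringing decisions discussed in Section 3.2 and Section 4 can be
   summarized as a small decision function. The following Python sketch
   is illustrative only. The names Action and caller_feedback and the
   three boolean inputs are assumptions made for this example, and the
   handling of media that arrives before any 180 (Ringing) follows the
   general rule from Section 2 that a UA plays incoming media; it is
   one possible local policy for a POTS-like UA, not a required one.

```python
from enum import Enum


class Action(Enum):
    NOTHING = "no local ringing, nothing to play"
    LOCAL_RINGING = "generate local ringing"
    PLAY_EARLY_MEDIA = "play incoming early media"


def caller_feedback(got_180: bool,
                    media_arriving: bool,
                    early_audio_dialog: bool = False) -> Action:
    """Hypothetical local policy for a POTS-like UAC.

    got_180:            a 180 (Ringing) response has been received
    media_arriving:     media packets are currently arriving
    early_audio_dialog: a separate dialog carrying at least one audio
                        stream was established for early media
                        (application server model)
    """
    if media_arriving:
        # Incoming media is always played, which also stops local ringing.
        return Action.PLAY_EARLY_MEDIA
    if early_audio_dialog:
        # Assumption: early media is expected, so local ringing is skipped.
        return Action.NOTHING
    if got_180:
        return Action.LOCAL_RINGING
    # Without a 180 (Ringing), local ringing is never generated.
    return Action.NOTHING
```

   A UA would typically re-evaluate such a function whenever one of the
   inputs changes, for example when the first media packet arrives
   after local ringing has already started.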

5 Alert-Info Header Field

   The Alert-Info header field allows specifying an alternative ringing
   tone to the UAC. This header field tells the UAC which tone should be
   played in case local ringing is generated, but it does not tell the
   UAC when to generate local ringing. A UAC should follow the rules
   described above for ringing tone generation in both models. If, after
   following those rules, the UAC decides to play local ringing, it can
   then use the Alert-Info header field to generate it.

6 Acknowledgments

   Jon Peterson provided useful ideas on the separation between the
   gateway model and the application server model.

   Paul Kyzivat, Christer Holmberg, Bill Marshall, Francois Audet, John
   Hearty, Adam Roach and Rohan Mahy provided useful comments and
   suggestions.

7 Authors' Addresses

   Gonzalo Camarillo
   Ericsson
   Advanced Signalling Research Lab.
   FIN-02420 Jorvas
   Finland
   electronic mail: Gonzalo.Camarillo@ericsson.com

   Henning Schulzrinne
   Dept. of Computer Science
   Columbia University
   1214 Amsterdam Avenue, MC 0401
   New York, NY 10027
   USA
   electronic mail: schulzrinne@cs.columbia.edu

8 Bibliography

   [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. R. Johnston, J.
   Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session
   initiation protocol," RFC 3261, Internet Engineering Task Force, June
   2002.

   [2] J. Rosenberg and H. Schulzrinne, "An offer/answer model with
   session description protocol (SDP)," RFC 3264, Internet Engineering
   Task Force, June 2002.

   [3] J. Rosenberg and H. Schulzrinne, "Reliability of provisional
   responses in session initiation protocol (SIP)," RFC 3262, Internet
   Engineering Task Force, June 2002.

   [4] J. Rosenberg, "The session initiation protocol (SIP) UPDATE
   method," RFC 3311, Internet Engineering Task Force, Oct. 2002.

   [5] J. Rosenberg, "Interactive connectivity establishment (ICE): a
   methodology for network address translator (NAT) traversal for the
   session initiation protocol (SIP)," internet draft, Internet
   Engineering Task Force, Feb. 2003. Work in progress.

   [6] G. Camarillo and J. Rosenberg, "The alternative semantics for the
   session description protocol grouping framework," internet draft,
   Internet Engineering Task Force, June 2003. Work in progress.

   [7] J. Rosenberg, "A framework and requirements for application
   interaction in SIP," internet draft, Internet Engineering Task Force,
   Nov. 2002. Work in progress.

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (c) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.