Internet Engineering Task Force                              SIPPING WG
Internet Draft                                             G. Camarillo
                                                               Ericsson
                                                         H. Schulzrinne
                                                    Columbia University
draft-ietf-sipping-early-media-01.txt
November 18, 2003
Expires: May, 2004

              Early Media and Ringing Tone Generation
                in the Session Initiation Protocol

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html.

Abstract

This document describes how to manage early media in SIP using two
models: the gateway model and the application server model. It also
describes the inputs one needs to consider to define local policies
for ringing tone generation.

Table of Contents

1     Introduction
2     Session Establishment in SIP
3     The Gateway Model
3.1   Forking
3.2   Ringing Tone Generation
3.3   Absence of an Early Media Indicator
3.4   Applicability of the Gateway Model
4     The Application Server Model
4.1   In-Band Versus Out-of-Band Session Progress Information
5     Alert-Info Header Field
6     Acknowledgments
7     Authors' Addresses
8     Bibliography

1 Introduction

Early media refers to media (e.g., audio and video) that is exchanged
before a particular session is accepted by the called user. Within a
dialog, early media occurs from the moment the initial INVITE is sent
until the UAS generates a final response. It may be unidirectional or
bi-directional, and can be generated by the caller, the callee, or
both. Typical examples of early media generated by the callee are
ringing tone and announcements (e.g., queuing status). Early media
generated by the caller typically consists of voice commands or DTMF
tones to drive IVRs.

The basic SIP spec [1] only supports very simple early media. In
order to support fully-featured early media, UAs need to implement
some extensions in addition to the basic SIP spec. This document
describes two models to implement early media and the extensions
needed in each model.

Section 2 describes the offer/answer model in the absence of early
media, and Section 3 introduces the gateway model. In this model, the
early media session is established using the early dialog established
by the original INVITE. Section 3.1, Section 3.2 and Section 3.4
describe the limitations of the gateway model and the scenarios where
it is appropriate to use this model. Section 4 introduces the
application server model, which resolves some of the issues present
in the gateway model. Section 5 discusses the interaction between
the Alert-Info header field and both early media models.

2 Session Establishment in SIP

Before presenting both early media models, we will briefly summarize
how session establishment works in SIP. This will let us keep
separate features that are intrinsic to SIP (e.g., media being played
before the 200 (OK) to avoid media clipping) from early media
operations.

SIP [1] uses the offer/answer model [2] to negotiate session
parameters. One of the user agents - the offerer - prepares a session
description that is called the offer. The other user agent - the
answerer - responds with another session description called the
answer. This two-way handshake allows both user agents to agree upon
the session parameters to be used to exchange media.

The idea behind the offer/answer model is to decouple the
offer/answer exchange from the messages used to transport the session
descriptions.
For example, the offer can be sent in an INVITE request
and the answer can arrive in the 200 (OK) response for that INVITE,
or, alternatively, the offer can be sent in the 200 (OK) for an empty
INVITE and the answer can be sent in the ACK. When reliable
provisional responses [3] and UPDATE requests [4] are used, there are
many more possible ways to exchange offers and answers.

Media clipping occurs when the user (or the machine generating media)
believes that the media session is already established but the
establishment process has not finished yet. The user starts speaking
(i.e., generating media) and the first few syllables or even the
first few words are lost.

When the offer/answer exchange takes place in the 200 (OK) response
and in the ACK, media clipping is unavoidable. The called user starts
speaking at the same time as the 200 (OK) is sent, but the UAS cannot
send any media until the answer from the UAC arrives in the ACK.

On the other hand, media clipping does not appear in the most common
offer/answer exchange (an INVITE with an offer and a 200 (OK) with an
answer). UACs are ready to play incoming media packets as soon as
they send an offer. They do this because they cannot count on the
reception of the 200 (OK) to start playing out media for the caller;
SIP signalling and media packets typically traverse different paths,
so media packets may arrive before the 200 (OK) response.

Another form of media clipping (not related to early media either)
occurs in the caller->callee direction. When the callee picks up and
starts speaking, the UAS sends a 200 (OK) response with an answer and
the first media packets in parallel. If the first media packets
arrive at the UAC before the answer, and the caller starts speaking
as well, the UAC cannot send media until the 2xx response from the
UAS arrives.
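
As an illustration of the two basic exchanges described in this
section, the following Python sketch records which message carries
the offer and which carries the answer, and whether the UAS has to
wait before it can send the callee's media. The names
OfferAnswerExchange, EXCHANGES and uas_may_send_media are
hypothetical and are not defined by SIP. The sketch deliberately
ignores reliable provisional responses and UPDATE requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfferAnswerExchange:
    """Which SIP message carries the offer and which carries the answer."""
    offer_in: str
    answer_in: str


# The two basic exchanges named in this section (hypothetical keys).
EXCHANGES = {
    "offer_in_invite": OfferAnswerExchange("INVITE", "200 (OK)"),
    "offer_in_200": OfferAnswerExchange("200 (OK)", "ACK"),
}


def uas_may_send_media(exchange: OfferAnswerExchange, ack_received: bool) -> bool:
    """Return True if the UAS may send media once the callee has answered.

    If the answer travels in the ACK, the UAS has to wait for the ACK,
    which is the case where clipping of the callee's first words is
    unavoidable. Otherwise the UAS generated the answer itself and can
    send media in parallel with the 200 (OK).
    """
    if exchange.answer_in == "ACK":
        return ack_received
    return True
```

With the offer in the 200 (OK), the function returns False until the
ACK has arrived; that interval corresponds to the period during which
the callee's media is clipped.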

3 The Gateway Model

SIP uses the offer/answer model to negotiate session parameters (as
described in Section 2). An offer/answer exchange that takes place
before a final response for the INVITE is sent establishes an "early"
media session. Early media sessions terminate when a final response
for the INVITE is sent. If the final response is a 2xx, the early
media session transitions to a regular media session. If the final
response is a non-2xx final response, the early media session is
simply terminated.

Media exchanged within an early media session is, not surprisingly,
referred to as early media. The gateway model consists of managing
early media sessions using offer/answer exchanges in reliable
provisional responses, PRACKs, and UPDATEs.

The gateway model presents serious limitations in the presence of
forking, as described in Section 3.1. Therefore, its use is only
acceptable when the UA cannot distinguish between early and regular
media, as described in Section 3.4. In any other situation (the
majority of UAs), it is strongly recommended that the application
server model described in Section 4 be used instead.

3.1 Forking

In the absence of forking, assuming that the initial INVITE contains
an offer, the gateway model does not introduce media clipping.
Following normal SIP procedures, the UAC is ready to play any
incoming media as soon as it sends the initial offer in the INVITE.
The UAS sends the answer in a reliable provisional response and can
send media as soon as there is media to send. Even if the first media
packets arrive at the UAC before the 1xx response, the UAC will play
them.

      Note that, in some situations, the UAC does need to receive
      the answer before being able to play any media. UAs in such
      a situation (e.g., QoS, media authorization or media
      encryption is required) use preconditions to avoid media
      clipping.

On the other hand, if the INVITE forks, the gateway model may
introduce media clipping. This happens when the UAC receives
different answers to its offer in several provisional responses from
different UASs. The UAC has to deal with bandwidth limitations and
early media session selection.

If the UAC receives early media from different UASs, it needs to
present it to the user. If the early media consists of audio, playing
several audio streams to the user at the same time may be confusing.
Other media types (e.g., video), on the other hand, can be presented
to the user at the same time. The UAC can, for example, build a
mosaic with the different inputs.

However, even with media types that can be played at the same time to
the user, if the UAC has limited bandwidth, it will not be able to
receive early media from all the different UASs at the same time.
Therefore, the UAC often needs to choose a single early media
session and "mute" the rest of them by sending UPDATE requests.

      It is difficult to decide which early media session carries
      more important information from the caller's perspective.
      In fact, in some scenarios, the UA cannot even correlate
      media packets with their particular SIP early dialog.
      Therefore, UACs typically pick one early dialog at random
      and mute the rest.

If one of the early media sessions that was muted transitions to a
regular media session (i.e., the UAS sends a 2xx response), media
clipping is likely to appear. The UAC typically sends an UPDATE with
a new offer (upon reception of the 200 (OK) for the INVITE) to unmute
the media session. The UAS cannot send any media until it receives
the offer from the UAC. Therefore, if the callee starts speaking
before the offer from the UAC is received, his or her words will get
lost.
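
The selection behaviour described above can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not
a definitive UAC implementation: early dialogs are represented by
opaque string identifiers, and the function names select_early_dialog
and needs_unmute_update are hypothetical. How a dialog is actually
muted (the contents of the UPDATE) is outside the sketch.

```python
import random
from typing import Iterable, List, Optional, Tuple


def select_early_dialog(
    dialog_ids: Iterable[str],
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[str], List[str]]:
    """Pick one early dialog at random and return (kept, muted).

    The muted dialogs are the ones for which the UAC would send an
    UPDATE request to stop the incoming early media.
    """
    ids = list(dialog_ids)
    if not ids:
        return None, []
    chooser = rng if rng is not None else random
    kept = chooser.choice(ids)
    muted = [d for d in ids if d != kept]
    return kept, muted


def needs_unmute_update(answered_dialog: str, muted: List[str]) -> bool:
    """Return True if the dialog that received the 2xx had been muted.

    In that case the UAC has to send a new offer in an UPDATE, and the
    UAS cannot send media until that offer arrives, so clipping is
    likely.
    """
    return answered_dialog in muted
```

A UAC following this sketch would call select_early_dialog once the
forked provisional responses have arrived, and needs_unmute_update
when the 2xx response is received.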

      Having the UAS send the UPDATE to unmute the media session
      (instead of the UAC) does not avoid media clipping in the
      backward direction and it causes possible race conditions.

3.2 Ringing Tone Generation

In the PSTN, telephone switches typically play ringing tones to the
caller to indicate that the callee is being alerted. When, where and
how these ringing tones are generated has been standardized (i.e.,
the local exchange of the callee generates a standardized ringing
tone while the callee is being alerted). A standardized approach to
provide this type of feedback for the user makes sense in a
homogeneous environment such as the PSTN, where all the terminals
have a similar user interface.

This homogeneity is not found among SIP user agents. SIP user agents
have different capabilities, different user interfaces and may be
used to establish sessions that do not involve audio at all. Because
of this, the way a SIP UA provides the user with information about
the progress of session establishment is a matter of local policy.
For example, a UA with a GUI may choose to display a message on the
screen when the callee is being alerted while another UA may choose
to show a picture of a phone ringing instead. Many SIP UAs choose to
imitate the user interface of the PSTN phones. They provide a ringing
tone to the caller when the callee is being alerted. Such a UAC is
supposed to generate ringing tones locally for its user as long as no
early media is received from the UAS. If the UAS generates early
media (e.g., an announcement or a special ringing tone), the UAC is
supposed to play it rather than generating the ringing tone locally.

The problem is that, sometimes, it is not an easy task for a UAC to
know whether it should generate local ringing or whether it will be
receiving early media.
A UAS can send early media without using reliable
provisional responses (very simple UASs do that) or it can send an
answer in a reliable provisional response without any intention of
sending early media (this is the case when preconditions are used).
Therefore, by only looking at the SIP signalling, a UAC cannot be
sure whether or not there will be early media for a particular
session. The UAC needs to check if media packets are arriving at a
given moment.

      An implementation could even choose to look at the contents
      of the media packets, since they could carry only silence
      or comfort noise.

With this in mind, a UAC should develop its local policy regarding
local ringing generation. For example, a POTS-like SIP UA could
implement the following local policy:

   1. Unless a 180 (Ringing) response is received, never generate
      local ringing.

   2. If a 180 (Ringing) has been received but there are no
      incoming media packets, generate local ringing.

   3. If a 180 (Ringing) has been received and there are incoming
      media packets, play them and do not generate local ringing.

      Note that a 180 (Ringing) response means that the callee is
      being alerted, and a UAS should send such a response if the
      callee is being alerted, regardless of the status of the
      early media session.

At first sight, such a policy may look difficult to implement in
decomposed UAs (i.e., media gateway controller and media gateway),
but this policy is the same as the one described in Section 2, which
must be implemented by any UA. That is, any UA should play incoming
media packets (and stop local ringing tone generation if it was being
performed) in order to avoid media clipping, even if the 200 (OK)
response has not arrived. So, the tools to implement this early media
policy are available already to any UA that uses SIP.
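
The example policy above can be sketched as a small decision function.
This is only an illustration of the three rules for a POTS-like UA;
the names CallerAction and ringing_action are hypothetical. The
sketch assumes that the UA can tell whether media packets are
currently arriving, and it plays incoming media even when no 180
(Ringing) has been received, following the behaviour described in
Section 2.

```python
from enum import Enum


class CallerAction(Enum):
    NO_TONE = "no local ringing, nothing to play"
    LOCAL_RINGING = "generate local ringing"
    PLAY_INCOMING_MEDIA = "play incoming media, no local ringing"


def ringing_action(received_180: bool, media_arriving: bool) -> CallerAction:
    """Apply the example local policy of a POTS-like SIP UA."""
    # Incoming media is always played and replaces any local ringing
    # (rule 3, plus the Section 2 behaviour when no 180 was received).
    if media_arriving:
        return CallerAction.PLAY_INCOMING_MEDIA
    # Rule 2: 180 (Ringing) received but no media packets are arriving.
    if received_180:
        return CallerAction.LOCAL_RINGING
    # Rule 1: without a 180 (Ringing), never generate local ringing.
    return CallerAction.NO_TONE
```

Because the function depends only on the current inputs, a UA can
re-evaluate it whenever a 180 (Ringing) arrives or the media
detection state changes, which also covers the case where local
ringing has to stop because early media starts arriving.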

Note that, while it is not desirable to standardize a common local
policy to be followed by every SIP UA, a particular subset of more or
less homogeneous SIP UAs could use the same local policy by
convention. Examples of such subsets of SIP UAs may be "all the
PSTN/SIP gateways" or "every 3G IMS terminal". However, defining the
particular common policy that such groups of SIP devices may use is
outside the scope of this document.

3.3 Absence of an Early Media Indicator

SIP, as opposed to other signalling protocols, does not provide an
early media indicator. That is, there is no information about the
presence or absence of early media in SIP. Such an indicator could
potentially be used to avoid generation of local ringing tone by the
UAC when the UAS intends to provide in-band ringing tone or some type
of announcement. However, due to the way SIP works, such an indicator
would, in the majority of the cases, be of little use.

One important reason that would limit the benefit of a potential
early media indicator is the loose coupling between SIP signalling
and the media path. SIP signalling traverses a different path than
the media. The media path is typically optimized to reduce the
end-to-end delay (e.g., minimum number of intermediaries) while the
SIP signalling path typically traverses a number of proxies providing
different services for the session. For that reason, it is very
likely that the media packets with early media reach the UAC before
any SIP message which could contain an early media indicator.

Nevertheless, sometimes, SIP responses arrive at the UAC before any
media packet. There are situations when the UAS intends to send early
media but cannot do it straight away. For example, UAs using ICE [5]
may need to exchange several STUN messages before being able to
exchange media.
In these situations, an early media indicator would
keep the UAC from generating local ringing tone during this time.
However, while the early media is not arriving at the UAC, the user
would not be aware of the fact that the remote user is being alerted,
even though a 180 (Ringing) had been received. Therefore, a better
solution would be to apply local ringing tone until the early media
packets could be sent from the UAS to the UAC. This solution does not
require any early media indicator.

      Note that migrations from local ringing tone to early media
      at the UAC happen in the presence of forking as well; one
      UAS sends a 180 (Ringing) response, and later, another UAS
      starts sending early media.

3.4 Applicability of the Gateway Model

Section 3 described some of the limitations of the gateway model. It
produces media clipping in forking scenarios and requires media
detection to generate local ringing properly. These issues are
addressed by the application server model, described in Section 4,
which is the recommended way of generating early media that is not
continuous with the regular media generated during the session.

The gateway model is, therefore, acceptable in situations where the
UA cannot distinguish between early media and regular media. A PSTN
gateway is an example of this type of situation. The PSTN gateway
receives media from the PSTN over a circuit, and sends it to the IP
network. The gateway is not aware of the contents of the media, and
it does not exactly know when the transition from early to regular
media takes place. From the PSTN perspective, the circuit is a
continuous source of media.

4 The Application Server Model

The application server model consists of having the UAS behave as an
application server to establish early media sessions with the UAC.
The UAC indicates support for the early-session disposition type
(defined in [6]) using the early-session option tag. This way, UASs
know that they can keep offer/answer exchanges for early media
(early-session disposition type) and for regular media (session
disposition type) separate.

Sending early media using a different offer/answer exchange than the
one used for sending regular media helps avoid media clipping in case
of forking. The UAC can reject or mute new offers for early media
without muting the sessions that will carry media when the original
INVITE is accepted. The UAC can give priority to media received over
the latter sessions. This way, the application server model
transitions from early to regular media at the right moment.

Having a separate offer/answer exchange for early media also helps
UACs decide whether or not local ringing should be generated. If a
new early session is established and that early session contains at
least an audio stream, the UAC can assume that there will be incoming
early media and it can then avoid generating local ringing.

      An alternative model would consist of adding a new stream
      labeled as "early media" to the original session between
      the UAC and the UAS using an UPDATE, instead of
      establishing a new early session. We have chosen to
      establish a new early session to be coherent with the
      mechanism used by application servers that are NOT
      co-located with the UAS. This way, the UAS uses the same
      mechanism as any application server in the network to
      interact with the UAC.

4.1 In-Band Versus Out-of-Band Session Progress Information

Note that, even when the application server model is used, a UA will
have to choose which early media sessions are muted and which ones
are rendered to the user.
In order to make this choice easier for UAs,
it is strongly recommended that information that is not essential for
the session not be transmitted using early media. For instance, UAs
should not use early media to send special ringing tones. SIP already
provides a means to inform the remote user about session
establishment progress which does not cause any of the problems
associated with early media: the status code and the reason phrase in
provisional responses.

5 Alert-Info Header Field

The Alert-Info header field allows specifying alternative ringing
content, such as a ringing tone, to the UAC. This header field tells
the UAC which tone should be played in case local ringing is
generated, but it does not tell the UAC when to generate local
ringing. A UAC should follow the rules described above for ringing
tone generation in both models. If, after following those rules, the
UAC decides to play local ringing, it can then use the Alert-Info
header field to generate it.

6 Acknowledgments

Jon Peterson provided useful ideas on the separation between the
gateway model and the application server model.

Paul Kyzivat, Christer Holmberg, Bill Marshall, Francois Audet, John
Hearty, Adam Roach, Eric Burger, and Rohan Mahy provided useful
comments and suggestions.

7 Authors' Addresses

Gonzalo Camarillo
Ericsson
Advanced Signalling Research Lab.
FIN-02420 Jorvas
Finland
electronic mail: Gonzalo.Camarillo@ericsson.com

Henning Schulzrinne
Dept. of Computer Science
Columbia University
1214 Amsterdam Avenue, MC 0401
New York, NY 10027
USA
electronic mail: schulzrinne@cs.columbia.edu

8 Bibliography

[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. R. Johnston, J.
Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session
initiation protocol," RFC 3261, Internet Engineering Task Force, June
2002.

[2] J.
Rosenberg and H. Schulzrinne, "An offer/answer model with
session description protocol (SDP)," RFC 3264, Internet Engineering
Task Force, June 2002.

[3] J. Rosenberg and H. Schulzrinne, "Reliability of provisional
responses in session initiation protocol (SIP)," RFC 3262, Internet
Engineering Task Force, June 2002.

[4] J. Rosenberg, "The session initiation protocol (SIP) UPDATE
method," RFC 3311, Internet Engineering Task Force, Oct. 2002.

[5] J. Rosenberg, "Interactive connectivity establishment (ICE): a
methodology for network address translator (NAT) traversal for the
session initiation protocol (SIP)," Internet Draft, Internet
Engineering Task Force, July 2003. Work in progress.

[6] G. Camarillo, "The early session disposition type for the session
initiation protocol (SIP)," Internet Draft, Internet Engineering Task
Force, Oct. 2003. Work in progress.

The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights. Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11. Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such
proprietary rights by implementors or users of this specification can
be obtained from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.

Full Copyright Statement

Copyright (c) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.