Internet Engineering Task Force                       Gonzalo Camarillo
Internet draft                                               Jan Holler
                                                      Goran AP Eriksson
                                                               Ericsson
                                                          November 2000
                                                      Expires June 2001


                      SDP media alignment in SIP

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document defines an SDP media attribute. This attribute is
   intended to be used in conjunction with SIP in order to align
   different media streams belonging to a session. The use of this
   attribute allows sending media from a single flow (several media
   streams), encoded in different formats during the session, to
   different ports and host interfaces.

1. Introduction

   SIP [1] is an application layer protocol for establishing,
   terminating and modifying multimedia sessions. SIP carries session
   descriptions in the bodies of the SIP messages but is independent
   from the protocol used for describing sessions. SDP [2] is one of
   the protocols that can be used for this purpose.

   Appendix B of [1] describes the usage of SDP in relation to SIP.
   It states: "The caller and callee align their media description so
   that the nth media stream ("m=" line) in the caller's session
   description corresponds to the nth media stream in the callee's
   description."

   This way of performing the media alignment is not efficient when a
   single flow comprises several media streams. This is a common
   situation when AS (Application Server) components [3] are employed.
   It is also common for systems that handle different codecs on
   different port numbers (or on different interfaces).

2. Media flow definition

   The RTSP RFC [4] defines a media stream as "a single media instance,
   e.g., an audio stream or a video stream as well as a single
   whiteboard or shared application group. When using RTP, a stream
   consists of all RTP and RTCP packets created by a source within an
   RTP session".

   This definition assumes that a single audio (or video) stream maps
   into an RTP session. The RTP RFC [5] defines an RTP session as
   follows: "For each participant, the session is defined by a
   particular pair of destination transport addresses (one network
   address plus a port pair for RTP and RTCP)".

   However, there are situations where a single media instance, e.g.,
   an audio stream or a video stream, is sent using more than one RTP
   session. Two examples (among many others) of this kind of situation
   are cellular systems using SIP and systems receiving DTMF tones on a
   different host than the voice. Both examples are described in
   Sections 2.1 and 2.2.

   We introduce the definition of media flow:

      A media flow consists of a single media instance, e.g., an audio
      stream or a video stream as well as a single whiteboard or shared
      application group. When using RTP, a media flow comprises one or
      more RTP sessions.
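
   The relationship between a media flow and its RTP sessions can be
   pictured as a small data model. The Python sketch below is purely
   illustrative and not part of the proposal: the class names
   RtpSession and MediaFlow, the address 192.0.2.10 and the port
   numbers are hypothetical, and deriving the RTCP port as the RTP
   port plus one is an assumption based on the usual RTP convention.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class RtpSession:
    """One RTP session: a destination address plus an RTP/RTCP port pair."""
    address: str
    rtp_port: int

    @property
    def rtcp_port(self) -> int:
        # Assumption: RTCP runs on the next higher port (common RTP
        # convention); the draft itself does not mandate this.
        return self.rtp_port + 1


@dataclass
class MediaFlow:
    """A single media instance carried over one or more RTP sessions."""
    media: str
    sessions: List[RtpSession] = field(default_factory=list)

    def add_session(self, session: RtpSession) -> None:
        self.sessions.append(session)


# One audio flow that is received on two different ports.
flow = MediaFlow("audio")
flow.add_session(RtpSession("192.0.2.10", 30000))
flow.add_session(RtpSession("192.0.2.10", 30002))
print(len(flow.sessions), flow.sessions[1].rtcp_port)
```

   The point of the model is only that the flow, not the individual
   RTP session, is the unit that the two endpoints need to align.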

   For instance, in a two-party call where the voice exchanged can be
   encoded using GSM or PCM, the receiver wants to receive GSM on one
   port number and PCM on a different port number. Two RTP sessions
   will be established, one carrying GSM and the other carrying PCM.

   At any particular moment just one codec is in use. Therefore, at any
   moment one of the RTP sessions will not transport any voice. Here
   the systems are dealing with a single flow (one audio stream) and
   two RTP sessions.

2.1 SIP and cellular access

   Systems using a cellular access (such as UMTS or EDGE) and SIP as a
   signalling protocol need to receive media over the air. During a
   session the media can be encoded using different codecs. The encoded
   media has to traverse the radio interface. The radio interface is
   generally characterized by being bit error prone and associated with
   relatively high packet transfer delays. In addition, radio interface
   resources in a cellular environment are scarce and thus expensive,
   which calls for special measures in providing a highly efficient
   transport [6]. In order to get an appropriate speech quality in
   combination with an efficient transport, precise knowledge of codec
   properties is required so that a proper radio bearer for the RTP
   session can be configured before transferring the media. These radio
   bearers are dedicated bearers per media type, i.e. codec.

   In UMTS, for instance, when the RTP packets are to be delivered over
   the air interface, a packet filtering function routes the packets to
   the proper radio bearer towards the UMTS/SIP terminal. The packet
   filtering function operates using a Traffic Flow Template (TFT) [7],
   which is established when configuring the radio bearer. The TFT
   hence specifies the profile of the data that should be carried by
   the radio bearer.
   A TFT can contain the following data:

      - Source Address and Subnet Mask.
      - Protocol Number (IPv4) / Next Header (IPv6).
      - Destination Port Range.
      - Source Port Range.
      - IPSec Security Parameter Index (SPI).
      - Type of Service (TOS) (IPv4) / Traffic class (IPv6) and Mask.
      - Flow Label (IPv6).

   It is worth noting that only certain combinations of these
   parameters are allowed.

   The media has to have different destination port numbers for the
   different possible codecs in order to be filtered and routed
   properly to the correct radio bearer. Therefore, several RTP
   sessions are used for a single media flow.

2.2 DTMF tones

   Some voice sessions include DTMF tones. Sometimes the voice handling
   is performed by a different host than the DTMF handling (e.g.
   section 5.4, figures 3 and 4 of [3]). In these situations it is
   necessary to establish two RTP sessions: one for the voice and the
   other for the DTMF tones. Both RTP sessions are logically part of
   the same media flow.

3. Flow identification attribute

   A new "flow identification" media attribute is defined. It is used
   for identifying media flows within a session. It provides a means
   for aligning a number of flows (rather than a number of media
   streams) within a session between members participating in the
   session. Its formatting in SDP is described by the following BNF:

      fid-attribute      = "a=fid:" identification-tag
      identification-tag = token

   Each media flow has an identification tag that is unique within the
   SDP session description; all "m" lines that belong to the same flow
   carry the same tag. The following examples illustrate its usage.

4. Examples of flow identification attribute

4.1 UMTS/SIP terminal

   In the following example John uses a traditional access such as an
   Ethernet while Laura has a UMTS/SIP terminal. The caller John sends
   an INVITE with the following session description to the callee
   Laura.

      v=0
      o=John 289085535 289085535 IN IP4 first.example.com
      t=0 0
      c=IN IP4 111.111.111.111
      m=audio 20000 RTP/AVP 0 8
      a=fid:1

   The callee Laura is on a UMTS/SIP terminal. She configures the
   necessary radio bearers and implements the TFTs:

      All the incoming IP packets with destination port UDP 30000 will
      be carried by the radio access bearer configured for G.711 u-law
      (payload type 0).

      All the incoming IP packets with destination port UDP 30002 will
      be carried by the radio access bearer configured for G.711 A-law
      (payload type 8).

   Accordingly, the following SDP is returned to the caller in a 200 OK
   response:

      v=0
      o=Laura 289083124 289083124 IN IP4 second.example.com
      t=0 0
      c=IN IP4 222.222.222.222
      m=audio 30000 RTP/AVP 0
      a=fid:1
      m=audio 30002 RTP/AVP 8
      a=fid:1

   With the current way of performing SDP media alignment in SIP, the
   callee would have had to accept the call and then immediately
   re-INVITE the caller with the new SDP. The fid attribute avoids
   that extra transaction and the round trips (RTTs) it costs.

   Besides saving bandwidth and RTTs, the fid attribute provides a
   means for describing a logical relationship between media streams
   that belong to the same flow.

4.2 Application Server Components

   Section 5.4 of "An Application Server Component Architecture for
   SIP" [3] contains two examples (figures 3 and 4) where DTMF tones
   are received by a different host than the voice stream. In both
   situations, using the fid attribute to perform media alignment
   would save a large number of messages and reduce the global session
   establishment time.

   Let us take figure 4. A UAC sends an INVITE with just a voice
   stream. There are two ASs in the path that want to receive DTMF
   tones.

   Three steps are needed in order to set the session up:

   1) A session is established between the UAC and the callee.
      This involves three messages from the caller's point of view
      (INVITE-200 OK-ACK).
   2) The session is modified by A (one of the ASs that wants to
      receive DTMF tones). It adds an "m" line to the session
      description indicating that it wants to receive DTMF tones. This
      involves three more messages from the caller's point of view
      (INVITE-200 OK-ACK).
   3) The session is modified once more by B (the other AS that also
      wants to receive DTMF tones). It adds another "m" line indicating
      that it wants to receive DTMF tones. This involves three more
      messages from the caller's point of view (INVITE-200 OK-ACK).

   Caller            A                B              Callee
     |               |                |                 |
     |(1) SIP INV    |                |                 |
     |-------------->|(2) SIP INV     |                 |
     |               |--------------->|(3) SIP INV      |
     |               |                |---------------->|
     |               |                |(4) 200 OK       |
     |               |(5) 200 OK      |<----------------|
     |(6) 200 OK     |<---------------|                 |
     |<--------------|                |                 |
     |(7) SIP ACK    |                |                 |
     |-------------->|(8) SIP ACK     |                 |
     |               |--------------->|(9) SIP ACK      |
     |               |                |---------------->|
     |(10) SIP INV   |                |                 |
     |<--------------|                |                 |
     |(11) 200 OK    |                |                 |
     |-------------->|                |                 |
     |(12) SIP ACK   |                |                 |
     |<--------------|                |                 |
     |               |                |                 |
     |               |(13) SIP INV    |                 |
     |(14) SIP INV   |<---------------|                 |
     |<--------------|                |                 |
     |(15) 200 OK    |                |                 |
     |-------------->|(16) 200 OK     |                 |
     |               |--------------->|                 |
     |               |(17) SIP ACK    |                 |
     |(18) SIP ACK   |<---------------|                 |
     |<--------------|                |                 |
     |               |                |                 |

   Figure 4 of "An Application Server Component Architecture for SIP"
   [3]

   The whole session is not correctly set up until the end of this
   sequence of messages. If the caller is on a low-rate access link,
   this can take a long time.

   The use of the fid attribute would reduce these nine messages that
   the caller sees to just three (INVITE-200 OK-ACK).
   B would add an "m" line to the 200 OK from the callee with the same
   fid value as the voice stream. Then A would add another "m" line,
   again with the same fid value as the two previous "m" lines.

   As a result, the caller receives a 200 OK indicating that just one
   flow is established, but also that all the DTMF tones should be sent
   to A and B. For a low-rate access the establishment time is reduced
   substantially, since the caller exchanges three messages instead of
   nine.

5. Media-level versus session-level attribute

   Syntactically, fid is a media-level attribute. It provides
   information about a media stream defined by an "m" line.
   Semantically, fid could be regarded as a session-level attribute,
   since it describes the flow hierarchy inside a session description.

6. Backward compatibility

   A system that understands the fid attribute MUST add it to any SDP
   session description that it generates.

   If a response to a request that included the fid attribute also
   includes it, media alignment is performed based on the fid attribute
   rather than on matching of nth lines.

6.1 Caller does not support fid

   This situation does not represent a problem. The SDP in the INVITE
   will not contain any fid attribute and the callee will use the "nth-
   line" method to perform media alignment.

   The callee will need a re-INVITE in order to receive the proper
   media encoding on the proper interface.

6.2 Callee does not support fid

   The callee will ignore the fid attribute. It will consider that the
   session comprises several media streams.

   Different implementations would behave in different ways.

   In the case of audio and different "m" lines for different codecs,
   an implementation might decide to act as a mixer with the different
   incoming RTP sessions, which is the correct behavior.

   If an implementation decides to refuse the request (e.g.
   488 Not Acceptable Here or 606 Not Acceptable), the caller should
   retry the request without the fid attribute and with only one "m"
   line per flow. Note that even re-INVITEs without the fid attribute
   adding new "m" lines would probably fail in this situation because
   the callee does not support multiple "m" lines. Therefore, this
   problem is related to UAs that do not handle multiple "m" lines
   rather than to the fid attribute.

7. Acronyms

   AS     Application Server
   BNF    Backus-Naur Form
   DTMF   Dual-Tone Multi-Frequency
   EDGE   Enhanced Data rates for GSM and TDMA/136 Evolution
   GSM    Global System for Mobile Communications
   IP     Internet Protocol
   PCM    Pulse Code Modulation
   RFC    Request For Comments
   RTCP   RTP Control Protocol
   RTP    Real-time Transport Protocol
   RTSP   Real-Time Streaming Protocol
   RTT    Round Trip Time
   SDP    Session Description Protocol
   SIP    Session Initiation Protocol
   TFT    Traffic Flow Template
   UA     User Agent
   UAC    User Agent Client
   UMTS   Universal Mobile Telecommunications System
   WLAN   Wireless Local Area Network

8. Acknowledgments

   The authors would like to thank Adam Roach for his feedback on this
   document.

9. References

   [1] M. Handley/H. Schulzrinne/E. Schooler/J. Rosenberg, "SIP:
   Session Initiation Protocol", RFC 2543, IETF; March 1999.

   [2] M. Handley/V. Jacobson, "SDP: Session Description Protocol", RFC
   2327, IETF; April 1998.

   [3] J. Rosenberg/P. Mataga/H. Schulzrinne, "An Application Server
   Component Architecture for SIP", draft-rosenberg-sip-app-components-
   00.txt, IETF; November 2000.

   [4] H. Schulzrinne/A. Rao/R. Lanphier, "Real Time Streaming Protocol
   (RTSP)", RFC 2326, IETF; April 1998.

   [5] H. Schulzrinne/S. Casner/R. Frederick/V. Jacobson, "RTP: A
   Transport Protocol for Real-Time Applications", RFC 1889, IETF;
   January 1996.

   [6] L. Westberg/M.
   Lindqvist, "Realtime Traffic over Cellular Access Networks",
   draft-westberg-realtime-cellular-02.txt, IETF; May 2000. Work in
   progress.

   [7] 3G TS 23.060 v3.2.1 General Packet Radio Service Description.

10. Authors' Addresses

   Gonzalo Camarillo
   Ericsson
   Advanced Signalling Research Lab.
   FIN-02420 Jorvas
   Finland
   Phone: +358 9 299 3371
   Fax: +358 9 299 3052
   Email: Gonzalo.Camarillo@ericsson.com

   Jan Holler
   Ericsson Research
   S-16480 Stockholm
   Sweden
   Phone: +46 8 58532845
   Fax: +46 8 4047020
   Email: Jan.Holler@era.ericsson.se

   Goran AP Eriksson
   Ericsson Research
   S-16480 Stockholm
   Sweden
   Phone: +46 8 58531762
   Fax: +46 8 4047020
   Email: Goran.AP.Eriksson@era.ericsson.se