MMUSIC                                                       A. B. Roach
Internet-Draft                                                   Mozilla
Intended status: Informational                          January 31, 2013
Expires: August 4, 2013


       Thoughts on syntax for representing multiple media streams
                      draft-roach-mmusic-mlines-00

Abstract

   This document briefly explores the ramifications of combining
   multiple media streams into one SDP m= section versus expressing each
   in its own m= section.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 4, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1. Introduction

   As part of the ongoing RTCWEB and CLUE work, it has become clear that
   the current mechanisms in SDP are insufficient for describing complex
   sessions with multiple streams. Two competing schools of thought
   have emerged. One holds that the m= lines should apply to RTP
   sessions, regardless of how many media streams they contain. Another
   holds that m= lines should apply to media streams exclusively, and
   that an additional mechanism should be applied to combine multiple
   streams into a single RTP session, if necessary.

2. Alternatives

2.1. Alternative 1: Multiple streams per m= section

   One approach to specifying multiple streams in a single RTP session
   is to put information for several streams into a single m= section
   and, by doing so, implicitly combine them into a single session.

   To maintain some level of backwards compatibility with SDP, this
   approach might choose to have one m= section for audio and a second
   for video (with additional m= sections for other media types if they
   are used in the future), combining those sections with a=group:BUNDLE
   [I-D.ietf-mmusic-sdp-bundle-negotiation]; we will call this
   "Alternative 1a". An alternative approach would be the definition of
   a new media type that effectively allows transmission of any kind of
   media, thereby avoiding the need to bundle multiple sections together
   at all. A syntax for such an approach is proposed by
   [I-D.holmberg-mmusic-sdp-mmt-negotiation]. We will call this
   "Alternative 1b".

   In both of the cases described above, certain SDP attributes might be
   targeted at only one of the streams in an RTP session. These
   attributes can be matched up with individual streams using the
   "a=ssrc" extension defined in [RFC5576].

   For "Alternative 1a", we have the additional challenge of specifying
   attributes that apply to the entire RTP session, such as a=rtcp-fb
   and ICE candidate parameters. One approach would be inclusion of
   such parameters only in the first m= section within a bundle, with
   the implication that they apply to the entire session.

2.1.1. Alternative 1a: One section per RTP session per type

   v=0
   o=- 2890844526 2890844526 IN IP4 host.example.com
   s=
   c=IN IP4 host.example.com
   t=0 0
   a=group:BUNDLE c1 c2
   m=audio 10000 RTP/AVP 0 8 97
   a=mid:c1
   a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
   a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
       192.0.2.240 rport 51091
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:97 iLBC/8000
   a=ssrc:11111 label:speaker-audio
   a=ssrc:22222 label:floor-mic
   m=video 10000 RTP/AVP 31 32
   a=mid:c2
   a=rtpmap:31 H261/90000
   a=rtpmap:32 MPV/90000
   a=ssrc:33333 label:speaker-video
   a=ssrc:44444 label:slides

2.1.2. Alternative 1b: One section per RTP session

   v=0
   o=- 2890844526 2890844526 IN IP4 host.example.com
   s=
   c=IN IP4 host.example.com
   t=0 0
   a=group:MMT foo bar zoe
   m=anymedia 10000 RTP/AVP 0 8 97 31 32
   a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
   a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
       192.0.2.240 rport 51091
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:97 iLBC/8000
   a=rtpmap:31 H261/90000
   a=rtpmap:32 MPV/90000
   a=mmtype:0 audio
   a=mmtype:8 audio
   a=mmtype:97 audio
   a=mmtype:31 video
   a=mmtype:32 video
   a=ssrc:11111 label:speaker-audio
   a=ssrc:22222 label:floor-mic
   a=ssrc:33333 label:speaker-video
   a=ssrc:44444 label:slides

2.2. Alternative 2: Single stream per m= section

   An alternative proposal is to constrain each m= section to describe
   a single media stream. As in alternative 1a above, the BUNDLE
   extension is used to combine several m= sections into a single RTP
   session. Any attributes that are applicable to a single media stream
   can be correlated by putting them in the corresponding m= section.
   Any attributes that apply to the transport parameters (e.g., rtcp-fb,
   ICE parameters) are conveyed in the first m= section within the
   bundle (alternate schemes are possible, but this seems the simplest
   and most straightforward).

   v=0
   o=- 2890844526 2890844526 IN IP4 host.example.com
   s=
   c=IN IP4 host.example.com
   t=0 0
   a=group:BUNDLE c1 c2 c3 c4
   m=audio 10000 RTP/AVP 0 8 97
   a=mid:c1
   a=label:speaker-audio
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:97 iLBC/8000
   a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
   a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
       192.0.2.240 rport 51091
   m=audio 10000 RTP/AVP 0 8 97
   a=mid:c2
   a=label:floor-mic
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:97 iLBC/8000
   m=video 10000 RTP/AVP 31 32
   a=mid:c3
   a=label:speaker-video
   a=rtpmap:31 H261/90000
   a=rtpmap:32 MPV/90000
   m=video 10000 RTP/AVP 31 32
   a=mid:c4
   a=label:slides
   a=rtpmap:31 H261/90000
   a=rtpmap:32 MPV/90000

2.3. Pros and Cons

2.3.1. Codec Selection

   Currently, in SDP and the various documents that rely on it (such as
   [RFC3264]), there are certain assumptions made about the cardinality
   of streams to m= sections. Consider, for example, wanting to convey
   two audio streams with a low-bandwidth voice codec preferred for one,
   but a high-quality codec preferred for the other. RFC 3264 has rules
   indicating that codecs are conveyed in the order of their preference.
   With alternative 2, it is trivial to provide a different ordering (or
   even a different set) of codecs to achieve such a goal. Alternatives
   1a and 1b lack the ability to do so without additional extensions.

   This set of facts supports alternative 2 in preference to
   alternatives 1a and 1b.

2.3.2. Port Number Handling

   When multiple sections are used to represent a single session, we
   need to make a decision regarding the port number conveyed in the m=
   line itself. One option is to use the same port number in all
   related m= sections. According to Cullen Jennings, this interacts
   very poorly with existing implementations that use SDP. The other
   alternative is to indicate bogus port numbers in all (or all but one)
   of the m= lines. According to Hadriel Kaplan, this usage will lead
   to certain media intermediaries destroying the session when they
   determine that a signaled port is going unused.

   Alternative 1b avoids this problem altogether by having only one m=
   per IP/port combination, thereby completely sidestepping the question
   of what to put in subsequent m= lines.

   This set of facts supports alternative 1b in preference to
   alternatives 1a and 2.

2.3.3. Attribute handling

   Attributes that appear inside m= sections can be generally broken
   down into three categories: those intended to apply to a single media
   stream (e.g., framerate); those intended to apply to an RTP session
   (e.g., rtcp-fb); and those that are explicitly bound to the m= line
   itself (e.g., rtpmap). By and large, these attributes have been
   defined with an assumption that each RTP session had one stream and
   vice-versa.

   By specifying a model that breaks this one-to-one correspondence, we
   have created the need to be able to designate a specific media stream
   within an RTP session (for alternatives 1a and 1b), or the need to be
   able to talk about session-level attributes (for alternatives 1a and
   2).

   Alternatives 1a and 1b can perform stream-level designation through
   the use of the ssrc attribute specified in [RFC5576].
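
   As a rough illustration of that stream-level designation, the
   following Python sketch groups RFC 5576 "a=ssrc" attributes by SSRC
   identifier. The function name ssrc_attributes and the variable
   audio_section are hypothetical names chosen for this example; the
   input lines are taken from the audio section of the Alternative 1a
   example, and the parsing covers only the simple
   "a=ssrc:<id> <attribute>[:<value>]" form.

```python
from collections import defaultdict


def ssrc_attributes(sdp_lines):
    """Group a=ssrc attributes by SSRC identifier.

    Illustrative sketch only: it handles the basic
    a=ssrc:<id> <attribute>[:<value>] form and ignores every other line.
    """
    streams = defaultdict(dict)
    for line in sdp_lines:
        line = line.strip()
        # "a=ssrc-group:" lines do not match this prefix and are skipped.
        if not line.startswith("a=ssrc:"):
            continue
        ssrc_id, _, attribute = line[len("a=ssrc:"):].partition(" ")
        # Attributes without a value end up with an empty string.
        name, _, value = attribute.partition(":")
        streams[int(ssrc_id)][name] = value
    return dict(streams)


# Audio m= section from the Alternative 1a example (candidate lines omitted).
audio_section = [
    "m=audio 10000 RTP/AVP 0 8 97",
    "a=mid:c1",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:97 iLBC/8000",
    "a=ssrc:11111 label:speaker-audio",
    "a=ssrc:22222 label:floor-mic",
]

print(ssrc_attributes(audio_section))
```

   For the audio section above, the sketch produces one entry per SSRC
   (11111 and 22222), each carrying its label. This is the per-stream
   lookup that a receiver of an Alternative 1a or 1b description would
   need within a single m= section.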
   Alternatives 1a and 2 can apply a convention that any
   RTP-session-level attributes are placed in the first m= section in a
   bundle (although other, more complicated approaches may also be
   possible).

   Note, in particular, that alternative 1a inherits both problems: it
   must be able to designate attributes as applying to a single stream,
   and it must be able to talk about session-level attributes when
   multiple m= lines are bundled together.

   This set of facts supports alternatives 1b and 2 in preference to
   alternative 1a.

2.3.4. What We're Unaware of Not Knowing

   It is worth noting that the problem described in Section 2.3.1 was
   not discovered for quite a long time after the discussion of multiple
   media streams had begun. In the characterization of "known knowns,"
   "known unknowns," and "unknown unknowns," this issue remained an
   unknown unknown for more than a little time.

   Generally, addressing these unknown unknowns is likely to be easiest
   if we have the highest granularity of control. Alternative 2, by
   breaking each stream apart into its own instance of the control
   structure that has historically been used to work with media (the m=
   section), provides this high granularity where alternatives 1a and 1b
   do not.

   It is the author's opinion that the probable existence of such
   unknown unknowns favors alternative 2 over 1a or 1b.

2.4. Red Herrings

   During the course of discussing this topic, several points have been
   raised that, while relevant, do not bias the selection of one
   solution over another.

   One issue that has been brought up is that SDP offer/answer requires
   signaling of the number of m= sections in the offer, to allow clear
   semantics for negotiation. Some proponents of solutions 1a and 1b
   have indicated a belief that allowing multiple streams per m= section
   avoids this restriction. This assertion has a number of problems.
   First, it assumes that implementations can perform reasonable
   operations on dynamically created media streams that begin and end
   without any signaling. It further assumes that the problems for
   which the offer/answer model imposed the m-line restrictions are no
   longer applicable (at least, not on a stream level). Finally, this
   assertion assumes that no control surfaces are necessary to talk
   about and/or manipulate the individual streams (alternately, if such
   control surfaces are introduced, then additional SDP round-trips to
   exchange information about those controls are necessary, making them
   semantically equivalent to a new offer/answer exchange, which
   eliminates any purported advantage).

   It has also been observed that, in addition to being sometimes
   applicable to streams and sometimes applicable to sessions,
   attributes are also sometimes unidirectional and sometimes
   bidirectional. While an astute observation, this does not appear to
   have any bearing on the ultimate solution selected, as all three
   alternatives face exactly the same challenges in dealing with issues
   of directionality.

   Finally, it should be noted that any decision to include multiple
   streams within a single m= section does little to simplify
   implementation. Even if native RTCWEB implementations generate the
   fewest m= sections necessary to convey their desired session state,
   the selection of alternative 1a or 1b does not obviate the
   requirement that implementations must be able to receive SDP with
   several m=audio sections (for example). Interoperation with legacy
   implementations, even through a gateway, will require that proper
   handling of such session descriptions is present in every RTCWEB
   implementation.

2.5. Summary

   The following table summarizes the pros and cons conveyed in the
   preceding sections on a per-solution basis.

   +---------------+----+----+---+
   | Issue         | 1a | 1b | 2 |
   +---------------+----+----+---+
   | Section 2.3.1 | -  | -  | + |
   | Section 2.3.2 | -  | +  | - |
   | Section 2.3.3 | -  | +  | + |
   | Section 2.3.4 | -  | -  | + |
   +---------------+----+----+---+

   Based on these criteria, it is the author's belief that Alternative 2
   provides the most benefit, with Alternative 1b providing a close
   second place.

   Alternative 1a has the remarkable property of combining all of the
   drawbacks of solutions 1b and 2, forming a kind of "sweet-spot" of
   ill-advisement, and thereby maximizing the amount of work required of
   the MMUSIC, RTCWEB, and CLUE working groups.

3. IANA Considerations

   This document makes no requests of IANA.

4. Security Considerations

   The author does not believe that the syntax under discussion has an
   impact on the security properties of those protocols that make use of
   SDP.

5. Normative References

   [I-D.holmberg-mmusic-sdp-mmt-negotiation]
              Holmberg, C., Alvestrand, H., and J. Lennox, "Multiplexed
              Media Types (MMT) Using Session Description Protocol (SDP)
              Port Numbers",
              draft-holmberg-mmusic-sdp-mmt-negotiation-00 (work in
              progress), October 2012.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C. and H. Alvestrand, "Multiplexing Negotiation
              Using Session Description Protocol (SDP) Port Numbers",
              draft-ietf-mmusic-sdp-bundle-negotiation-01 (work in
              progress), August 2012.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

Author's Address

   Adam Roach
   Mozilla
   Dallas, TX
   US

   Email: adam@nostrum.com