Network Working Group                                      H. Alvestrand
Internet-Draft                                                    Google
Intended status: Informational                             June 17, 2012
Expires: December 19, 2012


               Why RTP Sessions Should Be Content Neutral
                  draft-alvestrand-rtp-sess-neutral-01

Abstract

   This document is not intended for publication as an RFC.

   It gives the underpinning arguments for why the idea that RTP
   sessions and MIME top-level types are related is a deeply broken
   paradigm, and that we need to get away from it.

   These arguments are solely the opinion of the listed author.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 19, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Stuff that can be carried over RTP
   3.  What the network can do to "help" a flow
   4.  The definition of an RTP "session"
   5.  Proper and improper use of RTP sessions
   6.  The Pernicious Effect of SDP on the Media Type System
   7.  The Mixer Fallacy
   8.  Corrective Actions
   9.  IANA Considerations
   10. Security Considerations
   11. Acknowledgements
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Change log
     A.1.  Version -00 to -01
   Author's Address

1.  Introduction

   The RTP universe of functionality can, for the purposes of this
   argument, be reduced to two components: the RTP wire protocol
   [RFC3550] (consisting of the RTP packet format, the RTCP reporting
   format, and the handling rules for RTP sessions), and the SDP session
   description language.  For the purposes of this argument, the SDP
   functionality for describing non-RTP sessions is ignored, as is the
   ability to negotiate RTP sessions by means other than SDP.

   This document argues that the RTP mechanism of multiple RTP sessions
   makes sense for many purposes, but does NOT make sense as a mandated
   separation between different top-level MIME media types.

2.  Stuff that can be carried over RTP

   RTP, by its own description an application-layer framework
   component, is a suitable protocol for framing data that needs to
   travel across the network in a time-sensitive fashion, with the idea
   that it is going to be presented at the receiving end as a time
   sequence.  Normally, the data (usually called "media") is streamed
   across the network at a rate approximately equal to the speed at
   which it is intended to be presented ("real-time data").

   Examples of data carried over RTP include:

   o  G.711 audio - 64 kbits/second, completely fixed bitrate

   o  GSM AMR audio - 4.75 to 12.2 kbits/second, variable bitrate

   o  Opus audio compressed into near-incomprehensibility - 6
      kbits/second, variable bitrate

   o  Opus audio carrying high-fidelity music - 500 kbits/second,
      variable bitrate

   o  QQVGA (160x120) video at 15 FPS in H.264 compression - 50
      kbits/second, variable bitrate, lots of schemes for error
      concealment and correction

   o  HD video at 1920x1080@60 in H.264 compression - 1.4 Mbits/second

   o  Real-time text (T.140) - very few bits/second

   o  [RFC4733] DTMF tone signalling - very few bits/second

   Schemes designed to increase the reliability of data carried across
   RTP include:

   o  Forward error correction (FEC)

   o  Duplicated streams, codec-independent

   o  Duplicate sending of important information within the codec

   o  NAK-based resends signalled over RTCP

   o  Stream reset requests signalled over RTCP

   Some of these are only applicable to certain media types (in
   particular, "send me a new I-frame" doesn't make sense if you don't
   have I-frames).  Others can be used with any type of data.

3.  What the network can do to "help" a flow

   The network can apply various mechanisms to help the session data
   arrive according to policy:

   o  Capacity reservation for specific flows

   o  Priority queueing, sending certain types of data faster than
      others

   o  Filtering or blocking certain types of communication that the
      managers deem inappropriate

   The network can do these things in multiple ways, including so-called
   "deep packet inspection", but the most common techniques require
   either identifying the requested handling of the packets (DiffServ
   using DSCP codepoints) or recognizing the flow by its 5-tuple (source
   and destination address and port, plus protocol), possibly
   correlating the 5-tuple with information carried to the router
   through some kind of management interface (either connected to the
   session setup protocol or managed via some other interface such as
   RSVP/IntServ), and behaving accordingly.

   All techniques have limitations: DSCP requires a certain trust that
   the endpoints use the codepoints only for "deserving" traffic; deep
   packet inspection requires that packets be unencrypted; and stream
   control requires that 5-tuples be related back to their putative
   purpose, either by heuristics or by being connected to management
   protocols.

4.  The definition of an RTP "session"

   An RTP session is defined in RFC 3550, Section 3:

      "RTP Session: An association among a set of participants
      communicating with RTP.  A participant may be involved in multiple
      RTP sessions at the same time.  In a multimedia session, each
      medium is typically carried in a separate RTP session with its own
      RTCP packets unless the encoding itself multiplexes multiple media
      into a single data stream.
      A participant distinguishes multiple RTP
      sessions by reception of different sessions using different pairs
      of destination transport addresses, where a pair of transport
      addresses comprises one network address plus a pair of ports for
      RTP and RTCP.  All participants in an RTP session may share a
      common destination transport address pair, as in the case of IP
      multicast, or the pairs may be different for each participant, as
      in the case of individual unicast network addresses and port
      pairs.  In the unicast case, a participant may receive from all
      other participants in the session using the same pair of ports, or
      may use a distinct pair of ports for each.

      The distinguishing feature of an RTP session is that each
      maintains a full, separate space of SSRC identifiers (defined
      next).  The set of participants included in one RTP session
      consists of those that can receive an SSRC identifier transmitted
      by any one of the participants either in RTP as the SSRC or a CSRC
      (also defined below) or in RTCP.  For example, consider a
      three-party conference implemented using unicast UDP with each
      participant receiving from the other two on separate port pairs.
      If each participant sends RTCP feedback about data received from
      one other participant only back to that participant, then the
      conference is composed of three separate point-to-point RTP
      sessions.  If each participant provides RTCP feedback about its
      reception of one other participant to both of the other
      participants, then the conference is composed of one multi-party
      RTP session.  The latter case simulates the behavior that would
      occur with IP multicast communication among the three
      participants.

      The RTP framework allows the variations defined here, but a
      particular control protocol or application design will usually
      impose constraints on these variations."
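   The demultiplexing rule in the quoted definition can be sketched in
   a few lines.  This is a minimal Python illustration, not code from
   RFC 3550 or from any implementation: it assumes each packet is
   delivered together with the local (address, port) it arrived on,
   keys each RTP session on that RTP address alone (the RTCP port of
   the pair is not modelled), and reads only the 12-byte fixed header
   laid out in Section 5.1 of RFC 3550.  The names make_packet,
   parse_rtp_header and SessionDemux are hypothetical, and the addresses
   come from the 192.0.2.0/24 documentation range.

```python
import struct
from collections import defaultdict

# Fixed RTP header: V/P/X/CC byte, M/PT byte, sequence, timestamp, SSRC.
RTP_FIXED_HEADER = struct.Struct("!BBHII")


def make_packet(pt: int, seq: int, ts: int, ssrc: int, payload: bytes = b"") -> bytes:
    """Build a minimal RTP packet: version 2, no padding, extension or CSRCs."""
    return RTP_FIXED_HEADER.pack(0x80, pt & 0x7F, seq, ts, ssrc) + payload


def parse_rtp_header(packet: bytes):
    """Return (payload_type, sequence, timestamp, ssrc) from the fixed header."""
    if len(packet) < RTP_FIXED_HEADER.size:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = RTP_FIXED_HEADER.unpack_from(packet)
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return b1 & 0x7F, seq, ts, ssrc


class SessionDemux:
    """Hypothetical receiver: one RTP session per local transport address,
    each with its own, independent SSRC space (RTCP is not modelled)."""

    def __init__(self) -> None:
        # session key (address, port) -> {SSRC: last sequence number seen}
        self.sessions = defaultdict(dict)

    def receive(self, local_addr, packet: bytes):
        pt, seq, _ts, ssrc = parse_rtp_header(packet)
        self.sessions[local_addr][ssrc] = seq
        return local_addr, ssrc, pt


demux = SessionDemux()
# The same SSRC value on two ports denotes two unrelated sources.
demux.receive(("192.0.2.1", 5004), make_packet(0, 1, 160, 0x1234))
demux.receive(("192.0.2.1", 5006), make_packet(96, 7, 3000, 0x1234))
print(sorted(demux.sessions))
# [('192.0.2.1', 5004), ('192.0.2.1', 5006)]
```

   Because each session keeps its own SSRC table, the same SSRC value
   can appear in two sessions without conflict; the session is known
   only from where the packet arrived, not from its contents.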

   An RTP session is thus characterized by:

   o  A single SSRC space

   o  A single reporting space - all participants see all RTCP messages

   o  Non-overlapping transport addresses

   As we can see here, it is not possible to tell from a single packet
   whether it belongs to the same session as another packet.  If we
   observe two packets with the same source and destination addresses,
   it seems safe to assume that they belong to the same session, but in
   all other cases, deciding whether two packets or packet streams are
   in the same session requires knowledge of the configuration of the
   session.

5.  Proper and improper use of RTP sessions

   Section 5.2 of RFC 3550 gives the canonical statement of RTP session
   (mis)use:

      "In RTP, multiplexing is provided by the destination transport
      address (network address and port number) which is different for
      each RTP session.  For example, in a teleconference composed of
      audio and video media encoded separately, each medium SHOULD be
      carried in a separate RTP session with its own destination
      transport address."

   This statement makes two very important leaps of faith:

   o  That distinguishing sessions by destination transport address is
      necessary and sufficient

   o  That it is appropriate to give strong guidance about the
      distribution of media streams across RTP sessions

   Both of these are shaky.

   As the cost of connecting ports has increased due to NATs, firewalls
   and IPv4 address exhaustion, there has been a strong push towards
   using fewer ports, and indeed fewer 5-tuples, so that it is not
   uncommon to see flows that can be distinguished only by source
   address; there have also been proposals floated for putting multiple
   RTP sessions across one 5-tuple
   [I-D.westerlund-avtcore-transport-multiplexing].
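   The 5-tuple classification described in Section 3 can be modelled to
   show why the number of 5-tuples matters for network treatment.  This
   Python sketch is hypothetical, not a router implementation: the
   policy table stands in for whatever management interface installs
   per-flow state, only exact 5-tuple matches are considered (no DSCP,
   no deep packet inspection), and the function names and documentation
   addresses are made up for the example.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str


def classify(policy: Dict[FiveTuple, str], flow: FiveTuple,
             default: str = "best-effort") -> str:
    """Treatment a purely 5-tuple-based classifier gives a packet."""
    return policy.get(flow, default)


def treatments(policy: Dict[FiveTuple, str],
               packets: Iterable[Tuple[FiveTuple, str]]) -> Dict[str, str]:
    """Map each stream label to the treatment its packets receive."""
    return {label: classify(policy, flow) for flow, label in packets}


# Audio and video in separate RTP sessions, on separate 5-tuples.
audio = FiveTuple("192.0.2.1", 5004, "198.51.100.7", 6004, "udp")
video = FiveTuple("192.0.2.1", 5006, "198.51.100.7", 6006, "udp")
separate = {audio: "priority", video: "best-effort"}

# Audio and video multiplexed onto a single 5-tuple.
shared = FiveTuple("192.0.2.1", 5004, "198.51.100.7", 6004, "udp")
single = {shared: "priority"}

print(treatments(separate, [(audio, "audio"), (video, "video")]))
# {'audio': 'priority', 'video': 'best-effort'}
print(treatments(single, [(shared, "audio"), (shared, "video")]))
# {'audio': 'priority', 'video': 'priority'}
```

   Once both streams share one 5-tuple, a classifier of this kind has
   no way to give them different treatment, whatever top-level media
   types they carry.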

   The cost of ports is also one factor pushing towards multiple media
   types in one RTP session; however, the more important underlying
   challenge is that this distinction is neither necessary nor
   sufficient to distinguish the cases in which RTP media streams want
   differential treatment from the network, and thus need to be
   assigned either to the same session (to guarantee the same
   treatment) or to different sessions (to allow for differential
   treatment).

   Consider the list of examples in Section 2, and imagine RTP being
   used for:

   o  A videoconference between 3 people who know each other well, using
      low-end equipment and barely sufficient bandwidth pipes

   o  A Berlin Philharmonic concert broadcast featuring Brahms' "Tragic
      Overture"

   o  A point-to-point transmission of a Manchester United vs. Liverpool
      football match

   o  A professor's lecture, with "talking head" presentation
      simultaneous with slides, and an opportunity for students to ask
      questions

   In each of these contexts, the tradeoff between audio and video is
   different.  In the Brahms case, the audio (which is the point of the
   transmission) is likely to be transmitted at higher bandwidth than
   the video, and if one of them has to have its bandwidth reduced, the
   video should be reduced in quality before the audio is.  In contrast,
   in the football match, spectators care about seeing the action; as
   long as they can understand the commentator's voice, the audio
   quality is "good enough".

   In the lecture case, the quality of the lecturer's slides and voice
   is critical; video from students is almost irrelevant to the larger
   purpose.
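   The scenario-dependent tradeoffs above can be expressed as data that
   the application, not the protocol, supplies.  The following Python
   sketch is purely illustrative: the stream names and priority numbers
   are hypothetical values chosen to mirror the text, and the grouping
   rule (one group, and hence one candidate RTP session, per priority
   level) is only one simple policy among many.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical, application-chosen priorities (0 = most important).
BRAHMS = {"orchestra-audio": 0, "stage-video": 1}
FOOTBALL = {"pitch-video": 0, "commentary-audio": 1}
LECTURE = {
    "slides": 0,
    "professor-audio": 0,
    "professor-video": 1,
    "student-audio": 2,
    "student-video": 2,
}


def group_by_priority(priorities: Dict[str, int]) -> List[List[str]]:
    """Group streams that should get the same network treatment.

    Each group is a candidate RTP session (one 5-tuple); membership is
    decided by application priority, not by top-level media type.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for stream, prio in priorities.items():
        groups[prio].append(stream)
    return [sorted(groups[p]) for p in sorted(groups)]


def reduce_first(priorities: Dict[str, int]) -> str:
    """Stream whose quality should be reduced first under congestion."""
    return max(priorities, key=priorities.get)


print(group_by_priority(LECTURE))
# [['professor-audio', 'slides'], ['professor-video'], ['student-audio', 'student-video']]
print(reduce_first(BRAHMS), reduce_first(FOOTBALL))
# stage-video commentary-audio
```

   The same two functions give opposite answers for the concert and the
   football match purely because the input tables differ, which is the
   sense in which the grouping decision belongs to the application.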

   A logical arrangement of media streams in RTP sessions would be to
   group them by importance, and send them with appropriate traffic
   engineering structuring.  In the lecture case, the slides and the
   professor's voice would be carried in a high-priority media stream,
   the professor's picture would have second priority, and voice and
   video from students would be made available on an "if it works, it
   works" basis.  Someone may easily decide that the student feedback
   track is not worth listening to, or remove the talking head of the
   professor; it would be strange indeed to try to listen to the
   lecture without viewing the supporting material.

   This illustrates two points:

   o  The RTP session mechanism, using the 5-tuple as the unit of
      differentiation, is a simple, effective and readily deployed
      mechanism for separating streams that require different treatment
      from the network into easily distinguished partitions.

   o  The assignment of media to such partitions is application
      dependent, and the decision on how to group and how to prioritize
      needs to be taken by the application developer.

6.  The Pernicious Effect of SDP on the Media Type System

   Among the reasons to argue against the inappropriate advice quoted
   above from RFC 3550, its pernicious influence on the MIME type system
   bears mentioning.

   The MIME type system, as described in [RFC4288], consists of a
   two-level hierarchy: a top-level media type (text, audio, video,
   application and so on), and a media subtype that identifies (to some
   level of precision) the format of the data being carried.
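   The two-level structure, and the way SDP splits it, can be seen by
   reassembling full media types from a session description.  This
   Python sketch is an illustration only: it looks at just the "m=" and
   "a=rtpmap:" lines of SDP (RFC 4566), ignores every other attribute
   as well as static payload types that have no rtpmap line, and the
   example description and the function name media_types are made up.
   The top-level type is taken once per media section, while the
   subtype is taken per payload type.

```python
from typing import Dict, Optional, Tuple

EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
m=video 51372 RTP/AVP 96
a=rtpmap:96 H264/90000
"""


def media_types(sdp: str) -> Dict[Tuple[int, int], str]:
    """Rebuild "type/subtype" for each payload type in an SDP description.

    The top-level type comes from the "m=" line of the enclosing media
    section; the subtype comes from the per-payload-type "a=rtpmap:" line.
    Returns {(media_section_index, payload_type): "type/subtype"}.
    """
    result: Dict[Tuple[int, int], str] = {}
    section = -1
    top_level: Optional[str] = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            section += 1
            top_level = line[2:].split()[0]  # e.g. "audio" or "video"
        elif line.startswith("a=rtpmap:") and top_level is not None:
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            subtype = encoding.split("/")[0]  # drop clock rate and params
            result[(section, int(pt))] = f"{top_level}/{subtype}"
    return result


for key, mtype in media_types(EXAMPLE_SDP).items():
    print(key, mtype)
# (0, 0) audio/PCMU
# (0, 101) audio/telephone-event
# (1, 96) video/H264
```

   In this model the top-level half always comes from the media section,
   so a payload format can only be offered inside an audio section if it
   is registered under audio/, which is the effect described in the
   paragraphs that follow.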

   The system has mostly been respected, with some types (for instance
   PDF) forever being borderline between the various categories, but
   over the years, a few types have been entered into the system with
   their top-level types being decided, not by the nature of their
   content, but by *the type with which their proponents wished to have
   them multiplexed in an RTP session*.

   This includes the types that designate repair mechanisms
   (audio/parityfec, audio/red), timed data transfer (audio/clearmode),
   and that ultimate triumph of expediency over cleanliness:
   audio/t140c, audio/3gpp-tt and video/3gpp-tt, which are text types
   registered as audio and video.

   For each of these, there is a fairly natural fit in the normal MIME
   hierarchy (application/ for the mechanism types and text/ for the
   text types); the assignment of them to the "media" top-level types
   has been done as an expedient to get around the stultifying results
   of the advice given in RFC 3550.

7.  The Mixer Fallacy

   One of the arguments in favour of the RFC 3550 separation has been
   that a mixer can be deployed that knows nothing of the semantics of
   the media streams; it can "just mix them".

   This applies partly to exactly one type of application: the audio
   conference bridge.

   For a video mixer, it does not apply; external logic (such as
   listening to the audio volume of the corresponding audio channel, or
   explicit flow control) is needed to select the right video stream to
   send out.  And even for larger audio bridges, it is common to have
   functions like floor control, remote mute and other participant
   management tools in order to control the bridge.  As soon as such
   tools are introduced, they are as relevant for a multi-media-type RTP
   session as they are for a single-media-type RTP session.

8.  Corrective Actions

   There are not many protocol changes that really need to be made to
   solve this problem.

   The basic mechanism of RTP is media type independent.  There are some
   RTCP issues with dealing with RTP flows of wildly varying bandwidth,
   but as can be seen from the list of media examples in Section 2, this
   issue isn't solved by separating them: the bandwidth ranges of the
   types overlap.

   The most binding constraint in the current protocol suite is the
   preservation of the inappropriate binding in the SDP media
   description/negotiation format, where the MIME type is represented in
   two pieces, one of which (the top-level type on the "m=" line) is
   tied to the RTP session rather than to the payload type it is
   associated with.  There are fairly well-understood ways to get around
   that, such as the BUNDLE grouping extension
   [I-D.ietf-mmusic-sdp-bundle-negotiation].  Better-designed
   negotiation protocols would not have this problem at all.

   In order to get out of the bind that SDP places us in, a change such
   as BUNDLE should be adopted, and the IETF should record that the
   advice from RFC 3550 is to be considered *advice*, not command: it is
   sometimes appropriate to separate media streams according to
   top-level type, and sometimes not appropriate to do so.  The
   application is the one that needs to make this decision.

9.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

10.  Security Considerations

   This note does not discuss any change that the author thinks would
   have any significant influence on the security of RTP traffic.

11.  Acknowledgements

   This note has benefited greatly from exchanges with Colin Perkins,
   whose unwavering support of a sharply differing viewpoint has served
   to inform the arguments presented in this document.  Magnus
   Westerlund and Christer Holmberg also deserve special mention for
   engaging constructively in the discussion.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

12.2.  Informative References

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C. and H. Alvestrand, "Multiplexing Negotiation
              Using Session Description Protocol (SDP) Port Numbers",
              draft-ietf-mmusic-sdp-bundle-negotiation-00 (work in
              progress), February 2012.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M. and C. Perkins, "Multiple RTP Session on a
              Single Lower-Layer Transport",
              draft-westerlund-avtcore-transport-multiplexing-01 (work
              in progress), October 2011.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", BCP 13, RFC 4288, December 2005.

   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
              Digits, Telephony Tones, and Telephony Signals", RFC 4733,
              December 2006.

Appendix A.  Change log

A.1.  Version -00 to -01

   Version number bump, since the debate is ongoing.  A few nits fixed.
   Added the "Mixer Fallacy" section.  Updated reference to "bundle" to
   new draft name.

   This should be the last version, since the author is in the process
   of working with the authors of
   [I-D.westerlund-avtcore-transport-multiplexing] to achieve a jointly
   agreeable text.  Hopefully this will take less than 6 months.

Author's Address

   Harald T. Alvestrand
   Google
   Kungsbron 2
   Stockholm, 11122
   Sweden

   Email: harald@alvestrand.no