RTCWEB                                                      J. Rosenberg
Internet-Draft                                                     Skype
Intended status: Informational                               C. Jennings
Expires: January 5, 2012                                           Cisco
                                                             J. Peterson
                                                                 Neustar
                                                              M. Kaufman
                                                                   Skype
                                                             E. Rescorla
                                                                    RTFM
                                                           T. Terriberry
                                                                 Mozilla
                                                            July 4, 2011

 Multiplexing of Real-Time Transport Protocol (RTP) Traffic for Browser
                  based Real-Time Communications (RTC)
                    draft-rosenberg-rtcweb-rtpmux-00

Abstract

   This document argues that multiplexing of voice and video traffic
   over a single RTP session should be specified as the baseline mode of
   operation for multimedia traffic in RTC web.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 5, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. RTP Muxing with SSRC
   3. Arguments in Favor of Multiplexing
     3.1. NAT Resource Preservation
     3.2. Improved Failure Modes
     3.3. Setup Time
     3.4. Complexity
   4. Responding to draft-perkins-rtcweb-rtp-usage
     4.1. Requires Additional Signaling
     4.2. QoS and Traffic Engineering
     4.3. Scalability
     4.4. RTP Retransmission
     4.5. Forward Error Correction
     4.6. RTCP Issues
   5. Arguing Against a Shim
   6. Conclusion
   7. Informative References
   Authors' Addresses

1. Introduction

   The RTCweb working group is chartered to specify a framework and
   protocols for enabling real-time communications services within a
   browser, without the need for plugins
   [I-D.rosenberg-rtcweb-framework]. It is envisioned that this will
   enable many use cases [I-D.ietf-rtcweb-use-cases-and-requirements],
   the most basic of which is a video call between two users on the web.

   In order to enable this functionality, the specifications produced by
   the IETF will mandate a specific set of protocols that must be
   implemented within the browser. It is anticipated that these
   protocols will include the Real-Time Transport Protocol [RFC3550],
   and either in full or in part, Interactive Connectivity Establishment
   (ICE) [RFC5245].

   The usage of RTP raises the question of multiplexing - whether or not
   RTCP and RTP should run on the same port, and furthermore, whether or
   not voice, video, and possibly data, should also run on the same
   port. To provide guidance on this, Perkins et al. produced
   [I-D.perkins-rtcweb-rtp-usage], which recommends that voice and video
   utilize different RTP sessions, and thus different UDP ports.

   This document argues against this conclusion, and advocates that a
   single transport session (i.e., a single UDP port) be used to carry
   voice and video traffic, using the SSRC for demux.

2. RTP Muxing with SSRC

   This document recommends that all of the associated media content of
   the call - the voice, video, and RTCP traffic for both the voice and
   video sessions - utilize a single transport session (i.e., a single
   UDP port). In cases where there are multiple video streams (for
   example, screen sharing), the single transport session would carry
   all of the video. Furthermore, it recommends that voice and video
   traffic be demultiplexed by assigning a different SSRC to each. This
   recommendation applies to the case of a single unicast communications
   session between a pair of endpoints (e.g., this document does not
   consider the case of running a multi-user service like a gateway).

   To enable multiplexing, we propose that the 32-bit SSRC value in the
   RTP header be broken up into the following sub-fields:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Magic Cookie          |Type |       StreamID        |x|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                              SSRC Field

   The Magic Cookie is two bytes, with a value of 0xf7b3. It is meant
   to facilitate DPI applications, which can use its value to determine,
   with high confidence, that this RTP packet uses the encoding format
   defined here.
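
   The sub-field layout described in this section (a 16-bit Magic
   Cookie, a 3-bit Type, a 12-bit StreamID, and one reserved bit) can be
   illustrated with a minimal sketch. It assumes the fields are packed
   most-significant-bit first, in the order the diagram shows them. The
   function names and the AUDIO/VIDEO type codes are hypothetical; the
   draft leaves the type mapping table TBD.

```python
MAGIC_COOKIE = 0xF7B3  # 16-bit value given in the text

# Hypothetical type codes: the draft's mapping table is TBD.
AUDIO = 0
VIDEO = 1


def pack_ssrc(media_type: int, stream_id: int) -> int:
    """Build a 32-bit SSRC: cookie(16) | type(3) | stream ID(12) | x(1)."""
    if not 0 <= media_type < (1 << 3):
        raise ValueError("media_type must fit in 3 bits")
    if not 0 <= stream_id < (1 << 12):
        raise ValueError("stream_id must fit in 12 bits")
    # The reserved 'x' bit (bit 0) is left at zero.
    return (MAGIC_COOKIE << 16) | (media_type << 13) | (stream_id << 1)


def unpack_ssrc(ssrc: int) -> tuple[int, int]:
    """Return (media_type, stream_id), or raise if the cookie is absent."""
    if (ssrc >> 16) & 0xFFFF != MAGIC_COOKIE:
        raise ValueError("SSRC does not carry the magic cookie")
    media_type = (ssrc >> 13) & 0x7
    stream_id = (ssrc >> 1) & 0xFFF
    return media_type, stream_id
```

   Under these assumptions, pack_ssrc(VIDEO, 5) yields 0xF7B3200A, and
   a receiver can demultiplex on the result of unpack_ssrc() alone,
   without consulting the UDP port.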

   The Type is a 3-bit value, corresponding to the top-level MIME type
   of the media (mapping table TBD). It too is meant to facilitate DPI
   applications which want to separate voice and video. The StreamID is
   a 12-bit field which represents the unique ID for this stream. It is
   signaled between participants out of band. The final bit, 'x', is set
   to zero and is reserved for future usage.

3. Arguments in Favor of Multiplexing

   This section outlines several arguments in favor of multiplexing.

3.1. NAT Resource Preservation

   Today's Internet is full of Network Address Translators (NAT), a
   situation which is likely to get worse as IPv4 address exhaustion
   continues. When NAT is in use, the constraint on the number of
   endpoints behind the NAT is based on the number of parallel transport
   sessions that need to be supported. If, for example, a NAT has a
   single external IP address, it can support 64k UDP sessions while
   having an endpoint-independent mapping behavior [RFC4787]. Thus, in
   the presence of NAT, parallel transport sessions become the scarce
   resource.

   If rtcweb specifies that audio and video run on separate ports, this
   will double the number of transport session resources consumed in
   intervening NATs. While the usage of port as an application layer
   demux point made sense when RTP was designed back in 1992 (the year
   the first RTP draft was published), the Internet has changed
   substantially since then. Continuing to perpetuate this design
   optimizes for preservation of legacy over protection of resources in
   the modern Internet. We feel that this optimizes in the wrong
   direction.

   Given that we anticipate widespread usage of rtcweb, this design
   choice may create a non-trivial load on the transport session
   capacity of the Internet at large. Real-time video communications on
   the Internet has seen huge growth in recent years.
   For Skype, approximately 40% of its Skype-to-Skype calls are video
   based. A recent Sandvine report states that Skype alone is the third
   largest source of upload traffic on the Internet as a whole, largely
   attributed to Skype video calling. The conclusion from this is that
   the costs of a separate voice and video port cannot be ignored.

   Simply put, the usage of transport ports for application
   demultiplexing should be considered harmful for the Internet.

3.2. Improved Failure Modes

   The usage of separate transport sessions for the audio, video or
   other content of the call introduces a variety of partial failure
   modes. The transport session for one type of media might get
   established, but a NAT capacity problem might cause the transport
   session for another type of media to fail. Usage of a single
   transport session means that the conversation succeeds or fails
   atomically. We consider this a feature.

3.3. Setup Time

   The rtcweb group is considering the usage of ICE to create p2p
   sessions. ICE provides firewall and NAT traversal in addition to
   providing a handshake necessary to assure mutual consent for
   communications.

   Unfortunately, ICE requires time to perform its setup operations.
   This time grows in proportion to the number of transport sessions
   which must be opened in order to support the call. By using a
   different port for video traffic, call setup times will increase.
   The precise amount of this increase depends on the type of NAT and
   varies depending on packet loss. However, in a simple, ideal case of
   no packet loss and direct connectivity between endpoints, this value
   is XXX [[fill in]].

3.4. Complexity

   ICE is not a simple protocol. One of its significant complexities is
   its requirement to support calls for multiple media streams, each of
   which runs on a separate port, and multiple components for each
   stream (e.g., RTCP).
   If the concept of streams and components were eliminated, ICE would
   be a simpler protocol.

   If, within rtcweb, a single transport connection were utilized,
   browsers could implement a simplified version of the ICE protocol.

4. Responding to draft-perkins-rtcweb-rtp-usage

   [I-D.perkins-rtcweb-rtp-usage] outlines several arguments for
   continuing to use a separate port for audio and video. In this
   section, we respond to those arguments.

4.1. Requires Additional Signaling

   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
   video on the same RTP session would require a demux point to be
   specified (for example, the SSRC), and require additional signaling
   to be specified to accomplish this.

   Firstly, this conclusion is only partly true. For communications
   sessions between rtcweb users within the same domain, no signaling
   specifications are required. This is true in general with rtcweb;
   one of its benefits is that it does not require standardized
   signaling.

   Secondly, it is not yet clear that rtcweb will be able to
   interoperate with existing VoIP endpoints without a media
   intermediary to terminate ICE traffic. It is our position that
   interoperability without a media intermediary should only be provided
   for basic voice services, and even then, only when RTCP is supported.
   In the case of basic voice endpoints, where there is no video, RTP
   multiplexing of voice and video is irrelevant, and thus no signaling
   complexity is introduced.

   Thirdly, the primary place where there will be a need for signaling
   enhancements is for inter-domain calling between rtcweb endpoints in
   different domains. In such a case, an SDP extension is required, and
   one can be specified. It is trivial to do so.

   Finally, this document does recommend that it be possible to utilize
   a separate transport session for voice and for video, and that, in
   the worst case, this mode can be used for calls between an rtcweb
   endpoint and a legacy endpoint.

4.2. QoS and Traffic Engineering

   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
   video on the same RTP session would mean that it would not be
   possible to apply QoS techniques separately for voice and video which
   rely on the 5-tuple.

   Firstly, the public Internet lacks any QoS mechanism, so this
   argument is moot on the public Internet.

   Secondly, private enterprise networks which do provide QoS most often
   use diffserv. Diffserv is compatible with utilization of a common
   port for voice and video traffic. Typically, different DSCPs are
   used for voice and video (Cisco recommends EF for audio and AF41 for
   video in enterprise telephony deployments), and this practice is
   compatible with usage of the same port - each packet would be marked
   appropriately. It is also possible to use the same DSCP for voice
   and video.

   Carrier networks, such as mobile operator networks, typically provide
   QoS through traffic engineering, using a combination of MPLS tunnels
   and diffserv markings. MPLS tunnels do use 5-tuples as classifiers
   to determine which traffic to put in what kind of tunnel. If there
   is a need for using separate MPLS tunnels for voice and video, the
   DSCP codepoint itself can be used as a differentiator.

   It is true that it would not be possible to utilize RSVP to
   separately establish QoS treatment for the voice and the video
   traffic. However, there is very little real deployment of RSVP:
   none within the public Internet and relatively little within
   corporate networks. As such, this argument is mostly theoretical.

   Finally, DPI is used within some operator networks to perform traffic
   classification.
   It would always be possible to use DPI to assign different treatment
   to voice and video traffic.

4.3. Scalability

   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
   video on the same RTP session would mean that layered coding using
   multicast for each layer would not be possible.

   Firstly, most layered coding today uses unicast and a switch or mixer
   of some sort to discard layers. That architecture is completely
   compatible with the usage of a single transport session for voice and
   video. The limitation applies only to the use of IP multicast for
   real-time communications. The usage of multicast on the Internet has
   substantially diminished over time. There is some usage today in
   private networks but primarily for streaming media distribution. The
   usage for real-time communications is quite rare. As such, we find
   this to be a theoretical corner case.

4.4. RTP Retransmission

   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
   video on the same RTP session would not be interoperable with
   endpoints doing RTP retransmission per [RFC4588].

   As pointed out above, interoperability with existing endpoints
   without the usage of a media intermediary is not a given at this
   point, and we argue it should only be supported for the common case -
   a basic, voice-only RTP-capable endpoint. There is, to our
   knowledge, relatively little deployment of RFC4588, at least for
   real-time communications. It is certainly not a common feature in
   basic RTP endpoints and never a baseline requirement for
   interoperability. Consequently, if there is a need to interoperate
   with an endpoint supporting RFC4588, and it is desired to avoid a
   media intermediary, RFC4588 can just be turned off for the session.

   As such, we find the interoperability argument here not compelling.

4.5. Forward Error Correction

   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
   video on the same RTP session will limit the applicability of FEC
   [RFC5109] to when the RTP packets are half of the path MTU.

   There are two cases to consider - interoperability with existing
   endpoints and usage for calls between rtcweb endpoints.

   For interoperability with existing endpoints, we argue the same thing
   here as for retransmits. FEC is not commonly used in legacy voice
   endpoints, and if it is supported, is never a required feature.
   Consequently, if present, its usage can be disabled when
   interoperating with an rtcweb endpoint. If FEC is included as part
   of the rtcweb specifications, the lower bandwidth of voice means that
   FEC packets could be sent on the same port, using [RFC2198], without
   approaching the path MTU.

   For communications between rtcweb endpoints, this is only an issue if
   FEC is included as part of the rtcweb specification. If the group
   decides to do that (there is some value for real-time video), it
   should define a mechanism which allows for FEC packets to be sent
   using a separate SSRC.

4.6. RTCP Issues

   [I-D.perkins-rtcweb-rtp-usage] argues that multiplexing of voice and
   video on the same RTP session will introduce complications in the
   usage of RTCP, primarily when considering RTCP extensions.

   It is our belief that normal RTCP operation as defined in the RTCP
   specification will work fine with multiplexed voice and video
   traffic. SRs and RRs are already generated per SSRC to handle
   multiple senders, and RTCP in general supports feedback for multiple
   SSRC within a session. These mechanisms work as defined when each
   SSRC happens to represent a different media stream instead of a
   different user.

   The only complication that arises is for RTCP extensions which are
   defined to be media dependent.
   [I-D.perkins-rtcweb-rtp-usage] points out, as an example, the usage
   of RTCP extended report blocks (XR) [RFC3611]. However, XR works
   fine in conjunction with multiplexing of voice and video within the
   same port. Each of the seven report blocks defined in [RFC3611]
   includes the SSRC of the source as part of the block, and thus will
   work. [I-D.perkins-rtcweb-rtp-usage] indicates that "SSRC purpose
   tagging needs not only to be one the media side, but also on the RTCP
   reporting". However, we do not believe this to be accurate. Since
   the XR blocks report the SSRC source already, the specifications
   provide all that is needed. The XR report is merely included when it
   is relevant.

   Furthermore, the discussion around XR assumes that we need to support
   it for interoperability with existing VoIP endpoints, or that we are
   utilizing it for rtcweb itself. As with FEC and retransmissions, in
   the case of interoperability, if there is an issue, XR can simply be
   disabled. [RFC3611] does specify that XR can be sent without prior
   signaling. In the worst case, XR reports are received by an rtcweb
   endpoint and then discarded. In terms of usage of RTCP XR for
   communications between rtcweb endpoints, we would argue that a much
   more flexible solution would be to provide JavaScript APIs which
   allow the application to have access to the same data used to
   generate the XR, and then the application itself can use this data as
   it sees fit, including sending it back to the sender through some
   kind of application data packet.

5. Arguing Against a Shim

   It has been proposed on the mailing list that an alternative approach
   for multiplexing on the same port would be to specify a new
   multiplexing protocol that has a small shim, which could then be used
   to separate voice and video traffic as a layer between UDP and RTP.
   Such a shim could then also be used to enable non-RTP data traffic as
   well.

   We believe that such a shim would be a mistake, for the same reason
   that shims have been avoided in the multiplexing of RTCP, STUN, and
   DTLS on the same port as RTP:

   o  The shim would break interoperability with a great deal of
      existing network inspection gear - firewalls, packet sniffers,
      traffic analyzers, and so on - which know how to extract, parse,
      and process RTP packets.

   o  The shim would add complexity through yet another layer of
      multiplexing.

   o  The shim would increase packet overhead further.

   o  A shim is a mistake which cannot be undone later. If multiplexing
      on a single port truly causes interoperability issues, clients can
      fall back to using multiple ports, possibly even in the
      preponderance of cases. However, once a shim is inserted,
      interoperability will always require an intermediary to strip it
      out, forever.

6. Conclusion

   In conclusion, we feel that the benefits of multiplexing voice and
   video on a single RTP session (and thus a single transport
   connection) outweigh the drawbacks. The primary benefit is the
   impact on NAT capacity, which is becoming an important issue in the
   modern Internet. Furthermore, the unique nature of backwards
   compatibility for rtcweb lessens many of the interoperability
   concerns, and the traditional arguments around multicast and RSVP are
   simply no longer relevant, as those technologies have faded from use.

7. Informative References

   [I-D.perkins-rtcweb-rtp-usage]
              Perkins, C., Westerlund, M., and J. Ott, "RTP Requirements
              for RTC-Web", draft-perkins-rtcweb-rtp-usage-01 (work in
              progress), June 2011.

   [I-D.rosenberg-rtcweb-framework]
              Rosenberg, J., Kaufman, M., Hiie, M., and F. Audet, "An
              Architectural Framework for Browser based Real-Time
              Communications (RTC)", draft-rosenberg-rtcweb-framework-00
              (work in progress), February 2011.

   [I-D.ietf-rtcweb-use-cases-and-requirements]
              Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real-
              Time Communication Use-cases and Requirements",
              draft-ietf-rtcweb-use-cases-and-requirements-01 (work in
              progress), July 2011.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245,
              April 2010.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4787]  Audet, F. and C. Jennings, "Network Address Translation
              (NAT) Behavioral Requirements for Unicast UDP", BCP 127,
              RFC 4787, January 2007.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, December 2007.

   [RFC3611]  Friedman, T., Caceres, R., and A. Clark, "RTP Control
              Protocol Extended Reports (RTCP XR)", RFC 3611,
              November 2003.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
              September 1997.

Authors' Addresses

   Jonathan Rosenberg
   Skype

   Email: jdrosen@skype.net
   URI:   http://www.jdrosen.net

   Cullen Jennings
   Cisco

   Email: fluffy@cisco.com

   Jon Peterson
   Neustar

   Email: jon.peterson@neustar.biz

   Matthew Kaufman
   Skype

   Email: matthew.kaufman@skype.net

   Eric Rescorla
   RTFM

   Email: ekr@rtfm.com

   Tim Terriberry
   Mozilla

   Email: tterriberry@mozilla.com