RTCWEB                                                         J. Lennox
Internet-Draft                                                     Vidyo
Intended status: Standards Track                            J. Rosenberg
Expires: April 26, 2012                                            Skype
                                                        October 24, 2011


   Multiplexing Multiple Media Types In a Single Real-Time Transport
                         Protocol (RTP) Session
               draft-lennox-rtcweb-rtp-media-type-mux-00

Abstract

   This document describes mechanisms and recommended practice for
   transmitting media streams of multiple media types (e.g., audio and
   video) over a single Real-Time Transport Protocol (RTP) session,
   primarily for the use of Real-Time Communication for the Web
   (rtcweb).

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 26, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Transmitting multiple types of media in a single RTP session
     3.1.  Optimizations
   4.  Backward compatibility
   5.  Signaling
   6.  Protocols with SSRC semantics
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   Classically, multimedia sessions using the Real-Time Transport
   Protocol (RTP) [RFC3550] have transported different media types (most
   commonly, audio and video) in different RTP sessions, each with its
   own transport flow.  At the time RTP was designed, this was a
   reasonable design decision, reducing system variability and adding
   flexibility ([RFC3550] discusses the motivation for this design
   decision in section 5.2).

   However, the de facto architecture of the Internet has changed
   substantially since RTP was originally designed, nearly twenty years
   ago.  In particular, Network Address Translators (NATs) and firewalls
   are now ubiquitous, and IPv4 address space scarcity is becoming more
   severe.  As a consequence, the network resources used up by an
   application, and its probability of failure, are directly
   proportional to the number of distinct transport flows it uses.

   Furthermore, applications have developed mechanisms (notably
   Interactive Connectivity Establishment (ICE) [RFC5245]) to traverse
   NATs and firewalls.
   The time such mechanisms need to perform the traversal process is
   proportional to the number of distinct transport flows in use.

   As a result, in the modern Internet, it is advisable and useful to
   revisit the transport-layer separation of media in a multimedia
   session.  Fortunately, the architecture of RTP allows this to be done
   in a straightforward and natural way: by placing multiple sources of
   different media types in the same RTP session.

   Since this is architecturally somewhat different from existing RTP
   deployments, however, this decision has some consequences that may be
   non-obvious.  Furthermore, it is somewhat complex to negotiate such
   flows in signaling protocols that assumed the older architecture,
   most notably the Session Description Protocol (SDP) [RFC4566].  The
   rest of this document discusses these issues.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119] and
   indicate requirement levels for compliant implementations.

3.  Transmitting multiple types of media in a single RTP session

   RTP [RFC3550] supports the notion of multiple sources within a
   session.  Historically, this was typically used for distinct users
   within a group to send media of the same type.  Each source has its
   own synchronization source (SSRC) value and has a distinct sequence
   number and timestamp space.  This document specifies that this same
   mechanism is used to allow sources of multiple media types in the
   same RTP session, even if they come from the same user.
   For example, in a call containing audio and video between two users,
   each sending a single audio and a single video source, there would be
   a single RTP session containing two sources (one audio, one video)
   from each user, for a total of four sources (and thus four SSRC
   values) within the RTP session.

   Transmitting multiple types of media in a single RTP [RFC3550]
   session is done using the same RTP mechanisms as are used to transmit
   multiple sources of the same media type on a session.  Notably:

   o  Each stream (of every media type) is a distinct source (distinct
      stream of consecutive packets to be sent to a decoder) and is
      given a distinct synchronization source ID (SSRC), and has its own
      distinct timestamp and sequence number space.

   o  Every media type (full media type and subtype, e.g. video/h264 or
      audio/pcmu) has a distinct payload type value.  The same payload
      type value mappings apply across all sources in the session.

   o  RTP SSRCs, initial sequence numbers, and initial timestamps are
      chosen at random, independently for each source (of each media
      type).

   o  RTCP bandwidth is five percent of the total RTP session bandwidth.

   o  RTP session bandwidth and RTCP bandwidth are divided among all the
      sources in the session.

   o  RTCP sender report (SR) or receiver report (RR) packets, and
      source description (SDES) packets, are sent periodically for every
      source in the session.

   In other words, no special RTP mechanisms are specifically needed for
   senders of multiplexed media.  The only constraint is that senders
   MUST NOT change the top-level media type (e.g. audio or video) of a
   given source.  (It remains valid to change a source's subtype, e.g.
   switching between audio/pcmu and audio/g729.)

   For a receiver, the primary complexity of multiplexing is knowing how
   to process a received source.
   Without multiplexing, all sources in an RTP session can (in theory)
   be processed in the same manner; e.g., all audio sources can be fed
   to an audio mixer, and all video sources displayed on a screen.  With
   multiplexing, however, receivers must apply additional knowledge.

   If the streams being multiplexed are simply audio and video, this
   processing decision can be made based simply on a source's payload
   type.  For more complex situations (for example, simultaneous
   live-video and shared-application sources, both sent as video),
   signaling-level descriptions of sources would be needed, using a
   mechanism such as SDP Source Descriptions [RFC5576].

   Additionally, due to the large difference in typical bitrate between
   different media (video can easily use a bit rate larger than that of
   audio by an order of magnitude or more), some complications arise
   with RTCP timing.  Because RTCP bandwidth is shared evenly among all
   sources in a session, the RTCP for an audio source can end up being
   sent significantly more frequently than it would in a non-multiplexed
   session.  (The RTCP for video will, correspondingly, be sent slightly
   less frequently; this is not nearly as serious an issue.)

   For RTP sessions that use RTP's recommended minimum fixed timing
   interval of 5 seconds, this problem is not likely to arise, as most
   sessions' bandwidth is not so low that RTCP timing exceeds this
   limit.  The RTP/AVP [RFC3551] or RTP/SAVP [RFC3711] profiles use this
   minimum interval by default, and do not have a mechanism in SDP to
   negotiate an alternate interval.

   For sessions using the RTP/AVPF [RFC4585] and RTP/SAVPF [RFC5124]
   profiles, however, endpoints SHOULD set the minimum RTCP regular
   reporting interval trr-int to 5000 (5 seconds), unless they
   explicitly need it to be lower.  This minimizes the excessive RTCP
   bandwidth consumption, as well as aiding compatibility with AVP
   endpoints.
   Since this value only affects regular RTCP reports, not RTCP
   feedback, this does not prevent AVPF feedback messages from being
   sent as needed.

3.1.  Optimizations

   For multiple sources in the same session, several optimizations are
   possible.  (Most of these optimizations also apply to multiple
   sources of the same type in a session.)  In all cases, endpoints MUST
   be prepared for their peers to be using these optimizations.

   An endpoint sending multiple sources MAY, as needed, reallocate media
   bandwidth among the RTP sources it is sending.  This includes adding
   or removing sources as more or less bandwidth becomes available.

   An endpoint MAY choose to send multiple sources' RTCP messages in a
   single compound RTCP packet (though such compound packets SHOULD NOT
   exceed the path MTU, if avoidable and if it is known).  This will
   reduce the average compound RTCP packet size, and thus increase the
   frequency with which RTCP messages can be sent.  Regular (non-
   feedback) RTCP compound packets MUST still begin with an SR or RR
   packet, but otherwise may contain RTCP packets in any order.
   Receivers MUST be prepared to receive such compound packets.

   An endpoint SHOULD NOT send reception reports from one of its own
   sources about another one ("cross-reports").  Such reports are
   useless (they would always indicate zero loss and jitter) and use up
   bandwidth that could more profitably be used to send information
   about remote sources.  Endpoints receiving reception reports MUST be
   prepared that their peers might not be sending reception reports
   about their own sources.  (A naive RTCP monitor might think that
   there is a network disconnection between these sources; however,
   architecturally it is very unclear if such monitors actually exist,
   or would care about a disconnection of this sort.)
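
   The cross-report rule above can be sketched as a simple filter.
   This is only an illustration, assuming an endpoint that tracks its
   local SSRCs and all SSRCs seen in the session as plain sets; the
   helper name report_targets and the SSRC values are hypothetical and
   do not come from this document or from any RTP library.

```python
def report_targets(local_ssrcs, known_ssrcs):
    """Return the SSRCs that reception reports should be sent about.

    Hypothetical helper: every SSRC owned by this endpoint is skipped,
    so no "cross-reports" between local sources are generated.
    """
    return sorted(set(known_ssrcs) - set(local_ssrcs))


# Illustrative values: two local sources (audio, video), two remote ones.
local = {0x1111, 0x2222}
seen_in_session = {0x1111, 0x2222, 0xAAAA, 0xBBBB}
targets = report_targets(local, seen_in_session)
```

   Here targets holds only the two remote SSRCs, so report blocks are
   spent on sources whose loss and jitter are actually informative.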

   Similarly, an endpoint sending multiple sources SHOULD NOT send
   reception reports about a remote source from more than one of its
   local sources.  Instead, it SHOULD pick one of its local sources as
   the "reporting" source for each remote source, which sends full
   report blocks; all its other sources SHOULD be treated as if they
   were disconnected, and never saw that remote source.  An endpoint MAY
   choose different local sources as the reporting source for different
   remote sources (for example, it could choose to send reports about
   remote audio sources from its local audio source, and reports about
   remote video sources from its local video source), or it MAY choose a
   single local source for all its reports.  If the reporting source
   leaves the session (sends BYE), another reporting source MUST be
   chosen.  This "reporting" source SHOULD also be the source for any
   AVPF feedback messages about its remote sources.  Endpoints
   interpreting reception reports MUST be prepared to receive RTCP SR or
   RR messages where only one remote source is reporting about its
   sources.

4.  Backward compatibility

   In some circumstances, the offerer in an offer/answer exchange
   [RFC3264] will not know whether the peer which will receive its offer
   supports media type multiplexing.

   In scenarios where endpoints can rely on their peers supporting
   Interactive Connectivity Establishment (ICE) [RFC5245], even if they
   might not support multiplexing, this should not be a problem.  An
   endpoint could construct a list of ICE candidates for its single
   session, and then offer that list, for backward compatibility, toward
   each of the peers; it would disambiguate the flows based on the ufrag
   fields in the received ICE connectivity checks.  (This would result
   in the chosen ICE candidates participating in multiple RTP sessions,
   in much the same manner as following a forked SIP offer.)
   For RTCWeb, it is currently anticipated that ICE will be required in
   all cases, for consent verification.

   The more difficult case is if an offerer cannot rely on its potential
   peers supporting any features beyond baseline RTP (i.e., neither ICE
   nor multiplexing).  In this case, it would either need to be prepared
   to use only a single media type (e.g., audio) with such a peer, or
   else it would need to do the pre-offer steps to set up all the
   non-multiplexed sessions.  Notably, this would include opening local
   ports, and doing ICE address gathering (collecting candidate
   addresses from STUN and/or TURN servers) for each session, even if it
   is anticipated that in most cases backward compatibility is not going
   to be necessary.

   If the signaling protocol in use supports sending additional ICE
   candidates for an ongoing ICE exchange, or updating the destination
   of a non-ICE RTP session, it is instead possible for an offerer to do
   such gathering lazily, e.g. opening only local host candidates for
   the non-default RTP sessions, and gathering and offering additional
   candidates or public relay addresses once it becomes clear that they
   are needed.  (With SIP, sending updated candidates or RTP
   destinations prior to the call being answered is possible only if
   both peers support the SIP 100rel feature [RFC3262], i.e. PRACK and
   UPDATE; otherwise, the initial offer cannot be updated until after
   the 200 OK response to the initial INVITE.)

5.  Signaling

   There is a need to signal multiplexed media in the Session
   Description Protocol (SDP) [RFC4566] -- for inter-domain federation
   in the case of RTCWeb, as well as for "pure" SIP endpoints that also
   want to use media-multiplexed sessions.

   To signal multiplexed sessions, two approaches seem to present
   themselves: either using the SDP grouping framework [RFC5888], as in
   [I-D.holmberg-mmusic-sdp-bundle-negotiation], or directly
   representing the multiplexed sessions in SDP.

   Directly encoded multiplexed sessions would have some grammar issues
   in SDP, as the syntax of SDP mixes together top-level media types and
   transport information in the m= line, splitting media types to be
   partially described in the m= line and partially in the a=rtpmap
   attribute.  New SDP attributes would need to be invented to describe
   the top-level media types for each source.

   m=multiplex 49170 RTP/AVP 96 97
   a=mediamap:96 video
   a=rtpmap:96 H264/90000
   a=mediamap:97 audio
   a=rtpmap:97 pcmu/8000

   Figure 1: Hypothetical syntax for describing multiplexed media lines
                                  in SDP

   If single-pass backward compatibility is (ever) a goal, directly
   encoding multiplexed sessions in SDP m= lines becomes much more
   complex, as it would require SDP Capability Negotiation [RFC5939] in
   order to offer both the legacy and the multiplexed streams.

   Using SDP grouping seems to rule out the possibility of non-backward-
   compatible multiplexed streams.  Other than that, however, it seems
   that it would be the easier path to signal multiplexed sessions.

6.  Protocols with SSRC semantics

   There are some RTP protocols that impose semantics on SSRC values.
   Most notably, several protocols (for instance, FEC [RFC5109], layered
   codecs [RFC5583], or RTP retransmission [RFC4588]) have modes that
   require that sources in multiple RTP sessions have the same SSRC
   value.

   When multiplexing, this is impossible.  Fortunately, in each case,
   there are alternative ways to do this, by explicitly signaling RTP
   SSRC values [RFC5576].  Thus, when multiplexing, these modes need to
   be used instead.
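
   As a rough sketch of what explicit SSRC signaling can look like, the
   following assumes that the a=ssrc-group attribute of [RFC5576] with
   FID semantics is used to associate a retransmission source with its
   original source inside one session.  The helper name
   parse_ssrc_groups, the SDP fragment, and the SSRC values are
   hypothetical and serve only as an illustration.

```python
def parse_ssrc_groups(sdp_text):
    """Collect (semantics, [ssrc, ...]) tuples from a=ssrc-group lines."""
    prefix = "a=ssrc-group:"
    groups = []
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split()
        if not parts:
            continue  # malformed line with no semantics token
        semantics, ssrcs = parts[0], parts[1:]
        groups.append((semantics, [int(s) for s in ssrcs]))
    return groups


# Illustrative SDP fragment; the SSRC values and CNAME are made up.
example = """\
a=ssrc-group:FID 11111 22222
a=ssrc:11111 cname:user@example.com
a=ssrc:22222 cname:user@example.com
"""
groups = parse_ssrc_groups(example)
```

   A receiver could use the resulting pairs to relate sources that
   would otherwise have been tied together by sharing an SSRC value
   across sessions.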

   It is unclear how to signal this in a backward-compatible way
   (falling back to session-multiplexed modes) if SDP grouping semantics
   are used to describe multiplexed sources in SDP.

7.  Security Considerations

   The security considerations of a muxed stream appear to be similar to
   those of multiple sources of the same media type in an RTP session.

   Notably, it is crucial that SSRC values are never used more than once
   with the same SRTP keys.

8.  IANA Considerations

   The IANA actions required depend on the decision about how muxed
   streams are signaled.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              July 2006.

9.2.  Informative References

   [I-D.holmberg-mmusic-sdp-bundle-negotiation]
              Holmberg, C. and H. Alvestrand, "Multiplexing Negotiation
              Using Session Description Protocol (SDP) Port Numbers",
              draft-holmberg-mmusic-sdp-bundle-negotiation-00 (work in
              progress), October 2011.

   [RFC3262]  Rosenberg, J. and H. Schulzrinne, "Reliability of
              Provisional Responses in Session Initiation Protocol
              (SIP)", RFC 3262, June 2002.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, December 2007.

   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
              Real-time Transport Control Protocol (RTCP)-Based Feedback
              (RTP/SAVPF)", RFC 5124, February 2008.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245,
              April 2010.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC5583]  Schierl, T. and S. Wenger, "Signaling Media Decoding
              Dependency in the Session Description Protocol (SDP)",
              RFC 5583, July 2009.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

   [RFC5939]  Andreasen, F., "Session Description Protocol (SDP)
              Capability Negotiation", RFC 5939, September 2010.

Authors' Addresses

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   Email: jonathan@vidyo.com


   Jonathan Rosenberg
   Skype

   Email: jdrosen@skype.net
   URI:   http://www.jdrosen.net