Network Working Group                                        C. Jennings
Internet-Draft                                                     Cisco
Intended status: Informational                         February 25, 2013
Expires: August 29, 2013

                 Proposed Plan for Usage of SDP and RTP
                      draft-jennings-rtcweb-plan-01

Abstract

   This draft outlines a number of the remaining issues in RTCWeb
   related to how the W3C APIs map to various usages of RTP and the
   associated SDP.  It proposes one possible solution to that problem
   and outlines several chunks of work that would need to be put into
   other drafts or result in new drafts being written.  The underlying
   design guideline is to re-use, as much as possible, what is already
   defined in the existing SDP (RFC 4566) and RTP (RFC 3550)
   specifications.

   This draft is not intended to become a specification but is meant
   for working group discussion to help build the specifications.  It
   is being discussed on the rtcweb@ietf.org mailing list, though it
   has topics relating to the CLUE WG, MMUSIC WG, AVT* WG, and the
   WebRTC WG at the W3C.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.  This document may not be modified,
   and derivative works of it may not be created, and it may not be
   published except as an Internet-Draft.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 29, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Overview
   2.  Terminology
   3.  Requirements
   4.  Background/Solution Overview
   5.  Overall Design
   6.  Example Mappings
       6.1.  One Audio, One Video, No bundle/multiplexing
       6.2.  One Audio, One Video, Bundle/multiplexing
       6.3.  One Audio, One Video, Simulcast, Bundle/multiplexing
       6.4.  One Audio, One Video, Bundle/multiplexing, Lip-Sync
       6.5.  One Audio, One Active Video, 5 Thumbnails,
             Bundle/multiplexing
       6.6.  One Audio, One Active Video, 5 Thumbnails, Main Speaker
             Lip-Sync, Bundle/multiplexing
   7.  Solutions
       7.1.  Correlation and Multiplexing
       7.2.  Multiple Render
             7.2.1.  Complex Multi Render Example
       7.3.  Dirty Little Secrets
       7.4.  Open Issues
       7.5.  Confusions
   8.  Examples
   9.  Tasks
   10. Security Considerations
   11. IANA Considerations
   12. Acknowledgments
   13. Open Issues
   14. Existing SDP
       14.1.  Multiple Encodings
       14.2.  Forward Error Correction
       14.3.  Same Video Codec With Different Settings
       14.4.  Different Video Codecs With Different Resolutions Formats
       14.5.  Lip Sync Group
       14.6.  BFCP
       14.7.  Retransmission
       14.8.  Layered coding dependency
       14.9.  SSRC Signaling
       14.10. Content Signaling
   15. References
       15.1.  Normative References
       15.2.  Informative References
   Author's Address

1.  Overview

   The recurring theme of this draft is that SDP [RFC4566] already has
   a way of solving many of the problems being discussed at the RTCWeb
   WG, and we SHOULD NOT try to invent something new but rather re-use
   the existing methods for describing RTP [RFC3550] media flows.

   The general theory is that, roughly speaking, an m-line corresponds
   to a flow of packets that can be handled by the application in the
   same way.  This often results in more m-lines than there are media
   sources such as microphones or cameras.  Forward Error Correction
   (FEC) is done with multiple m-lines as shown in [RFC4756].
   Retransmission (RTX) is done with multiple m-lines as shown in
   [RFC4588].  Layered coding is done with multiple m-lines as shown in
   [RFC5583].  Simulcast, which is really just multiple video streams
   from the same camera, much like layered coding but with no inter
   m-line dependency, is done with multiple m-lines modeled after the
   layered coding defined in [RFC5583].
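   As a rough illustration of how an implementation might discover
   these m-line relationships from the SDP it receives, the following
   TypeScript sketch collects the a=group lines (FEC, DDP, and the
   SIMULCAST grouping proposed later in this draft) into sets of mid
   values.  It is only a sketch, assuming the session-level lines have
   already been split out; it is not tied to any particular SDP parser.

      // Collect "a=group:<semantics> <mid> <mid> ..." session-level
      // lines into a map from grouping semantics (FEC, DDP, SIMULCAST,
      // LS, ...) to the lists of mids that are related.
      function collectGroups(sessionLines: string[]): Map<string, string[][]> {
        const groups = new Map<string, string[][]>();
        for (const line of sessionLines) {
          const m = line.match(/^a=group:(\S+)\s+(.+)$/);
          if (!m) {
            continue;
          }
          const semantics = m[1];
          const mids = m[2].trim().split(/\s+/);
          if (!groups.has(semantics)) {
            groups.set(semantics, []);
          }
          groups.get(semantics)!.push(mids);
        }
        return groups;
      }

      // For the FEC example in Section 4, "a=group:FEC 1 2" and
      // "a=group:FEC 3 4" would yield FEC -> [["1","2"], ["3","4"]].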
This is accomplished by 133 having the related flows carry, in the CSRC, the SSRC of their base 134 flow. An example SDP might look like as provided in the example 135 Section 7.2.1. 137 This draft also propose that advanced usages, including WebRTC to 138 WebRTC scenarios, uses a Media Stream Identifier (MSID) that is 139 signaled in SDP and also attempts to negotiate the usage of a RTP 140 header extension to include the MSID in the RTP packet. This 141 resolves many long term issues. 143 This does results in lots of m lines but all the alternatives designs 144 resulted in an roughly equivalent number of SSRC lines with a 145 possibility of redefining most of the media level attributes. So 146 it's really hard to see the big benefits defining something new over 147 what we have. One of the concerns about this approach is the time to 148 collect all the ICE candidates needed for the initial offer. 149 Section 7.2.1 provides mitigations to reduce the number of ports 150 needed to be the same as an alternative SSRC based design. This 151 assumes that it is perfectly feasible to transport SDP that much 152 larger than a single MTU. The SIP [RFC3261] usage of SDP has 153 successfully passed over this long ago. In the cases where the SDP 154 is passed over web mechanisms, it is easy to use compression and the 155 size of SDP is more of an optimization criteria than a limiting 156 issue. 158 2. Terminology 160 The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", 161 "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be 162 interpreted as described in [RFC2119]. 164 This draft uses the API and terminology described in [webrtc-api]. 166 Transport-Flow: An transport 5 Tuple representing the UDP source and 167 destination IP address and port over which RTP is flowing. 169 5-tuple: A collection of the following values: source IP address, 170 source transport port, destination IP address, destination transport 171 port and transport protocol. 173 PC-Track: A source of media (audio and/or video) that is contained in 174 a PC-Stream. A PC-Track represents content comprising one or more 175 PC-Channels. 177 PC-Stream: Represents stream of data of audio and/or video added to a 178 Peer Connection by local or remote media source(s). A PC-Stream is 179 made up of zero or more PC-Tracks. 181 m-line: An SDP [RFC4566] media description identifier that starts 182 with "m=" field and conveys following values:media type,transport 183 port,transport protocol and media format descriptions. 185 m-block: An SDP [RFC4566] media description that starts with an 186 m-line and is terminated by either the next m-line or by the end of 187 the session description. 189 Offer: An [RFC3264] SDP message generated by the participant who 190 wishes to initiate a multimedia communication session. An Offer 191 describes participants capabilities for engaging in a multimedia 192 session. 194 Answer: An [RFC3264] SDP message generated by the participant in 195 response to an Offer. An Answer describes participants capabilities 196 in continuing with the multimedia session with in the constraints of 197 the Offer. 199 This draft avoids using terms that implementors do not have a clear 200 idea of exactly what they are - for example RTP Session. 202 3. Requirements 204 The requirements listed here are a collection of requirements that 205 have come from WebRTC, CLUE, and the general community that uses RTP 206 for interactive communications based on Offer/Answer. 
3.  Requirements

   The requirements listed here are a collection of requirements that
   have come from WebRTC, CLUE, and the general community that uses RTP
   for interactive communications based on Offer/Answer.  It does not
   try to meet the needs of streaming usages or usages involving
   multicast.  This list also does not try to list every possible
   requirement but instead outlines the ones that might influence the
   design.

   o  Devices with multiple audio and/or video sources

   o  Devices that display multiple streams of video and/or render
      multiple streams of audio

   o  Simulcast, wherein the video from a single camera is sent as a
      few independent video streams, typically at different resolutions
      and frame rates.

   o  Layered codecs such as H.264 SVC

   o  One-way media flows and bi-directional media flows

   o  Support asymmetry, i.e., the ability to send a different number
      or type of media streams than you receive.

   o  Mapping W3C PeerConnection (PC) aspects into SDP and RTP.  It is
      important that the SDP be descriptive enough that both sides can
      get the same view of the various identifiers for PC-Tracks,
      PC-Streams, and their relationships.

   o  Support of Interactive Connectivity Establishment (ICE) [RFC5245]

   o  Support of multiplexing multiple media flows, possibly of
      different media types, on the same 5-tuple.

   o  Synchronization - It needs to be clear how implementations deal
      with synchronization, in particular usages of both CNAME and the
      LS group.  The sender needs to be able to indicate which media
      flows are intended to be synchronized and which are not.

   o  Redundant codings - The ability to send some media, such as the
      audio from a microphone, multiple times.  For example, it may be
      sent with a high quality wideband codec and a low bandwidth
      codec.  If packets are lost from the high bandwidth stream, the
      low bandwidth stream can be used to fill in the missing gaps of
      audio.  This is very similar to simulcast.

   o  Forward Error Correction - Support for various RTP FEC schemes.

   o  RSVP QoS - Ability to signal various QoS mechanisms such as the
      Single Reservation Flow (SRF) group

   o  Disaggregated Media (FID group) - There is a growing desire to
      deal with endpoints that are distributed - for example, a video
      phone where the incoming video is displayed on an IP TV but the
      outgoing video comes from a tablet computer.  This results in
      situations where the SDP sets up a session in which not all the
      media is transmitted to a single IP address.

   o  In-flight change of codec: Support for systems that can negotiate
      the use of more than one codec for a given media flow, where the
      sender can then arbitrarily switch between them while sending,
      but only sends with one codec at a time.

   o  Distinguish simulcast (e.g., multiple encodings of the same
      source) from multiple different sources

   o  Support for Sequential and Parallel forking at the SIP level

   o  Support for Early Media

   o  Conferencing environments with a Transcoding MCU that decodes/
      mixes/recodes the media

   o  Conferencing environments with a Switching MCU where the MCU
      mucks with the header information of the media and does not
      decode/recode all the media

4.  Background/Solution Overview

   The basic unit of media description in SDP is the m-line/m-block.
   This allows any entity defined by a single m-block to be
   individually negotiated.  This negotiation applies not only to
   individual sources (e.g., cameras) but also to individual components
   that come from a single source, such as layers in SVC.

   For example, consider negotiation of FEC as defined in [RFC4756].
   Offer

      v=0
      o=adam 289083124 289083124 IN IP4 host.example.com
      s=ULP FEC Seminar
      t=0 0
      c=IN IP4 192.0.2.0
      a=group:FEC 1 2
      a=group:FEC 3 4

      m=audio 30000 RTP/AVP 0
      a=mid:1

      m=audio 30002 RTP/AVP 100
      a=rtpmap:100 ulpfec/8000
      a=mid:2

      m=video 30004 RTP/AVP 31
      a=mid:3

      m=video 30004 RTP/AVP 101
      c=IN IP4 192.0.2.1
      a=rtpmap:101 ulpfec/8000
      a=mid:4

   When FEC is expressed this way, the answerer can selectively accept
   or reject the various streams by setting the port in the m-line to
   zero.  RTX [RFC4588], layered coding [RFC5583], and simulcast are
   all handled the same way.  Note that while it is also possible to
   represent FEC and SVC using source-specific attributes [RFC5576],
   that mechanism is less flexible because it does not permit selective
   acceptance and rejection, as described in [RFC5576], Section 8.
   Most deployed systems which implement FEC, layered coding, etc. do
   so with each component on a separate m-line.

   Unfortunately, this strategy runs into problems when combined with
   two new features that are desired for WebRTC:

   m-line multiplexing (bundle):

      The ability to send media described in multiple m-lines over the
      same 5-tuple.

   multi-render:

      The ability to have large numbers of similar media flows (e.g.,
      multiple cameras).  The paradigmatic case here is multiple video
      thumbnails.

   Obviously, this strategy does not scale to large numbers.  For
   instance, consider the case where we want to be able to transmit 35
   video thumbnails (this is large, but not insane).  In the model
   described above, each of these flows would need its own m-line and
   its own set of codecs.  If each side supports three separate codecs
   (e.g., H.261, H.263, and VP8), then we have just consumed 105
   payload types, which exceeds the available dynamic payload space.

   In order to resolve this issue, it is necessary to have multiple
   flows (e.g., multiple thumbnails) indicated by the same m-line and
   using the same set of payload types (see Section XXX for the
   proposed syntax for this).  Because each source has its own SSRC, it
   is possible to divide the RTP packets into individual flows.
   However, this solution still leaves us with two problems:

   o  How to individually address specific RTP flows in order to, for
      instance, order them on a page or display flow-specific captions.

   o  How to determine the relationship between multiple variants of
      the same stream.  For instance, if we have multiple cameras, each
      of which is present in a layered encoding, we need to be able to
      determine which layers go together.

   For reasons described in Section 5, the SSRC learned via SDP is not
   suitable for individually addressing RTP flows.  Instead, we
   introduce a new identifier, the MSID, which can be carried both in
   the SDP and the RTP and therefore can be used to correlate SDP
   elements to RTP elements.  See Section 7.1.

   By contrast, we can use RTP-only mechanisms to express the
   correlation between RTP flows: while all the flows associated with a
   given camera have distinct SSRCs, we can use the CSRC to indicate
   which flows belong together.  This is described in Section 7.2.
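   To make that CSRC-based grouping concrete, here is a minimal
   TypeScript sketch of how a receiver might associate an incoming
   repair or simulcast flow with its base flow, assuming (as proposed
   in this draft) that related flows carry the SSRC of their base flow
   in the CSRC list.  The parsed-packet shape is a hypothetical
   illustration, not a defined structure.

      // Hypothetical parsed RTP header; field names are illustrative.
      interface RtpHeader {
        ssrc: number;      // SSRC of this flow (e.g. an FEC flow)
        csrcs: number[];   // per this proposal, holds the base SSRC
      }

      // Map from base-flow SSRC to the related flows observed for it.
      function groupByBaseFlow(packets: RtpHeader[]): Map<number, Set<number>> {
        const groups = new Map<number, Set<number>>();
        for (const pkt of packets) {
          for (const base of pkt.csrcs) {
            if (!groups.has(base)) {
              groups.set(base, new Set());
            }
            groups.get(base)!.add(pkt.ssrc);
          }
        }
        return groups;
      }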
5.  Overall Design

   The basic unit of media description in SDP is the m-line/m-block,
   and this document continues with that assumption.  In general,
   different cameras, microphones, etc. are carried on different
   m-lines.  The exception to this rule is when using the multi-render
   extension, in which case:

   o  Multiple sources which are semantically equivalent can be
      multiplexed on a time-wise basis.  For instance, if an MCU mixes
      multiple camera feeds but only some subset is displayed at a
      time, they can all appear on the same m-line.

   By contrast, multiple sources which are semantically distinct cannot
   appear on the same m-line, because that does not allow for clear
   negotiation of which sources are acceptable, or of which sets of RTP
   SSRCs correspond to which flow.

   The second basic assumption is that SSRCs cannot always be safely
   used to associate RTP flows with information in the SDP.  There are
   two reasons for this.  First, in an offer/answer setting, RTP can
   arrive at the offerer before the answer is received; if SSRC
   information from the answerer is required, then these RTP packets
   cannot be interpreted.  The second reason is that RTP permits SSRCs
   to be changed at any time.

   This assumption makes clear why the two exceptions to the "one flow
   per m-line" rule work.  In the case of time-based multiplexing
   (multi-render) of camera sources, all the cameras are equivalent
   from the receiver's perspective; it merely needs to know which ones
   to display now, and it does that based on which ones have been most
   recently received.  In the case of multiple versions of the same
   content, payload types, or payload types plus SSRC, can be used to
   distinguish the different versions.

6.  Example Mappings

   This section shows a number of sample mappings in abstract form.

6.1.  One Audio, One Video, No bundle/multiplexing

      Microphone --> m=audio --> Speaker   > 5-Tuple

      Camera     --> m=video --> Window    > 5-Tuple

6.2.  One Audio, One Video, Bundle/multiplexing

      Microphone --> m=audio --> Speaker \
                                          > 5-Tuple
      Camera     --> m=video --> Window  /

6.3.  One Audio, One Video, Simulcast, Bundle/multiplexing

      Microphone --> m=audio --> Speaker \
                                          |
      Camera     +-> m=video -\           > 5-Tuple
                 |             ?-> Window |
                 +-> m=video -/           /

6.4.  One Audio, One Video, Bundle/multiplexing, Lip-Sync

      Microphone --> m=audio --> Speaker \
                                          > 5-Tuple, Lip-Sync
      Camera     --> m=video --> Window  /   group

6.5.  One Audio, One Active Video, 5 Thumbnails, Bundle/multiplexing

      Microphone --> m=audio --> Speaker          \
                                                  |
      Camera     --> m=video --> Window           |
                                                  > 5-Tuple
      Camera     --> m=video --> 5 Small Windows  |
      Camera         a=multi-render:5             |
      ...                                         /

   Note that in this case the payload types must be distinct between
   the two video m-lines, because that is what is used to demultiplex.
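   As an illustration of that demultiplexing rule, the sketch below
   classifies an incoming bundled RTP packet first by payload type and
   then, if signaled, by SSRC, following the order described in
   Section 7.1.  The lookup-table shapes are assumptions made for the
   sketch; this is not a normative algorithm.

      // Hypothetical lookup tables built from the local SDP.
      interface BundleDemuxTables {
        midByPayloadType: Map<number, string>;  // PT -> a=mid value
        midBySsrc: Map<number, string>;         // signaled SSRC -> a=mid
      }

      // Returns the mid of the m-block the packet belongs to, if known.
      function demux(tables: BundleDemuxTables,
                     payloadType: number,
                     ssrc: number): string | undefined {
        // After the 5-tuple, the payload type is checked first; this
        // draft asks for PTs to be distinct across bundled m-blocks
        // precisely so that this step resolves most packets.
        const byPt = tables.midByPayloadType.get(payloadType);
        if (byPt !== undefined) {
          return byPt;
        }
        // Finally, a signaled SSRC (e.g. from a=ssrc) can be used when
        // available, keeping in mind that SSRCs can change.
        return tables.midBySsrc.get(ssrc);
      }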
6.6.  One Audio, One Active Video, 5 Thumbnails, Main Speaker Lip-Sync,
      Bundle/multiplexing

      Microphone --> m=audio --> Speaker          \  \ Lip-sync
                                                  |   > group
      Camera     --> m=video --> Window           |  /
                                                  > 5-Tuple
      Camera     --> m=video --> 5 Small Windows  |
      Camera         a=multi-render:5             |
      ...                                         /

7.  Solutions

   This section outlines a set of rules for the usage of SDP and RTP
   that seems to deal with the various problems and issues that have
   been discussed.  Most of these are not new and are pretty much how
   many systems do it today.  Some of them are new, but all the items
   requiring new standardization work are called out in Section 9.

   Approach:

   1.   If a system wants to offer to send two sources, such as two
        cameras, it MUST use a separate m-block for each source.  This
        means that each PC-Track corresponds to one or more m-blocks.

   2.   In cases such as FEC, simulcast, and SVC, each repair stream,
        layer, or simulcast media flow gets its own m-block.

   3.   If a system wants to receive two streams of video to display in
        two different windows or screens, it MUST use separate m-blocks
        for each unless explicitly signaled to be otherwise (see
        Section 7.2).

   4.   Unless explicitly signaled otherwise (see Section 7.2), if a
        given m-line receives media from multiple SSRCs, only media
        from the most recently received SSRC SHOULD be rendered, the
        other SSRCs SHOULD NOT be rendered, and if it is video it
        SHOULD be rendered in the same window or screen.

   5.   If a camera is sending simulcast video at three resolutions,
        each resolution MUST get its own m-block, and all three
        m-blocks will be grouped.  A new SDP group will be defined for
        this.

   6.   If a camera is using a layered codec with three layers, there
        MUST be an m-block for each layer, and they will be grouped
        using the standard SDP grouping for layers.

   7.   To aid in synchronized playback, there is exactly one, and only
        one, LS group for each PC-Stream.  All the m-blocks for all the
        PC-Tracks in a given PC-Stream are synchronized, so they are
        all put in one LS group.  All the PC-Tracks in a given
        PC-Stream have the same CNAME.  If a PC-Track appears in more
        than one PC-Stream, then all the PC-Streams with that PC-Track
        MUST have the same CNAME.

   8.   One-way media MUST use the sendonly or recvonly attributes.

   9.   Media lines that are not currently in use but may be used
        later, so that the resources need to be kept allocated, SHOULD
        use the inactive attribute.

   10.  If an m-line will not be used, or it is rejected, it MUST have
        its port set to zero.

   11.  If a video switching MCU produces a virtual "active speaker"
        media flow, that media flow should have its own SSRC but
        include the SSRC of the current speaker's video in the CSRC
        list of the packets it produces.

   12.  For each PC-Track, the W3C API MUST provide a way to set and
        read the CSRC list, set and read the RFC 4574 content "label",
        and read the SSRC of the last packet received on a PC-Track
        (see the sketch after this list).

   13.  The W3C API should have a constraint or API method to allow a
        PC-Stream to indicate the number of multi-render video streams
        it can accept.  Each time a new stream is received, up to the
        maximum, a new PC-Track will be created.

   14.  Applications MAY signal all the SSRCs they intend to send using
        RFC 5576, but receivers need to be careful in their usage of
        the SSRC in signaling, as the SSRC can change when there is a
        collision and it takes time before that change is reflected in
        signaling.

   15.  Applications can get out-of-band "roster information" that maps
        the names of various speakers or other information to the MSID
        and/or SSRCs that a user is using.

   16.  Applications MAY use RFC 4574 content labels to indicate the
        purpose of the video.  The additional content types, main-left
        and main-right, need to be added to support two- and three-
        screen systems.

   17.  The CLUE WG might want to consider using SDP to signal the 3D
        location and field of view parameters for captures and
        renderers.

   18.  The W3C API allows a "label" to be set for the PC-Track.  This
        MUST be mapped to the SDP label attribute.
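   A rough TypeScript sketch of the kind of API surface items 12 and 13
   call for is shown below.  These interfaces are purely hypothetical
   illustrations of the proposal; they are not part of the current W3C
   API.

      // Hypothetical extensions; names are illustrative only.
      interface PCTrackRtpInfo {
        getCsrcList(): number[];             // item 12: read the CSRC list
        setCsrcList(csrcs: number[]): void;  // item 12: set the CSRC list
        getContentLabel(): string;           // item 12: RFC 4574 label
        setContentLabel(label: string): void;
        getLastReceivedSsrc(): number | undefined;  // item 12
      }

      interface PCStreamConstraints {
        // item 13: how many multi-render video streams this PC-Stream
        // is willing to accept; a new PC-Track is created per stream
        // received, up to this maximum.
        maxMultiRenderStreams?: number;
      }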
7.1.  Correlation and Multiplexing

   The port number that RTP is received on provides the primary
   mechanism for correlating it to the correct m-line.  However, when
   the port does not uniquely map the RTP packet to the correct m-block
   (such as in multiplexing and other cases), the next thing that can
   be looked at is the PT number.  Finally, there are cases where the
   SSRC can be used if it was signaled.

   There are some complications when using SSRC for correlation with
   signaling.  First, the offerer may end up receiving RTP packets
   before receiving the signaling with the SSRC correlation
   information.  This is because the sender of the RTP chooses the
   SSRC; there is no way for the receiver to signal how some of the
   bits in the SSRC should be set.  Numerous attempts to provide a way
   to do this have been made, but they have all been rejected for
   various reasons, so this situation is unlikely to change.  The
   second issue is that the signaled SSRC can change, particularly in
   collision cases, and there is no good way to know when SSRCs are
   changing, such that the currently signaled SSRC usage maps to the
   actual RTP SSRC usage.  Finally, the SSRC does not always provide
   correlation information between media flows - take, for example,
   trying to look at SSRCs to tell that an audio media flow and a video
   media flow came from the same camera.  The nice thing about SSRCs is
   that they are also included in the RTP.

   The proposal here is to extend the MSID draft to meet these needs:
   each media flow would have a unique MSID, and the MSID would have
   some level of internal structure, which would allow various forms of
   correlation, including what WebRTC needs to be able to recreate the
   MS-Stream / MS-Track hierarchy so that it is the same on both sides.
   In addition, this work proposes creating an optional RTP header
   extension that could be used to carry the MSID for a media flow in
   the RTP packets.  This is not absolutely needed for the WebRTC use
   cases, but it helps in the case where media arrives before signaling
   and it helps resolve a broader category of web conferencing use
   cases.

   The MSID consists of three things and can be extended to have more.
   It has a device identifier, which corresponds to a unique identifier
   of the device that created the offer; one or more synchronization
   context identifiers, which are numbers that help correlate different
   synchronized media flows; and a media flow identifier.  The
   synchronization identifier and flow identifier are scoped within the
   context of the device identifier, but the device identifier is
   globally unique.  The suggested device identifier is a 64-bit random
   number.  The synchronization group is an integer that is the same
   for all media flows that have this device identifier and are meant
   to be synchronized.  Right now there can be more than one
   synchronization identifier, but the open issues suggest that one
   would be preferable.  The flow identifier is an integer that
   uniquely identifies this media flow within the context of the device
   identifier.

   Open Issue: how to know if the MSID RTP Header Extension should be
   included in the RTP?

   An example MSID for a device identifier of 12345123451234512345, a
   synchronization group of 1, and a media flow id of 3 would be:

      a=msid:12345123451234512345 s:1 f:3

   When the MSID is used in an answer, the MSID also has the remote
   device identifier included.  In the case where the device ID of the
   device sending the answer was 22222333334444455555, the MSID would
   look like:

      a=msid:22222333334444455555 s:1 f:3 r:12345123451234512345
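   A minimal TypeScript sketch of parsing this a=msid syntax, as
   described above, might look like the following.  The field keys are
   taken from the examples here (s:, f:, and the r: remote device
   identifier in answers); everything else is an illustrative
   assumption, not a definition of the final syntax.

      interface ParsedMsid {
        deviceId: string;        // 64-bit random device identifier
        syncGroup: number;       // s: synchronization context identifier
        flowId: number;          // f: media flow identifier
        remoteDeviceId?: string; // r: only present in answers
      }

      // Parses a line such as "a=msid:22222... s:1 f:3 r:12345..."
      function parseMsid(line: string): ParsedMsid | undefined {
        const m = line.match(/^a=msid:(\S+)((?:\s+\S+:\S+)*)\s*$/);
        if (!m) {
          return undefined;
        }
        const result: ParsedMsid = { deviceId: m[1], syncGroup: 0, flowId: 0 };
        for (const token of m[2].trim().split(/\s+/).filter(t => t.length)) {
          const [key, value] = token.split(":");
          if (key === "s") result.syncGroup = parseInt(value, 10);
          else if (key === "f") result.flowId = parseInt(value, 10);
          else if (key === "r") result.remoteDeviceId = value;
        }
        return result;
      }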
   Note: The 64-bit size for the device identifier was chosen as it
   allows less than a one in a million chance of collision with greater
   than 10,000 flows (actually, it allows this probability with more
   like 6 million flows).  Much smaller numbers could be used, but 32
   bits is probably too small.  More discussion on the size of this and
   the color of the bike shed is needed.

   When used in the WebRTC context, each PeerConnection should generate
   a unique device identifier.  Each PC-Stream in the PeerConnection
   will get a unique synchronization group identifier, and each
   PC-Track in the PeerConnection will get a unique flow identifier.
   Together these will be used to form the MSID.  The MSID MUST be
   included in the SDP offer or answer so that the WebRTC connection on
   the remote side can form the correct structure of remote PC-Streams
   and PC-Tracks.  If a WebRTC client receives an Offer with no MSID
   information and no LS group information, it MUST put all the remote
   PC-Tracks into a single PC-Stream.  If there is LS group information
   but no MSID, a PC-Stream for each LS group MUST be created and the
   PC-Tracks put in the appropriate PC-Stream.

   The W3C specs should be updated to have the ID attribute of the
   MS-Stream be the MSID with no flow identifier, and the ID attribute
   of the MS-Track be the MSID.

   In addition, the SDP will attempt to negotiate sending the MSID in
   the RTP using an RTP Header Extension.  WebRTC clients SHOULD also
   include the a=ssrc attributes if they know which SSRCs they plan to
   send, but they cannot rely on this information not changing, being
   complete, or existing in all offers or answers they receive -
   particularly when working with SIP endpoints.

   When using multiplexing, the SDP MUST be distinct enough that the
   combination of payload type number and SSRC allows for unique
   demultiplexing of all the media on the same transport flow without
   use of the MSID, though the MSID can help in several use cases.

7.2.  Multiple Render

   There are cases - such as a grid of security cameras or thumbnails
   in a video conference - where a receiver is willing to receive and
   display several media flows of video.  The proposal here is to
   create a new media level attribute called multi-render that includes
   an integer indicating how many streams can be rendered at the same
   time.

   As an example of such an m-block, a system that could display 16
   thumbnails at the same time and was willing to receive H261 or H264
   might offer:

   Offer

      m=video 52886 RTP/AVP 98 99
      a=multi-render:16
      a=rtpmap:98 H261/90000
      a=rtpmap:99 H264/90000
      a=fmtp:99 profile-level-id=4de00a;
          packetization-mode=0; mst-mode=NI-T;
          sprop-parameter-sets={sps0},{pps0};

   When combining this multi-render feature with multiplexing, the
   answerer might not know all the SSRCs that will be sent to this
   m-block, so it is best to use payload type (PT) numbers that are
   unique within the SDP: the demultiplexing may have to use only the
   PT if the SSRCs are unknown.

   The intention is that the most recently sent SSRCs are the ones that
   are rendered.
Some switching MCU will likely only send the correct 684 number of SSRC and not change the SSRC but will instead update the 685 CSRC as the switching MCU select a different participant to include 686 in the particular video stream. 688 The receiver displays, in different windows, the video from the most 689 recent 16 SSRC to send video to m-block. 691 This allows a switching MCU to know how many thumbnail type streams 692 would be appropriate to send to this endpoint. 694 7.2.1. Complex Multi Render Example 696 The following shows a single multi render m-line that can display up 697 to three video streams, and send 3 streams, and support 2 layers of 698 simulcast with FEC on the high resolution layer and bundle. Note 699 that only host candidates are provided for the FEC and lower 700 resolution simulcast so if the device is behind a NAT, those streams 701 will not be used. 703 Offer 705 v=0 706 o=alice 20519 0 IN IP4 0.0.0.0 707 s=ULP FEC 708 t=0 0 709 a=ice-ufrag:074c6550 710 a=ice-pwd:a28a397a4c3f31747d1ee3474af08a068 711 a=fingerprint:sha-1 99:41:49:83:4a:97:0e:1f:ef:6d:f7: 712 c9:c7:70:9d:1f:66:79:a8:07 713 c= IN IP4 24.23.204.141 714 a=group:BUNDLE vid1 vid2 vid3 715 a=group:FEC vid1 vid2 716 a=group:SIMULCAST vid1 vid3 718 m=video 62537 RTP/SAVPF 96 719 a=mid:vid1 720 a=multi-render:3 721 a=rtcp-mux 722 a=msid:12345123451234512345 s:1 f:1 723 a=rtpmap:96 VP8/90000 724 a=fmtp:96 max-fr=30;max-fs=3600; 725 a=imageattr:96 [x=1280,y=720] 726 a=candidate:0 1 UDP 2113667327 192.168.1.4 62537 typ host 727 a=candidate:1 1 UDP 694302207 24.23.204.141 62537 728 typ srflx raddr 192.168.1.4 rport 62537 729 a=candidate:0 2 UDP 2113667326 192.168.1.4 64678 typ host 730 a=candidate:1 2 UDP 1694302206 24.23.204.141 64678 731 typ rflx raddr 192.168.1.4 rport 64678 733 m=video 62541 RTP/SAVPF 97 734 a=mid:vid2 735 a=multi-render:3 736 a=rtcp-mux 737 a=msid:34567345673456734567 s:1 f:2 738 a=rtpmap:97 uplfec/90000 739 a=candidate:0 1 UDP 2113667327 192.168.1.4 62541 typ host 741 m=video 62545 RTP/SAVPF 98 742 a=mid:vid3 743 a=multi-render:3 744 a=rtcp-mux 745 a=msid:333444558899000991122 s:1 f:3 746 a=rtpmap:98 VP8/90000 747 a=fmtp:98 max-fr=15;max-fs=300; 748 a=imageattr:96 [x=320,y=240] 749 a=candidate:0 1 UDP 2113667327 192.168.1.4 62545 typ host 750 The following shows an answer to the above offer that accepts 751 everything and plans to send video from five different cameras in to 752 this m-line (but only three at a time). 
754 Answer 756 v=0 757 o=Bob 20519 0 IN IP4 0.0.0.0 758 s=ULP FEC 759 t=0 0 760 a=ice-ufrag:c300d85b 761 a=ice-pwd:de4e99bd291c325921d5d47efbabd9a2 762 a=fingerprint:sha-1 99:41:49:83:4a:97:0e:1f:ef:6d:f7: 763 c9:c7:70:9d:1f:66:79:a8:07 764 c= IN IP4 98.248.92.77 765 a=group:BUNDLE vid1 vid2 vid3 766 a=group:FEC vid1 vid2 767 a=group:SIMULCAST vid1 vid3 769 m=video 42537 RTP/SAVPF 96 770 a=mid:vid1 771 a=multi-render:3 772 a=rtcp-mux 773 a=msid:54321543215432154321 s:1 f:1 r:12345123451234512345 774 a=rtpmap:96 VP8/90000 775 a=fmtp:96 max-fr=30;max-fs=3600; 776 a=imageattr:96 [x=1280,y=720] 777 a=candidate:0 1 UDP 2113667327 192.168.1.7 42537 typ host 778 a=candidate:1 1 UDP 1694302207 98.248.92.77 42537 779 typ srflx raddr 192.168.1.7 rport 42537 780 a=candidate:0 2 UDP 2113667326 192.168.1.7 60065 typ host 781 a=candidate:1 2 UDP 1694302206 98.248.92.77 60065 782 typ srflx raddr 192.168.1.7 rport 60065 784 m=video 42539 RTP/SAVPF 97 785 a=mid:vid2 786 a=multi-render:3 787 a=rtcp-mux 788 a=msid:11111122222233333444444 s:1 f:2 r:34567345673456734567 789 a=rtpmap:97 uplfec/90000 790 a=candidate:0 1 UDP 2113667327 192.168.1.7 42539 typ host 792 m=video 42537 RTP/SAVPF 98 793 a=mid:vid3 794 a=multi-render:3 795 a=rtcp-mux 796 a=msid:777777888888999999111111 s:1 f:3 r:333444558899000991122 797 a=rtpmap:98 VP8/90000 798 a=fmtp:98 max-fr=15;max-fs=300; 799 a=imageattr:98 [x=320,y=240] 800 a=candidate:0 1 UDP 2113667327 192.168.1.7 42537 typ host 801 a=candidate:1 1 UDP 1694302207 98.248.92.77 42537 802 typ srflx raddr 192.168.1.7 rport 42537 803 a=candidate:0 2 UDP 2113667326 192.168.1.7 60065 typ host 804 a=candidate:1 2 UDP 1694302206 98.248.92.77 60065 805 typ srflx raddr 192.168.1.7 rport 60065 807 The following shows an answer to the above by a client that does not 808 support simulcast, FEC, bundle, or msid. 810 Answer 812 v=0 813 o=Bob 20519 0 IN IP4 0.0.0.0 814 s=ULP FEC 815 t=0 0 816 a=ice-ufrag:c300d85b 817 a=ice-pwd:de4e99bd291c325921d5d47efbabd9a2 818 a=fingerprint:sha-1 99:41:49:83:4a:97:0e:1f:ef:6d:f7: 819 c9:c7:70:9d:1f:66:79:a8:07 820 c= IN IP4 98.248.92.77 822 m=video 42537 RTP/SAVPF 96 823 a=mid:vid1 824 a=rtcp-mux 825 a=recvonly 826 a=rtpmap:96 VP8/90000 827 a=fmtp:96 max-fr=30;max-fs=3600; 828 a=candidate:0 1 UDP 2113667327 192.168.1.7 42537 typ host 829 a=candidate:1 1 UDP 1694302207 98.248.92.77 42537 830 typ srflx raddr 192.168.1.7 rport 42537 831 a=candidate:0 2 UDP 2113667326 192.168.1.7 60065 typ host 832 a=candidate:1 2 UDP 1694302206 98.248.92.77 60065 833 typ srflx raddr 192.168.1.7 rport 60065 835 m=video 0 RTP/SAVPF 97 836 a=mid:vid2 837 a=rtcp-mux 838 a=rtpmap:97 uplfec/90000 840 m=video 0 RTP/SAVPF 98 841 a=mid:vid3 842 a=rtcp-mux 843 a=rtpmap:98 H264/90000 844 a=fmtp:98 profile-level-id=428014; 845 max-fs=3600; max-mbps=108000; max-br=14000 847 7.3. Dirty Little Secrets 849 If SDP offer/answers are of type AVP or AVPF but contain a crypto of 850 fingerprint attribute, they should be treated as if they were SAVP or 851 SAVPF respectively. The Answer should have the same type as the 852 offer but for all practical purposes the implementation should treat 853 it as the secure variant. 855 If SDP offer/answers are of type AVP or SAVP, but contain an 856 a=rtcp-fb attribute, they should be treated as if they were AVPF or 857 SAVPF respectively. The SDP Answer should have the same type as the 858 Offer but for all practical purposes the implementation should treat 859 it as the feedback variant. 
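   A small sketch of those two inference rules, assuming the profile
   string and the relevant attribute presence have already been pulled
   out of the m-block, might look like this (illustrative only, not a
   normative procedure):

      // Infer the profile an implementation should behave as, per the
      // two rules above.  Inputs describe a hypothetical pre-parsed
      // m-block.
      function effectiveProfile(declaredProfile: string,
                                hasCryptoOrFingerprint: boolean,
                                hasRtcpFb: boolean): string {
        let profile = declaredProfile;          // e.g. "RTP/AVP"
        if (hasCryptoOrFingerprint && !profile.includes("SAVP")) {
          // AVP/AVPF with keying attributes: treat as the secure variant.
          profile = profile.replace("AVP", "SAVP");
        }
        if (hasRtcpFb && !profile.endsWith("F")) {
          // AVP/SAVP with a=rtcp-fb: treat as the feedback variant.
          profile = profile + "F";
        }
        return profile;
      }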
   If an SDP Offer has both a fingerprint and a crypto attribute, it
   means the Offerer supports both DTLS-SRTP and SDES; the answerer
   should select one and return an Answer with only an attribute for
   the selected keying mechanism.

   These may not look appealing, but the alternative is to make cap-neg
   mandatory to implement in WebRTC.

7.4.  Open Issues

   What to do with unrecognized media received at the W3C
   PeerConnection level?  The suggestion is that it creates a new track
   in whatever stream the MSID would indicate if present, and in the
   default stream if there is no MSID header extension in the RTP.

7.5.  Confusions

   You can decrypt DTLS-SRTP media before receiving an answer, but you
   cannot determine whether it is secure until you have the fingerprint
   and have verified it.

   You can use RTCP-FB to do things like PLI without signaling the
   SSRC.  The PLI packet gets the sender SSRC from the incoming media
   that it is trying to signal the PLI for.

8.  Examples

   Example of a video client joining a video conference.  The client
   can produce and receive two streams of video, one from the slides
   and the other of the person.  The video of the person is
   synchronized with the audio.  In addition, the client can display up
   to 10 thumbnails of video.  The main video is simulcast at HD size
   and a thumbnail size.

   Offer

      v=0
      o=alice 2890844526 2890844527 IN IP4 host.example.com
      s=
      c=IN IP4 host.atlanta.example.com
      t=0 0
      a=group:LS 1,2,3
      a=group:SIMULCAST 2,3

      m=audio 49170 RTP/AVP 96        <- This is the Audio
      a=mid:1
      a=rtpmap:96 iLBC/8000
      a=content:main

      m=video 51372 RTP/AVP 97        <- This is the main video
      a=mid:2
      a=rtpmap:97 VP8/90000
      a=fmtp:97 max-fr=30;max-fs=3600;
      a=imageattr:97 [x=1080,y=720]
      a=content:main

      m=video 51372 RTP/AVP 98        <- This is the slides
      a=mid:2
      a=rtpmap:98 VP8/90000
      a=fmtp:98 max-fr=30;max-fs=3600;
      a=imageattr:98 [x=1080,y=720]
      a=content:slides

      m=video 51372 RTP/AVP 99        <- This is the simulcast of main
      a=mid:3
      a=rtpmap:99 VP8/90000
      a=fmtp:99 max-fr=15;max-fs=300;
      a=imageattr:99 [x=320,y=240]

      m=video 51372 RTP/AVP 100       <- This is the 10 thumbnails
      a=mid:4
      a=multi-render:10
      a=recvonly
      a=rtpmap:100 VP8/90000
      a=fmtp:100 max-fr=15;max-fs=300;
      a=imageattr:100 [x=320,y=240]

   Example of a three-screen video endpoint connecting to a two-screen
   system which ends up selecting the left and middle screens.

   Offer

      v=0
      o=alice 2890844526 2890844527 IN IP4 host.atlanta.example.com
      s=
      c=IN IP4 host.atlanta.example.com
      t=0 0
      a=rtcp-fb

      m=audio 49100 RTP/SAVPF 96
      a=rtpmap:96 iLBC/8000

      m=video 49102 RTP/SAVPF 97
      a=content:main
      a=rtpmap:97 H261/90000

      m=video 49104 RTP/SAVPF 98
      a=content:left
      a=rtpmap:98 H261/90000

      m=video 49106 RTP/SAVPF 99
      a=content:right
      a=rtpmap:99 H261/90000

   Answer

      v=0
      o=bob 2808844564 2808844565 IN IP4 host.biloxi.example.com
      s=
      c=IN IP4 host.biloxi.example.com
      t=0 0
      a=rtcp-fb

      m=audio 50100 RTP/SAVPF 96
      a=rtpmap:96 iLBC/8000

      m=video 50102 RTP/SAVPF 97
      a=content:main
      a=rtpmap:97 H261/90000

      m=video 50104 RTP/SAVPF 98
      a=content:left
      a=rtpmap:98 H261/90000

      m=video 0 RTP/SAVPF 99
      a=content:right
      a=rtpmap:99 H261/90000

   Example of a client that supports SRTP-DTLS and SDES connecting to a
   client that supports SRTP-DTLS.
   Offer

      v=0
      o=alice 2890844526 2890844527 IN IP4 host.atlanta.example.com
      s=
      c=IN IP4 host.atlanta.example.com
      t=0 0

      m=audio 49170 RTP/AVP 99
      a=fingerprint:sha-1 99:41:49:83:4a:97:0e:1f:ef:6d
          :f7:c9:c7:70:9d:1f:66:79:a8:07
      a=crypto:1 AES_CM_128_HMAC_SHA1_80
          inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
      a=rtpmap:99 iLBC/8000

      m=video 51372 RTP/AVP 96
      a=fingerprint:sha-1 92:81:49:83:4a:23:0a:0f:1f:9d:f7:
          c0:c7:70:9d:1f:66:79:a8:07
      a=crypto:1 AES_CM_128_HMAC_SHA1_32
          inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
      a=rtpmap:96 H261/90000

   Answer

      v=0
      o=bob 2808844564 2808844565 IN IP4 host.biloxi.example.com
      s=
      c=IN IP4 host.biloxi.example.com
      t=0 0

      m=audio 49172 RTP/AVP 99
      a=crypto:1 AES_CM_128_HMAC_SHA1_80
          inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
      a=rtpmap:99 iLBC/8000

      m=video 51374 RTP/AVP 96
      a=crypto:1 AES_CM_128_HMAC_SHA1_80
          inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
      a=rtpmap:96 H261/90000

9.  Tasks

   This section outlines work that needs to be done in various
   specifications to make the proposal here actually happen.

   Tasks:

   1.   Extend the W3C API to be able to set and read the CSRC list for
        a PC-Track.

   2.   Extend the W3C API to be able to read the SSRC of the last RTP
        packet received.

   3.   Write an RTP Header Extension draft to carry the MSID.

   4.   Fix up the MSID draft to align with this proposal.

   5.   Write a draft to add left and right to the SDP content
        attribute.  Add the stuff to the W3C API to read and write this
        on a track.

   6.   Write a draft on an SDP "SIMULCAST" group to signal that
        multiple m-blocks are simulcasts of the same video content.

   7.   Complete the bundle draft.

   8.   Provide guidance for ways to use SDP for reduced glare when
        adding one-way media streams.

   9.   Write a draft defining the multi-render attribute.

   10.  Change the W3C API to say that a PC-Track can be in only one
        PeerConnection, or make an object inside the PeerConnection for
        each track in the PC that can be used to set constraints and
        settings and get information related to the RTP flow.

   11.  Sort out how to tell a PC-Track, particularly one meant for
        receiving information, that it can do simulcast, layered
        coding, RTX, FEC, etc.

10.  Security Considerations

   TBD

11.  IANA Considerations

   This document requires no actions from IANA.

12.  Acknowledgments

   I would like to thank Suhas Nandakumar, Eric Rescorla, Charles
   Eckel, Mo Zanaty, and Lyndsay Campbell for help with this draft.

13.  Open Issues

   The overall solution is complicated considerably by the fact that
   WebRTC allows a PC-Track to be used in more than one PC-Stream but
   requires only one copy of the RTP data for the track to be sent.  I
   am not aware of any use case for this and think it should be
   removed.  If a PC-Track needs to be synchronized with two different
   things, they should all go in one PC-Stream instead of two.

14.  Existing SDP

   The following shows some examples of SDP today that any new system
   needs to be able to receive and work with in a backwards compatible
   way.

14.1.  Multiple Encodings

   Multiple codecs accepted on the same m-line [RFC4566].
1103 Offer 1105 v=0 1106 o=alice 2890844526 2890844527 IN IP4 host.atlanta.example.com 1107 s= 1108 c=IN IP4 host.atlanta.example.com 1109 t=0 0 1111 m=audio 49170 RTP/AVP 99 1112 a=rtpmap:99 iLBC/8000 1114 m=video 51372 RTP/AVP 31 32 1115 a=rtpmap:31 H261/90000 1116 a=rtpmap:32 MPV/90000 1118 Answer 1120 v=0 1121 o=bob 2808844564 2808844565 IN IP4 host.biloxi.example.com 1122 s= 1123 c=IN IP4 host.biloxi.example.com 1124 t=0 0 1126 m=audio 49172 RTP/AVP 99 1127 a=rtpmap:99 iLBC/8000 1129 m=video 51374 RTP/AVP 31 32 1130 a=rtpmap:31 H261/90000 1131 a=rtpmap:32 MPV/90000 1133 This means that a sender can switch back and forth between H261 and 1134 MVP without any further signaling. The receiver MUST be capable of 1135 receiving both formats. At any point in time, only one video format 1136 is sent, thus implying that only one video is meant to be displayed. 1138 14.2. Forward Error Correction 1140 Multiple m-blocks identified with respective "mid" grouped to 1141 indicate FEC operation using FEC-FR semantics defined in [RFC5956]. 1143 Offer 1145 v=0 1146 o=ali 1122334455 1122334466 IN IP4 fec.example.com 1147 s=Raptor RTP FEC Example 1148 t=0 0 1149 a=group:FEC-FR S1 R1 1151 m=video 30000 RTP/AVP 100 1152 c=IN IP4 233.252.0.1/127 1153 a=rtpmap:100 MP2T/90000 1154 a=fec-source-flow: id=0 1155 a=mid:S1 1157 m=application 30000 RTP/AVP 110 1158 c=IN IP4 233.252.0.2/127 1159 a=rtpmap:110 raptorfec/90000 1160 a=fmtp:110 raptor-scheme-id=1; Kmax=8192; T=128; 1161 P=A; repair-window=200000 1162 a=mid:R1 1164 14.3. Same Video Codec With Different Settings 1166 This example shows a single codec,say H.264, signaled with different 1167 settings [RFC4566]. 1169 Offer 1171 v=0 1173 m=video 49170 RTP/AVP 100 99 98 1174 a=rtpmap:98 H264/90000 1175 a=fmtp:98 profile-level-id=42A01E; packetization-mode=0; 1176 sprop-parameter-sets=Z0IACpZTBYmI,aMljiA== 1177 a=rtpmap:99 H264/90000 1178 a=fmtp:99 profile-level-id=42A01E; packetization-mode=1; 1179 sprop-parameter-sets=Z0IACpZTBYmI,aMljiA== 1180 a=rtpmap:100 H264/90000 1181 a=fmtp:100 profile-level-id=42A01E; packetization-mode=2; 1182 sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==; 1183 sprop-interleaving-depth=45; sprop-deint-buf-req=64000; 1184 sprop-init-buf-time=102478; deint-buf-cap=128000 1186 14.4. Different Video Codecs With Different Resolutions Formats 1188 The SDP below shows some m-blocks with various ways to specify 1189 resolutions for video codecs signaled [RFC4566]. 1191 Offer 1193 m=video 49170 RTP/AVP 31 1194 a=rtpmap:31 H261/90000 1195 a=fmtp:31 CIF=2;QCIF=1;D=1 1197 m=video 49172 RTP/AVP 99 1198 a=rtpmap:99 jpeg2000/90000 1199 a=fmtp:99 sampling=YCbCr-4:2:0;width=128;height=128 1201 m=video 49174 RTP/AVP 96 1202 a=rtpmap:96 VP8/90000 1203 a=fmtp:96 max-fr=30;max-fs=3600; 1204 a=imageattr:96 [x=1280,y=720] 1206 14.5. Lip Sync Group 1208 [RFC5888] grouping semantics for Lip Synchronization between audio 1209 and video 1211 Offer 1213 v=0 1214 o=Laura 289083124 289083124 IN IP4 one.example.com 1215 c=IN IP4 192.0.2.1 1216 t=0 0 1217 a=group:LS 1 2 1219 m=audio 30000 RTP/AVP 0 1220 a=mid:1 1222 m=video 30002 RTP/AVP 31 1223 a=mid:2 1225 14.6. 
BFCP 1227 [RFC4583] defines SDP format for Binary Floor Control Protocol (BFCP) 1228 as shown below 1229 Offer 1231 m=application 50000 TCP/TLS/BFCP * 1232 a=setup:passive 1233 a=connection:new 1234 a=fingerprint:SHA-1 \ 1235 4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB 1236 a=floorctrl:s-only 1237 a=confid:4321 1238 a=userid:1234 1239 a=floorid:1 m-stream:10 1240 a=floorid:2 m-stream:11 1242 m=audio 50002 RTP/AVP 0 1243 a=label:10 1245 m=video 50004 RTP/AVP 31 1246 a=label:11 1248 Answer 1250 m=application 50000 TCP/TLS/BFCP * 1251 a=setup:passive 1252 a=connection:new 1253 a=fingerprint:SHA-1 \ 1254 4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB 1255 a=floorctrl:s-only 1256 a=confid:4321 1257 a=userid:1234 1258 a=floorid:1 m-stream:10 1259 a=floorid:2 m-stream:11 1261 m=audio 50002 RTP/AVP 0 1262 a=label:10 1264 m=video 50004 RTP/AVP 31 1265 a=label:11 1267 14.7. Retransmission 1269 The SDP given below shows SDP signaling for retransmission of the 1270 original media stream(s) as defined in [RFC4756] 1271 Offer 1273 v=0 1274 o=mascha 2980675221 2980675778 IN IP4 host.example.net 1275 c=IN IP4 192.0.2.0 1276 a=group:FID 1 2 1277 a=group:FID 3 4 1279 m=audio 49170 RTP/AVPF 96 1280 a=rtpmap:96 AMR/8000 1281 a=fmtp:96 octet-align=1 1282 a=rtcp-fb:96 nack 1283 a=mid:1 1285 m=audio 49172 RTP/AVPF 97 1286 a=rtpmap:97 rtx/8000 1287 a=fmtp:97 apt=96;rtx-time=3000 1288 a=mid:2 1290 m=video 49174 RTP/AVPF 98 1291 a=rtpmap:98 MP4V-ES/90000 1292 a=rtcp-fb:98 nack 1293 a=fmtp:98 profile-level-id=8;config=01010000012000884006682C209\ 1294 0A21F 1295 a=mid:3 1297 m=video 49176 RTP/AVPF 99 1298 a=rtpmap:99 rtx/90000 1299 a=fmtp:99 apt=98;rtx-time=3000 1300 a=mid:4 1302 Note that RTX RFC also has the following SSRC multiplexing example 1303 but this is meant for declarative use of SDP as there was no way in 1304 this RFC to accept, reject, or otherwise negotiate this in a an offer 1305 / answer SDP usage. 1307 SDP 1309 v=0 1310 o=mascha 2980675221 2980675778 IN IP4 host.example.net 1311 c=IN IP4 192.0.2.0 1313 m=video 49170 RTP/AVPF 96 97 1314 a=rtpmap:96 MP4V-ES/90000 1315 a=rtcp-fb:96 nack 1316 a=fmtp:96 profile-level-id=8;config=01010000012000884006682C209\ 1317 0A21F 1318 a=rtpmap:97 rtx/90000 1319 a=fmtp:97 apt=96;rtx-time=3000 1321 14.8. 
Layered coding dependency 1323 [RFC5583] "depend" attribute is shown here to indicate dependency 1324 between layers represented by the individual m-blocks 1325 Offer 1327 a=group:DDP L1 L2 L3 1329 m=video 20000 RTP/AVP 96 97 98 1330 a=rtpmap:96 H264/90000 1331 a=fmtp:96 profile-level-id=4de00a; packetization-mode=0; 1332 mst-mode=NI-T; sprop-parameter-sets={sps0},{pps0}; 1333 a=rtpmap:97 H264/90000 1334 a=fmtp:97 profile-level-id=4de00a; packetization-mode=1; 1335 mst-mode=NI-TC; sprop-parameter-sets={sps0},{pps0}; 1336 a=rtpmap:98 H264/90000 1337 a=fmtp:98 profile-level-id=4de00a; packetization-mode=2; 1338 mst-mode=I-C; init-buf-time=156320; 1339 sprop-parameter-sets={sps0},{pps0}; 1340 a=mid:L1 1342 m=video 20002 RTP/AVP 99 100 1343 a=rtpmap:99 H264-SVC/90000 1344 a=fmtp:99 profile-level-id=53000c; packetization-mode=1; 1345 mst-mode=NI-T; sprop-parameter-sets={sps1},{pps1}; 1346 a=rtpmap:100 H264-SVC/90000 1347 a=fmtp:100 profile-level-id=53000c; packetization-mode=2; 1348 mst-mode=I-C; sprop-parameter-sets={sps1},{pps1}; 1349 a=mid:L2 1350 a=depend:99 lay L1:96,97; 100 lay L1:98 1352 m=video 20004 RTP/AVP 101 1353 a=rtpmap:101 H264-SVC/90000 1354 a=fmtp:101 profile-level-id=53001F; packetization-mode=1; 1355 mst-mode=NI-T; sprop-parameter-sets={sps2},{pps2}; 1356 a=mid:L3 1357 a=depend:101 lay L1:96,97 L2:99 1359 14.9. SSRC Signaling 1361 [RFC5576] "ssrc" attribute is shown here to signal synchronization 1362 sources in a given RTP Session 1364 Offer 1366 m=video 49170 RTP/AVP 96 1367 a=rtpmap:96 H264/90000 1368 a=ssrc:12345 cname:user@example.com 1369 a=ssrc:67890 cname:user@example.com 1371 This indicates what the sender will send. It's at best a guess 1372 because in the case of SSRC collision, it's all wrong. It does not 1373 allow one to reject a stream. It does not mean that both streams are 1374 displayed at the same time. 1376 14.10. Content Signaling 1378 [RFC4796] "content" attribute is used to specify the semantics of 1379 content represented by the video streams. 1381 Offer 1383 v=0 1384 o=Alice 292742730 29277831 IN IP4 131.163.72.4 1385 s=Second lecture from information technology 1386 c=IN IP4 131.164.74.2 1387 t=0 0 1389 m=video 52886 RTP/AVP 31 1390 a=rtpmap:31 H261/9000 1391 a=content:slides 1393 m=video 53334 RTP/AVP 31 1394 a=rtpmap:31 H261/9000 1395 a=content:speaker 1397 m=video 54132 RTP/AVP 31 1398 a=rtpmap:31 H261/9000 1399 a=content:sl 1401 15. References 1403 15.1. Normative References 1405 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1406 Requirement Levels", BCP 14, RFC 2119, March 1997. 1408 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 1409 with Session Description Protocol (SDP)", RFC 3264, 1410 June 2002. 1412 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 1413 Description Protocol", RFC 4566, July 2006. 1415 15.2. Informative References 1417 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1418 A., Peterson, J., Sparks, R., Handley, M., and E. 1419 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1420 June 2002. 1422 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1423 Jacobson, "RTP: A Transport Protocol for Real-Time 1424 Applications", STD 64, RFC 3550, July 2003. 1426 [RFC4583] Camarillo, G., "Session Description Protocol (SDP) Format 1427 for Binary Floor Control Protocol (BFCP) Streams", 1428 RFC 4583, November 2006. 1430 [RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. 
1431 Hakenberg, "RTP Retransmission Payload Format", RFC 4588, 1432 July 2006. 1434 [RFC4756] Li, A., "Forward Error Correction Grouping Semantics in 1435 Session Description Protocol", RFC 4756, November 2006. 1437 [RFC4796] Hautakorpi, J. and G. Camarillo, "The Session Description 1438 Protocol (SDP) Content Attribute", RFC 4796, 1439 February 2007. 1441 [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment 1442 (ICE): A Protocol for Network Address Translator (NAT) 1443 Traversal for Offer/Answer Protocols", RFC 5245, 1444 April 2010. 1446 [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific 1447 Media Attributes in the Session Description Protocol 1448 (SDP)", RFC 5576, June 2009. 1450 [RFC5583] Schierl, T. and S. Wenger, "Signaling Media Decoding 1451 Dependency in the Session Description Protocol (SDP)", 1452 RFC 5583, July 2009. 1454 [RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description 1455 Protocol (SDP) Grouping Framework", RFC 5888, June 2010. 1457 [RFC5956] Begen, A., "Forward Error Correction Grouping Semantics in 1458 the Session Description Protocol", RFC 5956, 1459 September 2010. 1461 [webrtc-api] 1462 Bergkvist, Burnett, Jennings, Narayanan, "WebRTC 1.0: 1463 Real-time Communication Between Browsers", October 2011. 1465 Available at 1466 http://dev.w3.org/2011/webrtc/editor/webrtc.html 1468 Author's Address 1470 Cullen Jennings 1471 Cisco 1472 400 3rd Avenue SW, Suite 350 1473 Calgary, AB T2P 4H2 1474 Canada 1476 Email: fluffy@iii.ca