Network Working Group                                            E. Ivov
Internet-Draft                                                     Jitsi
Intended status: Standards Track                              E. Marocco
Expires: December 19, 2013                                Telecom Italia
                                                             P. Thatcher
                                                                  Google
                                                           June 17, 2013

  No Plan: Economical Use of the Offer/Answer Model in WebRTC Sessions
                      with Multiple Media Sources
                       draft-ivov-rtcweb-noplan-01

Abstract

   This document describes a model for the lightweight use of SDP
   Offer/Answer in WebRTC.  The goal is to minimize reliance on Offer/
   Answer exchanges in a WebRTC session and to provide applications
   with the tools necessary to implement the signalling that they may
   need in a way that best fits their custom requirements and
   topologies.  This simplifies the signalling of multiple media
   sources and the identification of RTP Synchronisation Sources
   (SSRCs) in multi-party sessions.  Another important goal of this
   model is to remove from clients topological constraints such as the
   requirement to know in advance all SSRC identifiers that they could
   potentially introduce in a particular session.

   The model described here is similar to the one employed by the data
   channel JavaScript APIs in WebRTC, where methods are supported on
   PeerConnection without being reflected in SDP.
   This document does not question the use of SDP and the Offer/Answer
   model or the value they have in terms of interoperability with
   legacy or other non-WebRTC devices.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 19, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Background
   2.  Introduction
   3.  Reliance on Offer/Answer
     3.1.  Interoperability with Legacy
   4.  Additional Session Control and Signalling
   5.  Demultiplexing and Identifying Streams (Use of Bundle)
   6.  Simulcasting, FEC, Layering and RTX (Open Issue)
   7.  WebRTC API Requirements
     7.1.  Suggested WebRTC API Using TrackSendParams
       7.1.1.  Example 2
   8.  IANA Considerations
   9.  Informative References
   Appendix A.  Acknowledgements
   Authors' Addresses

1.  Background

   In its early stages the RTCWEB working group chose to use the
   Session Description Protocol (SDP) and the Offer/Answer model
   [RFC3264] when establishing and negotiating sessions.  This choice
   was also accompanied by the decision not to mandate a specific
   signalling protocol so that, once interoperability has been
   achieved, web applications can choose the semantics that best fit
   their requirements.  In some scenarios, however, such as those
   involving the use of multiple media sources, these choices have left
   open the issue of exactly which operations should be handled by SDP
   Offer/Answer and which of them should be left to application-
   specific signalling.

   At the time of writing of this document, the RTCWEB working group is
   considering two approaches to addressing the issue, often referred
   to as Plan A [PlanA] and Plan B [PlanB].
   Both of them describe semantics that require Offer/Answer exchanges
   in a number of situations where this could be avoided, particularly
   when adding media sources to or removing them from a session.  This
   requirement applies equally to cases where a client adds the stream
   of a newly activated web cam or a simulcast flow, and to the arrival
   or departure of a conference participant.

   Plan A handles such notifications with the addition or removal of
   independent m= lines [PlanA], while Plan B relies on the use of
   multiplexed m= lines but still depends on Offer/Answer exchanges for
   the addition or removal of media stream identifiers [MSID].

   By taking the Offer/Answer approach, both Plan A and Plan B take
   away from the application the opportunity to handle such events in a
   way that is most fitting for the use case, which, among other
   things, also goes against the working group's decision not to define
   a specific signalling protocol.  (It could be argued that it is
   therefore only natural that proponents of each plan, having
   different use cases in mind, are remarkably far from reaching
   consensus.)

   Reliance on preliminary announcement of SSRC identifiers is another
   issue.  While this could be perceived as relatively straightforward
   in one-to-one sessions or even conference calls within controlled
   environments, it can be a problem in the following cases:

   o  interoperability with legacy/non-WebRTC endpoints;

   o  use within non-controlled and potentially federated conference
      environments where new RTP streams may appear relatively often.
      In such cases the signalling required to describe all of them
      through Offer/Answer may represent substantial overhead, while
      none or only a part of it (e.g. the description of a main,
      active-speaker stream) may be required by the application.

   By increasing the number of Offer/Answer exchanges, both Plan A and
   Plan B also increase the risk of encountering glare situations (i.e.
   cases where both parties attempt to modify a session at the same
   time).  While glare is also possible with basic Offer/Answer, and
   resolution of such situations must be implemented anyway, the need
   to frequently resort to such code may either negatively impact user
   experience (e.g. when "back off" resolution is used) or require
   substantial modifications in the Offer/Answer model and/or further
   venturing into the land of signalling protocols
   [ROACH-GLARELESS-ADD].

2.  Introduction

   The goal of this document is to provide directions for use of the
   SDP Offer/Answer model in a way that satisfies the following
   requirements:

   o  the addition and removal of media sources (e.g. conference
      participants, multiple web cams or "slides") must be possible
      without the need for Offer/Answer exchanges;

   o  the addition or removal of simulcast or layered streams must be
      possible without the need for Offer/Answer exchanges beyond the
      initial declaration of such capabilities for either direction;

   o  call establishment must not require preliminary announcement or
      even knowledge of all potentially participating media sources;

   o  application-specific signalling should be used to cover most
      semantics following call establishment, such as adding, removing
      or identifying SSRCs;

   o  straightforward interoperability with widely deployed legacy
      endpoints with rudimentary support for Offer/Answer.
      This includes devices that allow for one audio and potentially
      one video m= line and that expect to only ever be required to
      render a single RTP stream at a time for either of them.  (Note
      that this does NOT include devices that expect to see multiple
      "m=video" lines for different SSRCs, as they can hardly be viewed
      as "widely deployed legacy".)

   To satisfy the above requirements, this specification expects that
   browsers and WebRTC endpoints in general will use SDP Offer/Answer
   only to establish transport channels and to initialize an RTP stack
   and codec/processing chains.  This also includes any renegotiation
   that requires the re-initialisation of these chains.  For example,
   adding VP8 to a session that was set up with only H.264 would
   obviously still require an Offer/Answer exchange.

   All other session control and signalling are to be left to
   applications.

   The actual Offer/Answer semantics presented here do not differ
   fundamentally from those proposed by Plan A and Plan B.  The main
   differentiating point of this approach is that the exact protocol
   mechanism is left to WebRTC applications.  Such applications or
   lightweight signalling gateways can then implement either Plan A, or
   Plan B, or an entirely different signalling protocol, depending on
   what best matches their use cases and topology.

3.  Reliance on Offer/Answer

   The model presented in this specification relies on the use of SDP
   and Offer/Answer in much the same way as many pre-WebRTC (and most
   legacy) endpoints do: negotiating formats, establishing transport
   channels and exchanging, in a declarative way, media and transport
   parameters that are then used for the initialization of the
   corresponding stacks.

   The following example presents what this specification views as a
   typical offer sent by a WebRTC endpoint:

   v=0
   o=- 0 0 IN IP4 198.51.100.33
   s=
   t=0 0

   a=group:BUNDLE audio video           // declaring BUNDLE support
   c=IN IP4 198.51.100.33
   a=ice-ufrag:Qq8o/jZwknkmXpIh         // initializing ICE
   a=ice-pwd:gTMACiJcZv1xdPrjfbTHL5qo
   a=ice-options:trickle
   a=fingerprint:sha-1                  // DTLS-SRTP keying
       a4:b1:97:ab:c7:12:9b:02:12:b8:47:45:df:d8:3a:97:54:08:3f:16

   m=audio 5000 RTP/SAVPF 96 0 8
   a=mid:audio
   a=rtcp-mux

   a=rtpmap:96 opus/48000/2             // PT mappings
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000

   a=extmap:1 urn:ietf:params:rtp-hdrext:csrc-audio-level // 5285 header
   a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level // extensions

   [ICE Candidates]

   m=video 5002 RTP/SAVPF 97 98
   a=mid:video
   a=rtcp-mux
   a=rtpmap:97 VP8/90000      // PT mappings and resolution capabilities
   a=imageattr:97 \
       send [x=[480:16:800],y=[320:16:640],par=[1.2-1.3],q=0.6] \
            [x=[176:8:208],y=[144:8:176],par=[1.2-1.3]] \
       recv *
   a=rtpmap:98 H264/90000
   a=imageattr:98 send [x=800,y=640,sar=1.1,q=0.6] [x=480,y=320] \
                  recv [x=330,y=250]

   a=extmap:3 urn:ietf:params:rtp-hdrext:fec-source-ssrc  // 5285 header
   a=extmap:4 urn:ietf:params:rtp-hdrext:rtx-source-ssrc  // extensions

   a=max-send-ssrc:{*:1}                // declaring maximum
   a=max-recv-ssrc:{*:4}                // number of SSRCs

   [ICE Candidates]

   The answer to the offer above would have roughly the same structure
   and content.  The most important aspects here are:

   o  Preserves interoperability with most kinds of legacy or non-
      WebRTC endpoints.

   o  Allows the negotiation of most parameters that concern the media/
      RTP stack (typically the browser).

   o  Only a single Offer/Answer exchange is required for session
      establishment and, in most cases, for the entire duration of a
      session.

   o  Leaves complete freedom to applications as to how they signal any
      other information, such as SSRC identification or the addition
      and removal of RTP streams.
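   The following JavaScript sketch (purely illustrative and not part of
   this specification) shows what such a one-time exchange could look
   like with the existing PeerConnection API; "signalling", "config",
   "localMediaStream" and "logError" are application-defined
   placeholders:

   var pc = new RTCPeerConnection(config);

   // Trickle candidates as they are gathered instead of blocking the
   // exchange until gathering completes.
   pc.onicecandidate = function (evt) {
     if (evt.candidate)
       signalling.send({candidate: evt.candidate});
   };

   pc.addStream(localMediaStream);

   // The only Offer/Answer round trip: it sets up transport channels
   // and codec/processing chains.
   pc.createOffer(function (offer) {
     pc.setLocalDescription(offer);
     signalling.send({sdp: offer});
   }, logError);

   signalling.onmessage = function (msg) {
     if (msg.sdp)                // the peer's answer
       pc.setRemoteDescription(new RTCSessionDescription(msg.sdp));
     else if (msg.candidate)     // a trickled ICE candidate
       pc.addIceCandidate(new RTCIceCandidate(msg.candidate));
   };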
3.1.  Interoperability with Legacy

   Interoperating with "widely deployed legacy endpoints" is one of the
   main reasons for the RTCWEB working group to choose the SDP Offer/
   Answer model as a basis for media negotiation.  It is hence
   important to clarify the compatibility claims that this
   specification makes.

   A "widely deployed legacy endpoint" is considered to have the
   following characteristics:

   o  Likely to use the SIP protocol.

   o  Capable of gracefully handling one audio and potentially one
      video m= line in an SDP Offer.

   o  Capable of rendering one SSRC per m= line at any given moment,
      but multiple, consecutive SSRCs over a period of time.  This
      would be the case with session replacements following a transfer,
      for example.  While the capability to handle multiple SSRCs
      simultaneously is not uncommon, it cannot be relied upon and
      should first be confirmed through signalling.

   o  Possibly has features such as ICE, BUNDLE, RTCP-MUX, etc.  Just
      as likely not to.

   o  Very unlikely to announce in SDP the SSRCs that it intends to use
      for a given session.

   o  Exact set of features and capabilities: guaranteed to be wildly
      and widely diverse.

   While it is relatively simple for RTCWEB to accommodate some of the
   above, it is obviously impossible to design a model that could
   simply be labeled as "compatible with legacy".  It is reasonable to
   assume that use cases involving such endpoints will be designed for
   a relatively specific set of devices and applications.  The role of
   the WebRTC framework is hence to provide a least-common-denominator
   model that can then be extended by applications.

   It is just as important not to make choices or assumptions that
   would render interoperability for some applications or topologies
   difficult or even impossible.

   This is exactly what the use of Offer/Answer discussed here strives
   to achieve.  Audio/video offers originating from WebRTC endpoints
   will always have a maximum of one audio and one video m= line.  It
   will be up to applications to determine exactly how many streams
   they can afford to send once such a session has been established.
   The exact mechanism for doing so is outside the scope of this
   document (and of WebRTC in general).

   Note that it is still possible for WebRTC endpoints to indicate
   support for a maximum number of incoming or outgoing streams, for
   reasons such as processing constraints.  Use of the "max-send-ssrc"
   and "max-recv-ssrc" attributes [MAX-SSRC] could be one way of doing
   this, although that mechanism would need to be extended to provide
   ways of distinguishing between independent flows and complementary
   ones such as layered FEC and RTX.  Even with this in mind, it is
   important not to rely on the presence of such an indication in
   incoming descriptions, and to provide applications with a way of
   retrieving such capabilities from the WebRTC stack (e.g. the
   browser).
   Determining whether a peer has the ability to seamlessly switch from
   one SSRC to another is also left to application-specific signalling.
   It is worth noting that protocols such as SIP, for example, often
   accompany SSRC replacements with extra signalling (re-INVITEs with a
   "Replaces" header) that can easily be reused by applications or
   mapped to something that they deem more convenient.

   For the sake of interoperability, this specification strongly
   advises against the use of multiple m= lines for a single media
   type.  Not only would such use be meaningless to a large number of
   legacy endpoints, but it is also likely to be mishandled by many of
   them and to cause unexpected behaviour.

   Finally, it is also worth pointing out that there is a significant
   number of feature-rich non-WebRTC applications and devices that have
   relatively advanced, modern sets of capabilities.  Such endpoints
   hardly fit the "legacy" qualification.  Yet, as is often the case
   with novel and/or proprietary applications, they too have adopted
   diverse signalling mechanisms, and the requirements described in
   this section fully apply when it comes to interoperating with them.

4.  Additional Session Control and Signalling

   Beyond basic session establishment, applications need means of
   performing at least the following operations:

   o  Adding RTP streams to and removing them from an existing session.

   o  Accepting or refusing some of them.

   o  Identifying SSRCs and obtaining additional metadata for them
      (e.g. the user corresponding to a specific SSRC).

   All of the above semantics are best handled by, and hence should be
   left to, applications.  There are numerous existing or emerging
   solutions, some of them developed by the IETF, that already cover
   this.  These include CLUE channels [CLUE], the SIP Event Package for
   Conference State [RFC4575] and its XMPP variant [COIN], as well as
   the protocols defined within the Centralised Conferencing IETF
   working group [XCON].  Additional mechanisms, undoubtedly many of
   them based on JSON, are very likely to emerge in the future as
   WebRTC applications address varying use cases, scenarios and
   topologies.

   The most important part of this specification is hence to prevent
   certain assumptions or topologies from being imposed on
   applications.  One example of this is the need to know, and include
   in the Offer/Answer exchange, all the SSRCs that can show up in a
   session.  This can be particularly problematic for scenarios that
   involve non-WebRTC endpoints.

   Large-scale conference calls, potentially federated through RTP
   translator-like bridges, would be another problematic scenario.
   Always pre-announcing SSRCs in such situations could of course be
   made to work, but it would come at a price.  It would either require
   a very high number of Offer/Answer updates that propagate the
   information through the entire topology, or the use of tricks such
   as pre-allocating a range of "fake" SSRCs, announcing them to
   participants and then overwriting the actual SSRCs with them.
   Depending on the scenario, both options could prove inappropriate or
   inefficient, while some applications may not even need such
   information.  Others could retrieve it through simplistic means such
   as access to a centralized resource (e.g. a URL pointing to a JSON
   description of the conference).
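   As a purely hypothetical illustration (none of the field names below
   are defined by this or any other specification), such a centralized
   JSON description of a conference could be as simple as:

   {
     "conference": "weekly-sync",
     "endpoints": [
       { "user": "alice@example.com",
         "ssrcs": { "audio": 13579, "video": 24680 } },
       { "user": "bob@example.com",
         "ssrcs": { "audio": 97531, "video": 86420 } }
     ]
   }

   An application could fetch such a description when joining, or only
   when it actually needs to label a stream, rather than having every
   SSRC propagated to every participant through Offer/Answer.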
5.  Demultiplexing and Identifying Streams (Use of Bundle)

   This document assumes the use of BUNDLE in WebRTC endpoints.  This
   implies that all RTP streams are likely to end up being received on
   the same port.  A demultiplexing mechanism is therefore necessary in
   order for these packets to be fed into the appropriate processing
   chain (i.e. matched to an m= line).

      Note: it is important to distinguish between the demultiplexing
      and the identification of incoming flows.  Throughout this
      specification the former is used to refer to the process of
      selecting a depacketizing/decoding/processing chain to feed
      incoming packets to.  Such decisions depend solely on the format
      that is used to encode the content of incoming packets.

      The above is not to be confused with the process of making
      rendering decisions about a processed flow.  Such decisions
      include showing a "current speaker" flow at a specific location,
      window or video tag, while choosing a different one for a second,
      "slides" flow.  Another example would be the possibility of
      attaching "Alice", "Bob" and "Carol" labels on top of the
      appropriate UI components.  This specification leaves such
      rendering choices entirely to application-specific signalling, as
      described in Section 4.

   This specification uses demultiplexing based on RTP payload types.
   When creating offers and answers, WebRTC applications MUST therefore
   allocate RTP payload types only once per bundle group.  In cases
   where rtcp-mux is in use, this would mean a maximum of 96 payload
   types per bundle [RFC5761].  It has been pointed out that some
   legacy devices may have unpredictable behaviour with payload types
   that are outside the 96-127 range reserved by [RFC3551] for dynamic
   use.  Some applications or implementations may therefore choose not
   to use values outside this range.  Whatever the reason, offerers
   that find they need more than the available payload type numbers
   will simply need to either use a second bundle group or not use
   BUNDLE at all (which, in the case of a single audio and a single
   video m= line, amounts to roughly the same thing).  This would also
   imply building a dynamic table, mapping SSRCs to PTs and m= lines,
   in order to then also allow for RTCP demultiplexing.

   While not desirable, the implications of such a decision would be
   relatively limited.  Use of trickle ICE [TRICKLE-ICE] is going to
   lessen the impact on call establishment latency.  Also, the fact
   that this would only occur in a limited number of cases makes it
   unlikely to have a significant effect on port consumption.

   An additional requirement that has been expressed toward
   demultiplexing is the ability to assign incoming packets with the
   same payload type to different processing chains depending on their
   SSRCs.  A possible example for this is a scenario where two video
   streams are being rendered on different video screens that each have
   their own decoding hardware.

   While the above may appear to be a demultiplexing and decoding
   related problem, it is really mostly a rendering policy specific to
   an application.  As such, it should be handled by application-
   specific signalling that could involve custom-formatted, per-SSRC
   information that accompanies SDP offers and answers.
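   Returning to payload-type-based demultiplexing, the following sketch
   (illustrative only; "audioChain", "videoChain" and the packet
   accessors are invented names) shows how the bundle from the
   Section 3 example offer could be demultiplexed, and why a dynamic
   SSRC table is still needed for RTCP:

   // Built from the negotiated SDP: each payload type appears in at
   // most one m= line of the bundle group.
   var ptToChain = {
     96: audioChain, 0: audioChain, 8: audioChain,  // opus, PCMU, PCMA
     97: videoChain, 98: videoChain                 // VP8, H264
   };
   var ssrcToChain = {};

   function demux(packet) {
     var chain = ptToChain[packet.payloadType];
     if (chain) {
       // Record the SSRC so that RTCP packets, which carry no payload
       // type, can later be matched to the same chain.
       ssrcToChain[packet.ssrc] = chain;
       chain.process(packet);
     }
   }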
6.  Simulcasting, FEC, Layering and RTX (Open Issue)

   From a WebRTC perspective, repair flows such as layering, FEC, RTX
   and, to some extent, simulcasting present an interesting challenge,
   which is why they are considered an open issue by this
   specification.

   On the one hand, they are transport utilities that need to be
   understood, supported and used by browsers in a way that is mostly
   transparent to applications.  On the other, some applications may
   need to be made aware of them and given the option to control their
   use.  This could be necessary in cases where their use needs to be
   signalled to non-WebRTC endpoints in an application-specific way.
   Another example is the possibility for an application to choose to
   disable some or all repair flows because it has been made aware
   through application-specific signalling that they are temporarily
   not being used/rendered by the remote end (e.g. because it is only
   displaying a thumbnail or because a corresponding video tag is not
   currently visible).

   One way of handling such flows would be to advertise them in the way
   suggested by [RFC5956] and to then control them through application-
   specific signalling.  This option has the merit of already existing,
   but it also implies the pre-announcement and propagation of SSRCs
   and the bloated signalling that this incurs.  Also, relying solely
   on Offer/Answer here would expose an offerer to the typical race
   condition of repair SSRCs arriving before the answer, and the
   processing ambiguity that this would imply.

   Another approach could be a combination of RTCP and RTP header
   extensions [RFC5285] in a way similar to the one employed by the
   Rapid Synchronisation of RTP Flows [RFC6051].  While such a
   mechanism is not currently defined by the IETF, specifying it could
   be relatively straightforward:

   Every packet belonging to a repair flow could carry an RTP header
   extension [RFC5285] that points to the source stream (or source
   layer in the case of layered mechanisms).

   Again, these are just some possibilities.  Different mechanisms may,
   and probably will, require different extensions or signalling
   ([SRCNAME] will likely be an option for some).  In some cases, where
   layering information is provided by the codec, an extension is not
   going to be necessary at all.

   In cases where FEC or simulcast relations are not immediately needed
   by the recipient, this information could also be delayed until the
   reception of the first RTCP packet.
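   As a sketch of the undefined header extension mechanism outlined
   above (all names are hypothetical, and the extension is assumed to
   simply carry the 32-bit SSRC of the source flow, using the
   "fec-source-ssrc" mapping from the Section 3 example offer), a
   receiver could do something like:

   var repairSourceFor = {};   // repair SSRC -> source SSRC

   function onRepairFlowPacket(packet) {
     // "getHeaderExtension" is an invented accessor; id 3 is the
     // extension id negotiated via a=extmap in the example offer.
     var ext = packet.getHeaderExtension(3);
     if (ext)
       repairSourceFor[packet.ssrc] = ext.readUInt32(0);
   }

   This would allow a receiver to associate a repair flow with its
   source as soon as the first packet arrives, without the repair SSRC
   ever having been announced in signalling.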
7.  WebRTC API Requirements

   One of the main characteristics of this specification is the use of
   SDP for transport channel setup and media stack initialisation only.
   In order for applications to be able to cover everything else, it is
   important that the WebRTC APIs actually allow for it.  Given the
   initial directions taken by early implementations and specification
   work, this is currently almost, but not entirely, possible.

   The following is a list of requirements that the WebRTC APIs would
   need to satisfy in order for this specification to be usable.
   (Note: some of the items are already possible and are only included
   for the sake of completeness.)

   1.  Expose the SSRCs of all local MediaStreamTrack-s that the
       application attaches to a PeerConnection.

   2.  Expose the SSRCs of all remote MediaStreamTrack-s that are
       received on a PeerConnection.

   3.  Expose to applications all locally generated repair flows that
       exist for a source (e.g. FEC and RTX flows that will be
       generated for a webcam), together with their types, relations
       and SSRCs.

   4.  Expose information about the maximum number of incoming streams
       that can be decoded and rendered.

   5.  Applications should be able to pause and resume (disable and
       enable) any MediaStreamTrack.  This should also include the
       possibility to do so for specific repair flows.

   6.  Information about how certain MediaStreamTrack-s relate to each
       other (e.g. a given audio flow is related to a specific video
       flow) may be exchanged by applications after media has started
       arriving.  At that point the corresponding MediaStreamTrack-s
       may have been announced to the application within independent
       MediaStream-s.  It should therefore be possible for applications
       to join such tracks within a single MediaStream.

   Section 7.1 provides suggestions for addressing the above
   requirements.

7.1.  Suggested WebRTC API Using TrackSendParams

   This document proposes that the following methods and dictionaries
   be added to the WebRTC API.  The changes follow the model of
   createDataChannel, which is a JS method on PeerConnection that makes
   it possible to add data channels without going through SDP.
   Furthermore, just like createDataChannel allows two ways to handle
   negotiation (the "I know what I'm doing; here's what I want to send;
   let me signal everything" mode and the "please take care of it for
   me; send an OPEN message" mode), this proposal also allows two ways
   to handle negotiation (the "I know what I'm doing; here's what I
   want to send; let me signal everything" mode and the "please take
   care of it for me; send SDP back and forth" mode).

   Following the success of createDataChannel, this allows simple
   applications to Just Work and more advanced applications to easily
   control what they need to.  In particular, it is possible to use
   this API to implement either Plan A or Plan B.

   // The following two methods are added to RTCPeerConnection.
   partial interface RTCPeerConnection {
     // Create a stream that is used to send a source stream.
     // The LocalMediaStream.description can be used for signalling.
     // No media is sent until addStream(LocalMediaStream) is called.
     LocalMediaStream createLocalStream(MediaStream sourceStream);

     // Create a stream that is used to receive media from the remote
     // side, given the parameters signalled from
     // LocalMediaStream.description.
     MediaStream createRemoteStream(MediaStreamDescription description);
   }

   interface LocalMediaStream implements MediaStream {
     // This can be changed at any time, but especially before calling
     // PeerConnection.addStream.
     attribute MediaStreamDescription description;
   }

   // Represents the parameters used to either send or receive a stream
   // over a PeerConnection.
   dictionary MediaStreamDescription {
     MediaStreamTrackDescription[] tracks;
   }

   // Represents the parameters used to either send or receive a track
   // over a PeerConnection.  A track has many "flows", which can be
   // grouped together.
   dictionary MediaStreamTrackDescription {
     // Same as the MediaStreamTrack.id.
     DOMString id;

     // Same as the MediaStreamTrack.kind.
     DOMString kind;

     // A track can have many "flows", such as for Simulcast, FEC, etc.
     // And they can be grouped in arbitrary ways.
     MediaFlowDescription[] flows;
     MediaFlowGroup[] flowGroups;
   }

   // Represents the parameters used to either send or receive a "flow"
   // over a PeerConnection.  A "flow" is media that arrives with a
   // single, unique SSRC.  One or more flows together make up the
   // media for a track.
   // For example, there may be Simulcast, FEC and RTX flows.
   dictionary MediaFlowDescription {
     // The "flow id" must be unique to the track, but need not be
     // unique outside of the track (two tracks could both have a flow
     // with the same flow ID).
     DOMString id;

     // Each flow can go over its own transport.  If the JS sets this
     // to a transportId that doesn't have a transport set up already,
     // the browser will use SDP negotiation to set up a transport to
     // back that transportId.  If this is set to an MID in the SDP,
     // then that MID's transport is used.
     DOMString transportId;

     // The SSRC used to send the flow.
     unsigned int ssrc;

     // When used as receive parameters, this indicates the possible
     // list of codecs that might come in for this flow.  For example,
     // a given receive flow could be set up to receive any of OPUS,
     // ISAC, or PCMU.  When used as send parameters, this indicates
     // that the first codec should be used, but the browser may send
     // other codecs if it needs to because of either bandwidth or CPU
     // constraints.
     MediaCodecDescription[] codecs;
   }

   dictionary MediaFlowGroup {
     DOMString type;      // "SIM" for Simulcast, "FEC" for FEC, etc.
     DOMString[] flowids;
   }

   dictionary MediaCodecDescription {
     unsigned byte payloadType;
     DOMString name;
     unsigned int? clockRate;
     unsigned int? bitRate;
     // A grab bag of other fmtp parameters that will need to be
     // further defined.
     MediaCodecParam[] params;
   }

   dictionary MediaCodecParam {
     DOMString key;
     DOMString value;
   }

   Some additional notes:

   o  When LocalMediaStreams are added using addStream,
      onnegotiationneeded is not called, and those streams are never
      reflected in future SDP exchanges.  Indeed, it would be
      impossible to put them in the SDP without first resolving whether
      that would be Plan A SDP or Plan B SDP.

   o  Just like piles of attributes would need to be defined for Plan A
      and for Plan B, similar attributes would need to be defined here.
      (Luckily, much work has already been done figuring out what those
      parameters are :).

   API Pros:

   o  Either Plan A or Plan B could be implemented in JavaScript using
      this API.

   o  It exposes all the same functionality to the JavaScript as SDP,
      but in a much nicer format that is much easier to work with.

   o  Any other signalling mechanism, such as Jingle or CLUE, could be
      implemented using this API.

   o  There is almost no risk of signalling glare.

   o  Debugging errors with misconfigured descriptions should be much
      easier with this than with large SDP blobs.

   API Cons:

   o  Now there are two slightly different ways to add streams: by
      creating a LocalMediaStream first, and by not doing so.  This is,
      however, analogous to setting "negotiated: true" in
      createDataChannel.  One way "just works", and the other provides
      more advanced control.

   o  All the options in MediaCodecDescription are a bit complicated.
      Really, this is only necessary because Plan A requires being able
      to specify codec parameters per SSRC and to set each flow on a
      different transport.  If we did not have this requirement, we
      could simplify.

7.1.1.  Example 2

   The following is an example of how these API additions would be
   used:

   // Imagine MyApp handles creating a PeerConnection, signalling, and
   // rendering streams.  This is how the new API could be used.
   var peerConnection = MyApp.createPeerConnection();

   // On the sender side:
   var stream = MyApp.getMediaStream();
   var localStream = peerConnection.createLocalStream(stream);
   localStream.description =
       MyApp.modifyStream(localStream.description);
   MyApp.signalAddStream(localStream.description, function(response) {
     if (!response.rejected) {
       // Media will now be sent.
       peerConnection.addStream(localStream);
     }
   });

   // On the receiver side:
   MyApp.onAddStreamSignalled = function(streamDescription) {
     var stream = peerConnection.createRemoteStream(streamDescription);
     MyApp.renderStream(stream);
   };

   // In this exchange, the MediaStreamDescription signalled from the
   // sender to the receiver may have looked something like this:

   {
     tracks: [
       {
         id: "audio1",
         kind: "audio",
         flows: [
           {
             id: "main",
             transportId: "transport1",
             ssrc: 1111,
             codecs: [
               {
                 payloadType: 111,
                 name: "opus",
                 // ... more codec details
               },
               {
                 payloadType: 112,
                 name: "pcmu",
                 // ... more codec details
               }]
           }]
       },
       {
         id: "video1",
         kind: "video",
         flows: [
           {
             id: "sim0",
             transportId: "transport2",
             ssrc: 2222,
             codecs: [
               {
                 payloadType: 122,
                 name: "vp8",
                 // ... more codec details
               }]
           },
           {
             id: "sim1",
             transportId: "transport2",
             ssrc: 2223,
             codecs: [
               {
                 payloadType: 122,
                 name: "vp8",
                 // ... more codec details
               }]
           },
           {
             id: "sim2",
             transportId: "transport2",
             ssrc: 2224,
             codecs: [
               {
                 payloadType: 122,
                 name: "vp8",
                 // ... more codec details
               }]
           },
           {
             id: "sim0fec",
             transportId: "transport2",
             ssrc: 2225,
             codecs: [
               {
                 payloadType: 122,
                 name: "vp8",
                 // ...
               }]
           }],
         flowGroups: [
           {
             type: "SIM",
             flowids: ["sim0", "sim1", "sim2"]
           },
           {
             type: "FEC",
             flowids: ["sim0", "sim0fec"]
           }]
       }]
   }

8.  IANA Considerations

   None.

9.  Informative References

   [CLUE]     Duckworth, M., Pepperell, A., and S. Wenger, "Framework
              for Telepresence Multi-Streams", draft-ietf-clue-
              framework (work in progress), May 2013.

   [COIN]     Ivov, E. and E. Marocco, "XEP-0298: Delivering Conference
              Information to Jingle Participants (Coin)", XSF XEP-0298,
              June 2011.

   [MAX-SSRC] Westerlund, M., Burman, B., and F. Jansson, "Multiple
              Synchronization sources (SSRC) in RTP Session Signaling",
              draft-westerlund-avtcore-max-ssrc (work in progress),
              July 2012.

   [MSID]     Alvestrand, H., "Cross Session Stream Identification in
              the Session Description Protocol", draft-ietf-mmusic-msid
              (work in progress), February 2013.

   [PlanA]    Roach, A. and M. Thomson, "Using SDP with Large Numbers
              of Media Flows", draft-roach-rtcweb-plan-a (work in
              progress), May 2013.

   [PlanB]    Uberti, J., "Plan B: a proposal for signaling multiple
              media sources in WebRTC", draft-uberti-rtcweb-plan (work
              in progress), May 2013.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264, June
              2002.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC
              3551, July 2003.
Levin, "A Session 854 Initiation Protocol (SIP) Event Package for Conference 855 State", RFC 4575, August 2006. 857 [RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP 858 Header Extensions", RFC 5285, July 2008. 860 [RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and 861 Control Packets on a Single Port", RFC 5761, April 2010. 863 [RFC5956] Begen, A., "Forward Error Correction Grouping Semantics in 864 the Session Description Protocol", RFC 5956, September 865 2010. 867 [RFC6015] Begen, A., "RTP Payload Format for 1-D Interleaved Parity 868 Forward Error Correction (FEC)", RFC 6015, October 2010. 870 [RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP 871 Flows", RFC 6051, November 2010. 873 [ROACH-GLARELESS-ADD] 874 Roach, A., "An Approach for Adding RTCWEB Media Streams 875 without Glare", reference.I-D.roach-rtcweb-glareless-add 876 (work in progress), May 2013, . 879 [SRCNAME] Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES 880 Item SRCNAME to Label Individual Sources ", reference.I-D 881 .westerlund-avtext-rtcp-sdes-srcname (work in progress), 882 October 2012, . 885 [TRICKLE-ICE] 886 Ivov, E., Rescorla, E., and J. Uberti, "Trickle ICE: 887 Incremental Provisioning of Candidates for the Interactive 888 Connectivity Establishment (ICE) Protocol ", reference.I-D 889 .ivov-mmusic-trickle-ice (work in progress), March 2013, 890 . 892 [XCON] , "Centralized Conferencing (XCON) Status Pages", , 893 . 895 Appendix A. Acknowledgements 896 Many thanks to Bernard Aboba and Mary Barnes, for reviewing this 897 document and providing numerous comments and substantial input. 899 Authors' Addresses 901 Emil Ivov 902 Jitsi 903 Strasbourg 67000 904 France 906 Phone: +33-177-624-330 907 Email: emcho@jitsi.org 909 Enrico Marocco 910 Telecom Italia 911 Via G. Reiss Romoli, 274 912 Turin 10148 913 Italy 915 Email: enrico.marocco@telecomitalia.it 917 Peter Thatcher 918 Google 919 747 6th St S 920 Kirkland, WA 98033 921 USA 923 Phone: +1 857 288 8888 924 Email: pthatcher@google.com