Network Working Group                                            E. Ivov
Internet-Draft                                                     Jitsi
Intended status: Standards Track                              E. Marocco
Expires: December 19, 2013                                Telecom Italia
                                                             P. Thatcher
                                                                  Google
                                                           June 17, 2013

  No Plan: Economical Use of the Offer/Answer Model in WebRTC Sessions
                      with Multiple Media Sources
                       draft-ivov-rtcweb-noplan-01

Abstract

   This document describes a model for the lightweight use of SDP
   Offer/Answer in WebRTC.  The goal is to minimize reliance on Offer/
   Answer exchanges in a WebRTC session and to provide applications
   with the tools necessary to implement the signalling that they may
   need in a way that best fits their custom requirements and
   topologies.  This simplifies the signalling of multiple media
   sources and the identification of RTP Synchronisation Sources
   (SSRCs) in multi-party sessions.  Another important goal of this
   model is to remove from clients topological constraints such as the
   requirement to know in advance all SSRC identifiers that they could
   potentially introduce in a particular session.

   The model described here is similar to the one employed by the data
   channel JavaScript APIs in WebRTC, where methods are supported on
   PeerConnection without being reflected in SDP.
   This document does not question the use of SDP and the Offer/Answer
   model or the value they have in terms of interoperability with
   legacy or other non-WebRTC devices.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 19, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Background
   2.  Introduction
   3.  Reliance on Offer/Answer
     3.1.  Interoperability with Legacy
   4.  Additional Session Control and Signalling
   5.  Demultiplexing and Identifying Streams (Use of Bundle)
   6.  Simulcasting, FEC, Layering and RTX (Open Issue)
   7.  WebRTC API Requirements
     7.1.  Suggested WebRTC API Using TrackSendParams
       7.1.1.  Example 2
   8.  IANA Considerations
   9.  Informative References
   Appendix A.  Acknowledgements
   Authors' Addresses

1.  Background

   In its early stages the RTCWEB working group chose to use the
   Session Description Protocol (SDP) and the Offer/Answer model
   [RFC3264] when establishing and negotiating sessions.  This choice
   was also accompanied by the decision not to mandate a specific
   signalling protocol so that, once interoperability has been
   achieved, web applications can choose the semantics that best fit
   their requirements.  In some scenarios, however, such as those
   involving the use of multiple media sources, these choices have left
   open the issue of exactly which operations should be handled by SDP
   Offer/Answer and which of them should be left to application-
   specific signalling.

   At the time of writing of this document, the RTCWEB working group is
   considering two approaches to addressing the issue, often referred
   to as Plan A [PlanA] and Plan B [PlanB].
   Both of them describe semantics that require Offer/Answer exchanges
   in a number of situations where this could be avoided, particularly
   when adding media sources to or removing them from a session.  This
   requirement applies equally to cases where a client adds the stream
   of a newly activated web cam or a simulcast flow, and to the arrival
   or departure of a conference participant.

   Plan A handles such notifications with the addition or removal of
   independent m= lines [PlanA], while Plan B relies on the use of
   multiplexed m= lines but still depends on Offer/Answer exchanges for
   the addition or removal of media stream identifiers [MSID].

   By taking the Offer/Answer approach, both Plan A and Plan B take
   away from the application the opportunity to handle such events in a
   way that is most fitting for the use case, which, among other
   things, also goes against the working group's decision not to define
   a specific signalling protocol.  (It could be argued that it is
   therefore only natural that proponents of each plan, having
   different use cases in mind, are remarkably far from reaching
   consensus.)

   Reliance on preliminary announcement of SSRC identifiers is another
   issue.  While this could be perceived as relatively straightforward
   in one-to-one sessions or even conference calls within controlled
   environments, it can be a problem in the following cases:

   o  interoperability with legacy/non-WebRTC endpoints;

   o  use within non-controlled and potentially federated conference
      environments where new RTP streams may appear relatively often.
      In such cases the signalling required to describe all of them
      through Offer/Answer may represent substantial overhead, while
      none or only a part of it (e.g. the description of a main,
      active-speaker stream) may be required by the application.

   By increasing the number of Offer/Answer exchanges, both Plan A and
   Plan B also increase the risk of encountering glare situations (i.e.
   cases where both parties attempt to modify a session at the same
   time).  While glare is also possible with basic Offer/Answer, and
   resolution of such situations must be implemented anyway, the need
   to frequently resort to such code may either negatively impact user
   experience (e.g. when "back off" resolution is used) or require
   substantial modifications in the Offer/Answer model and/or further
   venturing into the land of signalling protocols
   [ROACH-GLARELESS-ADD].

2.  Introduction

   The goal of this document is to provide directions for use of the
   SDP Offer/Answer model in a way that satisfies the following
   requirements:

   o  the addition and removal of media sources (e.g. conference
      participants, multiple web cams or "slides") must be possible
      without the need for Offer/Answer exchanges;

   o  the addition or removal of simulcast or layered streams must be
      possible without the need for Offer/Answer exchanges beyond the
      initial declaration of such capabilities for either direction;

   o  call establishment must not require preliminary announcement or
      even knowledge of all potentially participating media sources;

   o  application-specific signalling should be used to cover most
      semantics following call establishment, such as adding, removing
      or identifying SSRCs;

   o  straightforward interoperability with widely deployed legacy
      endpoints with rudimentary support for Offer/Answer.
      This includes devices that allow for one audio and potentially
      one video m= line and that expect to only ever be required to
      render a single RTP stream at a time for either of them.  (Note
      that this does NOT include devices that expect to see multiple
      "m=video" lines for different SSRCs, as they can hardly be viewed
      as "widely deployed legacy".)

   To satisfy the above requirements, this specification expects that
   browsers and WebRTC endpoints in general will use SDP Offer/Answer
   only to establish transport channels and to initialize an RTP stack
   and codec/processing chains.  This also includes any renegotiation
   that requires the re-initialisation of these chains.  For example,
   adding VP8 to a session that was set up with only H.264 would
   obviously still require an Offer/Answer exchange.

   All other session control and signalling are to be left to
   applications.

   The actual Offer/Answer semantics presented here do not differ
   fundamentally from those proposed by Plan A and Plan B.  The main
   differentiating point of this approach is that the exact protocol
   mechanism is left to WebRTC applications.  Such applications or
   lightweight signalling gateways can then implement either Plan A, or
   Plan B, or an entirely different signalling protocol, depending on
   what best matches their use cases and topology.

3.  Reliance on Offer/Answer

   The model presented in this specification relies on the use of SDP
   and Offer/Answer in much the same way as many pre-WebRTC (and most
   legacy) endpoints do: negotiating formats, establishing transport
   channels and exchanging, in a declarative way, media and transport
   parameters that are then used for the initialization of the
   corresponding stacks.

   The following example presents what this specification views as a
   typical offer sent by a WebRTC endpoint:

   v=0
   o=- 0 0 IN IP4 198.51.100.33
   s=
   t=0 0

   a=group:BUNDLE audio video           // declaring BUNDLE support
   c=IN IP4 198.51.100.33
   a=ice-ufrag:Qq8o/jZwknkmXpIh         // initializing ICE
   a=ice-pwd:gTMACiJcZv1xdPrjfbTHL5qo
   a=ice-options:trickle
   a=fingerprint:sha-1                  // DTLS-SRTP keying
       a4:b1:97:ab:c7:12:9b:02:12:b8:47:45:df:d8:3a:97:54:08:3f:16

   m=audio 5000 RTP/SAVPF 96 0 8
   a=mid:audio
   a=rtcp-mux

   a=rtpmap:96 opus/48000/2             // PT mappings
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000

   a=extmap:1 urn:ietf:params:rtp-hdrext:csrc-audio-level // 5285 header
   a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level // extensions

   [ICE Candidates]

   m=video 5002 RTP/SAVPF 97 98
   a=mid:video
   a=rtcp-mux
   a=rtpmap:97 VP8/90000      // PT mappings and resolution capabilities
   a=imageattr:97 \
       send [x=[480:16:800],y=[320:16:640],par=[1.2-1.3],q=0.6] \
            [x=[176:8:208],y=[144:8:176],par=[1.2-1.3]] \
       recv *
   a=rtpmap:98 H264/90000
   a=imageattr:98 send [x=800,y=640,sar=1.1,q=0.6] [x=480,y=320] \
                  recv [x=330,y=250]

   a=extmap:3 urn:ietf:params:rtp-hdrext:fec-source-ssrc  // 5285 header
   a=extmap:4 urn:ietf:params:rtp-hdrext:rtx-source-ssrc  // extensions

   a=max-send-ssrc:{*:1}                // declaring maximum
   a=max-recv-ssrc:{*:4}                // number of SSRCs

   [ICE Candidates]

   The answer to the offer above would have roughly the same structure
   and content.  The most important aspects here are:

   o  Preserves interoperability with most kinds of legacy or non-
      WebRTC endpoints.

   o  Allows the negotiation of most parameters that concern the media/
      RTP stack (typically the browser).

   o  Only a single Offer/Answer exchange is required for session
      establishment and, in most cases, for the entire duration of a
      session.

   o  Leaves complete freedom to applications as to how they signal any
      other information, such as SSRC identification or the addition
      and removal of RTP streams.
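   The following JavaScript sketch (purely illustrative and not part of
   this specification) shows what such a one-time exchange could look
   like with the existing PeerConnection API; "signalling", "config",
   "localMediaStream" and "logError" are application-defined
   placeholders:

   var pc = new RTCPeerConnection(config);

   // Trickle candidates as they are gathered instead of blocking the
   // exchange until gathering completes.
   pc.onicecandidate = function (evt) {
     if (evt.candidate)
       signalling.send({candidate: evt.candidate});
   };

   pc.addStream(localMediaStream);

   // The only Offer/Answer round trip: it sets up transport channels
   // and codec/processing chains.
   pc.createOffer(function (offer) {
     pc.setLocalDescription(offer);
     signalling.send({sdp: offer});
   }, logError);

   signalling.onmessage = function (msg) {
     if (msg.sdp)                // the peer's answer
       pc.setRemoteDescription(new RTCSessionDescription(msg.sdp));
     else if (msg.candidate)     // a trickled ICE candidate
       pc.addIceCandidate(new RTCIceCandidate(msg.candidate));
   };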
3.1.  Interoperability with Legacy

   Interoperating with "widely deployed legacy endpoints" is one of the
   main reasons for the RTCWEB working group to choose the SDP Offer/
   Answer model as a basis for media negotiation.  It is hence
   important to clarify the compatibility claims that this
   specification makes.

   A "widely deployed legacy endpoint" is considered to have the
   following characteristics:

   o  Likely to use the SIP protocol.

   o  Capable of gracefully handling one audio and potentially one
      video m= line in an SDP Offer.

   o  Capable of rendering one SSRC per m= line at any given moment,
      but multiple, consecutive SSRCs over a period of time.  This
      would be the case with session replacements following a transfer,
      for example.  While the capability to handle multiple SSRCs
      simultaneously is not uncommon, it cannot be relied upon and
      should first be confirmed through signalling.

   o  Possibly has features such as ICE, BUNDLE, RTCP-MUX, etc.  Just
      as likely not to.

   o  Very unlikely to announce in SDP the SSRCs that it intends to use
      for a given session.

   o  Exact set of features and capabilities: guaranteed to be wildly
      and widely diverse.

   While it is relatively simple for RTCWEB to accommodate some of the
   above, it is obviously impossible to design a model that could
   simply be labeled as "compatible with legacy".  It is reasonable to
   assume that use cases involving such endpoints will be designed for
   a relatively specific set of devices and applications.  The role of
   the WebRTC framework is hence to provide a least-common-denominator
   model that can then be extended by applications.

   It is just as important not to make choices or assumptions that
   would render interoperability for some applications or topologies
   difficult or even impossible.

   This is exactly what the use of Offer/Answer discussed here strives
   to achieve.  Audio/video offers originating from WebRTC endpoints
   will always have a maximum of one audio and one video m= line.  It
   will be up to applications to determine exactly how many streams
   they can afford to send once such a session has been established.
   The exact mechanism for doing so is outside the scope of this
   document (and of WebRTC in general).

   Note that it is still possible for WebRTC endpoints to indicate
   support for a maximum number of incoming or outgoing streams, for
   reasons such as processing constraints.  Use of the "max-send-ssrc"
   and "max-recv-ssrc" attributes [MAX-SSRC] could be one way of doing
   this, although that mechanism would need to be extended to provide
   ways of distinguishing between independent flows and complementary
   ones such as layered FEC and RTX.  Even with this in mind, it is
   important not to rely on the presence of such an indication in
   incoming descriptions, and to provide applications with a way of
   retrieving such capabilities from the WebRTC stack (e.g. the
   browser).
   Determining whether a peer has the ability to seamlessly switch from
   one SSRC to another is also left to application-specific signalling.
   It is worth noting that protocols such as SIP, for example, often
   accompany SSRC replacements with extra signalling (re-INVITEs with a
   "Replaces" header) that can easily be reused by applications or
   mapped to something that they deem more convenient.

   For the sake of interoperability, this specification strongly
   advises against the use of multiple m= lines for a single media
   type.  Not only would such use be meaningless to a large number of
   legacy endpoints, but it is also likely to be mishandled by many of
   them and to cause unexpected behaviour.

   Finally, it is also worth pointing out that there is a significant
   number of feature-rich non-WebRTC applications and devices that have
   relatively advanced, modern sets of capabilities.  Such endpoints
   hardly fit the "legacy" qualification.  Yet, as is often the case
   with novel and/or proprietary applications, they too have adopted
   diverse signalling mechanisms, and the requirements described in
   this section fully apply when it comes to interoperating with them.

4.  Additional Session Control and Signalling

   Beyond basic session establishment, applications need means of
   performing at least the following operations:

   o  Adding RTP streams to and removing them from an existing session.

   o  Accepting or refusing some of them.

   o  Identifying SSRCs and obtaining additional metadata for them
      (e.g. the user corresponding to a specific SSRC).

   All of the above semantics are best handled by, and hence should be
   left to, applications.  There are numerous existing or emerging
   solutions, some of them developed by the IETF, that already cover
   this.  These include CLUE channels [CLUE], the SIP Event Package for
   Conference State [RFC4575] and its XMPP variant [COIN], as well as
   the protocols defined within the Centralised Conferencing IETF
   working group [XCON].  Additional mechanisms, undoubtedly many of
   them based on JSON, are very likely to emerge in the future as
   WebRTC applications address varying use cases, scenarios and
   topologies.

   The most important part of this specification is hence to prevent
   certain assumptions or topologies from being imposed on
   applications.  One example of this is the need to know, and include
   in the Offer/Answer exchange, all the SSRCs that can show up in a
   session.  This can be particularly problematic for scenarios that
   involve non-WebRTC endpoints.

   Large-scale conference calls, potentially federated through RTP
   translator-like bridges, would be another problematic scenario.
   Always pre-announcing SSRCs in such situations could of course be
   made to work, but it would come at a price.  It would either require
   a very high number of Offer/Answer updates that propagate the
   information through the entire topology, or the use of tricks such
   as pre-allocating a range of "fake" SSRCs, announcing them to
   participants and then overwriting the actual SSRCs with them.
   Depending on the scenario, both options could prove inappropriate or
   inefficient, while some applications may not even need such
   information.  Others could retrieve it through simplistic means such
   as access to a centralized resource (e.g. a URL pointing to a JSON
   description of the conference).
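   As a purely hypothetical illustration (none of the field names below
   are defined by this or any other specification), such a centralized
   JSON description of a conference could be as simple as:

   {
     "conference": "weekly-sync",
     "endpoints": [
       { "user": "alice@example.com",
         "ssrcs": { "audio": 13579, "video": 24680 } },
       { "user": "bob@example.com",
         "ssrcs": { "audio": 97531, "video": 86420 } }
     ]
   }

   An application could fetch such a description when joining, or only
   when it actually needs to label a stream, rather than having every
   SSRC propagated to every participant through Offer/Answer.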
5.  Demultiplexing and Identifying Streams (Use of Bundle)

   This document assumes the use of BUNDLE in WebRTC endpoints.  This
   implies that all RTP streams are likely to end up being received on
   the same port.  A demultiplexing mechanism is therefore necessary in
   order for these packets to be fed into the appropriate processing
   chain (i.e. matched to an m= line).

      Note: it is important to distinguish between the demultiplexing
      and the identification of incoming flows.  Throughout this
      specification the former is used to refer to the process of
      selecting a depacketizing/decoding/processing chain to feed
      incoming packets to.  Such decisions depend solely on the format
      that is used to encode the content of incoming packets.

      The above is not to be confused with the process of making
      rendering decisions about a processed flow.  Such decisions
      include showing a "current speaker" flow at a specific location,
      window or video tag, while choosing a different one for a second,
      "slides" flow.  Another example would be the possibility of
      attaching "Alice", "Bob" and "Carol" labels on top of the
      appropriate UI components.  This specification leaves such
      rendering choices entirely to application-specific signalling, as
      described in Section 4.

   This specification uses demultiplexing based on RTP payload types.
   When creating offers and answers, WebRTC applications MUST therefore
   allocate RTP payload types only once per bundle group.  In cases
   where rtcp-mux is in use, this would mean a maximum of 96 payload
   types per bundle [RFC5761].  It has been pointed out that some
   legacy devices may have unpredictable behaviour with payload types
   that are outside the 96-127 range reserved by [RFC3551] for dynamic
   use.  Some applications or implementations may therefore choose not
   to use values outside this range.  Whatever the reason, offerers
   that find they need more than the available payload type numbers
   will simply need to either use a second bundle group or not use
   BUNDLE at all (which, in the case of a single audio and a single
   video m= line, amounts to roughly the same thing).  This would also
   imply building a dynamic table, mapping SSRCs to PTs and m= lines,
   in order to then also allow for RTCP demultiplexing.

   While not desirable, the implications of such a decision would be
   relatively limited.  Use of trickle ICE [TRICKLE-ICE] is going to
   lessen the impact on call establishment latency.  Also, the fact
   that this would only occur in a limited number of cases makes it
   unlikely to have a significant effect on port consumption.

   An additional requirement that has been expressed toward
   demultiplexing is the ability to assign incoming packets with the
   same payload type to different processing chains depending on their
   SSRCs.  A possible example for this is a scenario where two video
   streams are being rendered on different video screens that each have
   their own decoding hardware.

   While the above may appear to be a demultiplexing and decoding
   related problem, it is really mostly a rendering policy specific to
   an application.  As such, it should be handled by application-
   specific signalling that could involve custom-formatted, per-SSRC
   information that accompanies SDP offers and answers.
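   Returning to payload-type-based demultiplexing, the following sketch
   (illustrative only; "audioChain", "videoChain" and the packet
   accessors are invented names) shows how the bundle from the
   Section 3 example offer could be demultiplexed, and why a dynamic
   SSRC table is still needed for RTCP:

   // Built from the negotiated SDP: each payload type appears in at
   // most one m= line of the bundle group.
   var ptToChain = {
     96: audioChain, 0: audioChain, 8: audioChain,  // opus, PCMU, PCMA
     97: videoChain, 98: videoChain                 // VP8, H264
   };
   var ssrcToChain = {};

   function demux(packet) {
     var chain = ptToChain[packet.payloadType];
     if (chain) {
       // Record the SSRC so that RTCP packets, which carry no payload
       // type, can later be matched to the same chain.
       ssrcToChain[packet.ssrc] = chain;
       chain.process(packet);
     }
   }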
6.  Simulcasting, FEC, Layering and RTX (Open Issue)

   From a WebRTC perspective, repair flows such as layering, FEC, RTX
   and, to some extent, simulcasting present an interesting challenge,
   which is why they are considered an open issue by this
   specification.

   On the one hand, they are transport utilities that need to be
   understood, supported and used by browsers in a way that is mostly
   transparent to applications.  On the other, some applications may
   need to be made aware of them and given the option to control their
   use.  This could be necessary in cases where their use needs to be
   signalled to non-WebRTC endpoints in an application-specific way.
   Another example is the possibility for an application to choose to
   disable some or all repair flows because it has been made aware
   through application-specific signalling that they are temporarily
   not being used/rendered by the remote end (e.g. because it is only
   displaying a thumbnail or because a corresponding video tag is not
   currently visible).

   One way of handling such flows would be to advertise them in the way
   suggested by [RFC5956] and to then control them through application-
   specific signalling.  This option has the merit of already existing,
   but it also implies the pre-announcement and propagation of SSRCs
   and the bloated signalling that this incurs.  Also, relying solely
   on Offer/Answer here would expose an offerer to the typical race
   condition of repair SSRCs arriving before the answer, and the
   processing ambiguity that this would imply.

   Another approach could be a combination of RTCP and RTP header
   extensions [RFC5285] in a way similar to the one employed by the
   Rapid Synchronisation of RTP Flows [RFC6051].  While such a
   mechanism is not currently defined by the IETF, specifying it could
   be relatively straightforward:

   Every packet belonging to a repair flow could carry an RTP header
   extension [RFC5285] that points to the source stream (or source
   layer in the case of layered mechanisms).

   Again, these are just some possibilities.  Different mechanisms may,
   and probably will, require different extensions or signalling
   ([SRCNAME] will likely be an option for some).  In some cases, where
   layering information is provided by the codec, an extension is not
   going to be necessary at all.

   In cases where FEC or simulcast relations are not immediately needed
   by the recipient, this information could also be delayed until the
   reception of the first RTCP packet.
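   As a sketch of the undefined header extension mechanism outlined
   above (all names are hypothetical, and the extension is assumed to
   simply carry the 32-bit SSRC of the source flow, using the
   "fec-source-ssrc" mapping from the Section 3 example offer), a
   receiver could do something like:

   var repairSourceFor = {};   // repair SSRC -> source SSRC

   function onRepairFlowPacket(packet) {
     // "getHeaderExtension" is an invented accessor; id 3 is the
     // extension id negotiated via a=extmap in the example offer.
     var ext = packet.getHeaderExtension(3);
     if (ext)
       repairSourceFor[packet.ssrc] = ext.readUInt32(0);
   }

   This would allow a receiver to associate a repair flow with its
   source as soon as the first packet arrives, without the repair SSRC
   ever having been announced in signalling.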
7.  WebRTC API Requirements

   One of the main characteristics of this specification is the use of
   SDP for transport channel setup and media stack initialisation only.
   In order for applications to be able to cover everything else, it is
   important that the WebRTC APIs actually allow for it.  Given the
   initial directions taken by early implementations and specification
   work, this is currently almost, but not entirely, possible.

   The following is a list of requirements that the WebRTC APIs would
   need to satisfy in order for this specification to be usable.
   (Note: some of the items are already possible and are only included
   for the sake of completeness.)

   1.  Expose the SSRCs of all local MediaStreamTrack-s that the
       application attaches to a PeerConnection.

   2.  Expose the SSRCs of all remote MediaStreamTrack-s that are
       received on a PeerConnection.

   3.  Expose to applications all locally generated repair flows that
       exist for a source (e.g. FEC and RTX flows that will be
       generated for a webcam), together with their types, relations
       and SSRCs.

   4.  Expose information about the maximum number of incoming streams
       that can be decoded and rendered.

   5.  Applications should be able to pause and resume (disable and
       enable) any MediaStreamTrack.  This should also include the
       possibility to do so for specific repair flows.

   6.  Information about how certain MediaStreamTrack-s relate to each
       other (e.g. a given audio flow is related to a specific video
       flow) may be exchanged by applications after media has started
       arriving.  At that point the corresponding MediaStreamTrack-s
       may have been announced to the application within independent
       MediaStream-s.  It should therefore be possible for applications
       to join such tracks within a single MediaStream.

   Section 7.1 provides suggestions for addressing the above
   requirements.

7.1.  Suggested WebRTC API Using TrackSendParams

   This document proposes that the following methods and dictionaries
   be added to the WebRTC API.  The changes follow the model of
   createDataChannel, which is a JS method on PeerConnection that makes
   it possible to add data channels without going through SDP.
   Furthermore, just like createDataChannel allows two ways to handle
   negotiation (the "I know what I'm doing; here's what I want to send;
   let me signal everything" mode and the "please take care of it for
   me; send an OPEN message" mode), this proposal also allows two ways
   to handle negotiation (the "I know what I'm doing; here's what I
   want to send; let me signal everything" mode and the "please take
   care of it for me; send SDP back and forth" mode).

   Following the success of createDataChannel, this allows simple
   applications to Just Work and more advanced applications to easily
   control what they need to.  In particular, it is possible to use
   this API to implement either Plan A or Plan B.

   // The following two methods are added to RTCPeerConnection.
   partial interface RTCPeerConnection {
     // Create a stream that is used to send a source stream.
     // The LocalMediaStream.description can be used for signalling.
     // No media is sent until addStream(LocalMediaStream) is called.
     LocalMediaStream createLocalStream(MediaStream sourceStream);

     // Create a stream that is used to receive media from the remote
     // side, given the parameters signalled from
     // LocalMediaStream.description.
     MediaStream createRemoteStream(MediaStreamDescription description);
   }

   interface LocalMediaStream implements MediaStream {
     // This can be changed at any time, but especially before calling
     // PeerConnection.addStream.
     attribute MediaStreamDescription description;
   }

   // Represents the parameters used to either send or receive a stream
   // over a PeerConnection.
   dictionary MediaStreamDescription {
     MediaStreamTrackDescription[] tracks;
   }

   // Represents the parameters used to either send or receive a track
   // over a PeerConnection.  A track has many "flows", which can be
   // grouped together.
   dictionary MediaStreamTrackDescription {
     // Same as the MediaStreamTrack.id.
     DOMString id;

     // Same as the MediaStreamTrack.kind.
     DOMString kind;

     // A track can have many "flows", such as for Simulcast, FEC, etc.
     // And they can be grouped in arbitrary ways.
     MediaFlowDescription[] flows;
     MediaFlowGroup[] flowGroups;
   }

   // Represents the parameters used to either send or receive a "flow"
   // over a PeerConnection.  A "flow" is media that arrives with a
   // single, unique SSRC.  One or more flows together make up the
   // media for a track.
   // For example, there may be Simulcast, FEC and RTX flows.
   dictionary MediaFlowDescription {
     // The "flow id" must be unique to the track, but need not be
     // unique outside of the track (two tracks could both have a flow
     // with the same flow ID).
     DOMString id;

     // Each flow can go over its own transport.  If the JS sets this
     // to a transportId that doesn't have a transport set up already,
     // the browser will use SDP negotiation to set up a transport to
     // back that transportId.  If this is set to an MID in the SDP,
     // then that MID's transport is used.
     DOMString transportId;

     // The SSRC used to send the flow.
     unsigned int ssrc;

     // When used as receive parameters, this indicates the possible
     // list of codecs that might come in for this flow.  For example,
     // a given receive flow could be set up to receive any of OPUS,
     // ISAC, or PCMU.  When used as send parameters, this indicates
     // that the first codec should be used, but the browser may send
     // other codecs if it needs to because of either bandwidth or CPU
     // constraints.
     MediaCodecDescription[] codecs;
   }

   dictionary MediaFlowGroup {
     DOMString type;      // "SIM" for Simulcast, "FEC" for FEC, etc.
     DOMString[] flowids;
   }

   dictionary MediaCodecDescription {
     unsigned byte payloadType;
     DOMString name;
     unsigned int? clockRate;
     unsigned int? bitRate;
     // A grab bag of other fmtp parameters that will need to be
     // further defined.
     MediaCodecParam[] params;
   }

   dictionary MediaCodecParam {
     DOMString key;
     DOMString value;
   }

   Some additional notes:

   o  When LocalMediaStreams are added using addStream,
      onnegotiationneeded is not called, and those streams are never
      reflected in future SDP exchanges.  Indeed, it would be
      impossible to put them in the SDP without first resolving whether
      that would be Plan A SDP or Plan B SDP.

   o  Just like piles of attributes would need to be defined for Plan A
      and for Plan B, similar attributes would need to be defined here.
      (Luckily, much work has already been done figuring out what those
      parameters are :).

   API Pros:

   o  Either Plan A or Plan B could be implemented in JavaScript using
      this API.

   o  It exposes all the same functionality to the JavaScript as SDP,
      but in a much nicer format that is much easier to work with.

   o  Any other signalling mechanism, such as Jingle or CLUE, could be
      implemented using this API.

   o  There is almost no risk of signalling glare.

   o  Debugging errors with misconfigured descriptions should be much
      easier with this than with large SDP blobs.

   API Cons:

   o  Now there are two slightly different ways to add streams: by
      creating a LocalMediaStream first, and by not doing so.  This is,
      however, analogous to setting "negotiated: true" in
      createDataChannel.  One way "just works", and the other provides
      more advanced control.

   o  All the options in MediaCodecDescription are a bit complicated.
      Really, this is only necessary because Plan A requires being able
      to specify codec parameters per SSRC and to set each flow on a
      different transport.  If we did not have this requirement, we
      could simplify.

7.1.1.  Example 2

   The following is an example of how these API additions would be
   used:

   // Imagine MyApp handles creating a PeerConnection, signalling, and
   // rendering streams.  This is how the new API could be used.
   var peerConnection = MyApp.createPeerConnection();

   // On the sender side:
   var stream = MyApp.getMediaStream();
   var localStream = peerConnection.createLocalStream(stream);
   localStream.description =
       MyApp.modifyStream(localStream.description);
   MyApp.signalAddStream(localStream.description, function(response) {
     if (!response.rejected) {
       // Media will now be sent.
       peerConnection.addStream(localStream);
     }
   });

   // On the receiver side:
   MyApp.onAddStreamSignalled = function(streamDescription) {
     var stream = peerConnection.createRemoteStream(streamDescription);
     MyApp.renderStream(stream);
   };

   // In this exchange, the MediaStreamDescription signalled from the
   // sender to the receiver may have looked something like this:

   {
     tracks: [
       {
         id: "audio1",
         kind: "audio",
         flows: [
           {
             id: "main",
             transportId: "transport1",
             ssrc: 1111,
             codecs: [
               {
                 payloadType: 111,
                 name: "opus",
                 // ... more codec details
               },
               {
                 payloadType: 112,
                 name: "pcmu",
                 // ... more codec details
               }]
           }]
       },
       {
         id: "video1",
         kind: "video",
         flows: [
           {
             id: "sim0",
             transportId: "transport2",
             ssrc: 2222,
             codecs: [
               {
                 payloadType: 122,
                 name: "vp8",
                 // ... more codec details
               }]
           },
           {
             id: "sim1",
             transportId: "transport2",
             ssrc: 2223,
             codecs: [
               {
                 payloadType: 122,
                 name: "vp8",
                 // ... more codec details
               }]
           },
           {
             id: "sim2",
             transportId: "transport2",
             ssrc: 2224,
             codecs: [
               {
                 payloadType: 122,
                 name: "vp8",
                 // ... more codec details
               }]
           },
           {
             id: "sim0fec",
             transportId: "transport2",
             ssrc: 2225,
             codecs: [
               {
                 payloadType: 122,
                 name: "vp8",
                 // ...
               }]
           }],
         flowGroups: [
           {
             type: "SIM",
             flowids: ["sim0", "sim1", "sim2"]
           },
           {
             type: "FEC",
             flowids: ["sim0", "sim0fec"]
           }]
       }]
   }

8.  IANA Considerations

   None.

9.  Informative References

   [CLUE]     Duckworth, M., Pepperell, A., and S. Wenger, "Framework
              for Telepresence Multi-Streams", draft-ietf-clue-
              framework (work in progress), May 2013.

   [COIN]     Ivov, E. and E. Marocco, "XEP-0298: Delivering Conference
              Information to Jingle Participants (Coin)", XSF XEP-0298,
              June 2011.

   [MAX-SSRC] Westerlund, M., Burman, B., and F. Jansson, "Multiple
              Synchronization sources (SSRC) in RTP Session Signaling",
              draft-westerlund-avtcore-max-ssrc (work in progress),
              July 2012.

   [MSID]     Alvestrand, H., "Cross Session Stream Identification in
              the Session Description Protocol", draft-ietf-mmusic-msid
              (work in progress), February 2013.

   [PlanA]    Roach, A. and M. Thomson, "Using SDP with Large Numbers
              of Media Flows", draft-roach-rtcweb-plan-a (work in
              progress), May 2013.

   [PlanB]    Uberti, J., "Plan B: a proposal for signaling multiple
              media sources in WebRTC", draft-uberti-rtcweb-plan (work
              in progress), May 2013.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264, June
              2002.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC
              3551, July 2003.
Levin, "A Session 854 Initiation Protocol (SIP) Event Package for Conference 855 State", RFC 4575, August 2006. 857 [RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP 858 Header Extensions", RFC 5285, July 2008. 860 [RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and 861 Control Packets on a Single Port", RFC 5761, April 2010. 863 [RFC5956] Begen, A., "Forward Error Correction Grouping Semantics in 864 the Session Description Protocol", RFC 5956, September 865 2010. 867 [RFC6015] Begen, A., "RTP Payload Format for 1-D Interleaved Parity 868 Forward Error Correction (FEC)", RFC 6015, October 2010. 870 [RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP 871 Flows", RFC 6051, November 2010. 873 [ROACH-GLARELESS-ADD] 874 Roach, A., "An Approach for Adding RTCWEB Media Streams 875 without Glare", reference.I-D.roach-rtcweb-glareless-add 876 (work in progress), May 2013, . 879 [SRCNAME] Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES 880 Item SRCNAME to Label Individual Sources ", reference.I-D 881 .westerlund-avtext-rtcp-sdes-srcname (work in progress), 882 October 2012, . 885 [TRICKLE-ICE] 886 Ivov, E., Rescorla, E., and J. Uberti, "Trickle ICE: 887 Incremental Provisioning of Candidates for the Interactive 888 Connectivity Establishment (ICE) Protocol ", reference.I-D 889 .ivov-mmusic-trickle-ice (work in progress), March 2013, 890 . 892 [XCON] , "Centralized Conferencing (XCON) Status Pages", , 893 . 895 Appendix A. Acknowledgements 896 Many thanks to Bernard Aboba and Mary Barnes, for reviewing this 897 document and providing numerous comments and substantial input. 899 Authors' Addresses 901 Emil Ivov 902 Jitsi 903 Strasbourg 67000 904 France 906 Phone: +33-177-624-330 907 Email: emcho@jitsi.org 909 Enrico Marocco 910 Telecom Italia 911 Via G. Reiss Romoli, 274 912 Turin 10148 913 Italy 915 Email: enrico.marocco@telecomitalia.it 917 Peter Thatcher 918 Google 919 747 6th St S 920 Kirkland, WA 98033 921 USA 923 Phone: +1 857 288 8888 924 Email: pthatcher@google.com