Network Working Group                                            E. Ivov
Internet-Draft                                                     Jitsi
Intended status: Standards Track                            May 29, 2013
Expires: November 30, 2013

  No Plan: Economical Use of the Offer/Answer Model in WebRTC Sessions
                       with Multiple Media Sources
                       draft-ivov-rtcweb-noplan-00

Abstract

   This document describes a model for the lightweight use of SDP
   Offer/Answer in WebRTC.  The goal is to minimize reliance on
   Offer/Answer exchanges in a WebRTC session and provide applications
   with the tools necessary to implement the signalling that they may
   need in a way that best fits their custom requirements and
   topologies.  This simplifies tasks such as signalling multiple media
   sources or providing RTP Synchronisation source (SSRC)
   identification in multi-party sessions.  Another important goal of
   this model is to remove from clients topological constraints such as
   the requirement to know in advance all SSRC identifiers that they
   could potentially introduce in a particular session.

   This document does not question the use of SDP and the Offer/Answer
   model or the value they have in terms of interoperability with
   legacy or other non-WebRTC devices.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   This Internet-Draft will expire on November 30, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Background
   2. Introduction
   3. Reliance on Offer/Answer
      3.1. Interoperability with Legacy
   4. Additional Session Control and Signalling
   5. Demultiplexing and Identifying Streams (Use of Bundle)
   6. Simulcasting, FEC, Layering and RTX (Open Issue)
   7. WebRTC API Requirements
   8. IANA Considerations
   9. Informative References
   Appendix A. Acknowledgements
   Author's Address

1. Background

   In its early stages the RTCWEB working group chose to use the
   Session Description Protocol (SDP) and the Offer/Answer model
   [RFC3264] when establishing and negotiating sessions.  This choice
   was also accompanied by the decision not to mandate a specific
   signalling protocol so that, once interoperability has been
   achieved, web applications can choose the semantics that best fit
   their requirements.  In some scenarios however, such as those
   involving the use of multiple media sources, these choices have
   left open the issue of exactly which operations should be handled
   by SDP Offer/Answer and which of them should be left to
   application-specific signalling.

   At the time of writing of this document, the RTCWEB working group is
   considering two approaches to addressing the issue, often referred
   to as Plan A [PlanA] and Plan B [PlanB].  Both of them describe
   semantics that require Offer/Answer exchanges in a number of
   situations where this could be avoided, particularly when adding
   media sources to or removing them from a session.  This requirement
   applies equally to cases where a client adds the stream of a newly
   activated web cam or a simulcast flow, and to the arrival or
   departure of a conference participant.

   Plan A handles such notifications with the addition or removal of
   independent m= lines [PlanA], while Plan B relies on the use of
   multiplexed m= lines but still depends on Offer/Answer exchanges
   for the addition or removal of media stream identifiers [MSID].

   By taking the Offer/Answer approach, both Plan A and Plan B take
   away from the application the opportunity to handle such events in
   a way that is most fitting for the use case, which, among other
   things, also goes against the working group's decision not to
   define a specific signalling protocol.  (It could be argued that it
   is therefore only natural that proponents of each plan, having
   different use cases in mind, are remarkably far from reaching
   consensus.)

   Another problem, more specific to Plan B, is the reliance on
   preliminary announcement of SSRC identifiers for stream
   identification.  While this could be perceived as relatively
   straightforward in one-to-one sessions or even conference calls
   within controlled environments, it can be a problem in the
   following cases:

   o  interoperability with legacy/non-WebRTC endpoints

   o  use within non-controlled and potentially federated conference
      environments where new RTP streams may appear relatively often.
      In such cases the signalling required to describe all of them
      through Offer/Answer may represent substantial overhead while
      none or only a part of it (e.g. the description of a main,
      active speaker stream) may be required by the application.

   By increasing the number of Offer/Answer exchanges, both Plan A and
   Plan B also increase the risk of encountering glare situations
   (i.e. cases where both parties attempt to modify a session at the
   same time).  While glare is also possible with basic Offer/Answer
   and resolution of such situations must be implemented anyway, the
   need to frequently resort to such code may either negatively impact
   user experience (e.g. when "back off" resolution is used) or
   require substantial modifications in the Offer/Answer model and/or
   further venturing into the land of signalling protocols
   [ROACH-GLARELESS-ADD].

   Finally, both Plan A and Plan B also create expectations that fine
   grained control of FEC, layering and RTX flows will always be
   implemented through Offer/Answer, which would not necessarily be
   the best way to handle this in congested situations.

2. Introduction

   The goal of this document is to provide directions for use of the
   SDP Offer/Answer model in a way that satisfies the following
   requirements:

   o  the addition and removal of media sources (e.g. conference
      participants, multiple web cams or "slides") must be possible
      without the need of Offer/Answer exchanges;

   o  the addition or removal of simulcast or layered streams must be
      possible without the need for Offer/Answer exchanges beyond the
      initial declaration of such capabilities for either direction;

   o  call establishment must not require preliminary announcement or
      even knowledge of all potentially participating media sources;

   o  application-specific signalling should be used to cover most
      semantics following call establishment, such as adding, removing
      or identifying SSRCs (a sketch of how this combines with a
      single Offer/Answer exchange is shown after this list);

   o  straightforward interoperability with widely deployed legacy
      endpoints with rudimentary support for Offer/Answer.  This
      includes devices that allow for one audio and potentially one
      video m= line and that expect to only ever be required to render
      a single RTP stream at a time for any of them.  (Note that this
      does NOT include devices that expect to see multiple "m=video"
      lines for different SSRCs as they can hardly be viewed as
      "widely deployed legacy").
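
   The following TypeScript sketch illustrates how these requirements
   could play out in practice: a single Offer/Answer exchange at call
   establishment, carried over whatever channel the application
   already uses, with all later changes conveyed as application
   messages.  The sketch uses the standard WebRTC JavaScript API; the
   WebSocket channel and the "offer"/"answer"/"candidate" message
   shapes are illustrative assumptions, not something defined by this
   document.

      // One-time Offer/Answer exchange over an application-chosen
      // signalling channel (here assumed to be a WebSocket).
      async function establishSession(channel: WebSocket,
                                      localStream: MediaStream) {
        const pc = new RTCPeerConnection();
        localStream.getTracks().forEach(t => pc.addTrack(t, localStream));

        // Trickle ICE candidates through the same application channel.
        pc.onicecandidate = e => {
          if (e.candidate) {
            channel.send(JSON.stringify({ type: "candidate",
                                          candidate: e.candidate }));
          }
        };

        const offer = await pc.createOffer();
        await pc.setLocalDescription(offer);
        channel.send(JSON.stringify({ type: "offer", sdp: offer.sdp }));

        channel.onmessage = async (e) => {
          const msg = JSON.parse(e.data);
          if (msg.type === "answer") {
            await pc.setRemoteDescription({ type: "answer", sdp: msg.sdp });
          }
          // Under the model described here, subsequent additions or
          // removals of media sources would arrive as application
          // messages rather than as new offers.
        };
        return pc;
      }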

   To achieve the above requirements this specification expects that
   browsers and WebRTC endpoints in general will only use SDP
   Offer/Answer to establish transport channels and initialize an RTP
   stack and codec/processing chains.  This also includes any
   renegotiation that requires the re-initialisation of these chains.
   For example, adding VP8 to a session that was set up with only
   H.264 would obviously still require an Offer/Answer exchange.

   All other session control and signalling are to be left to
   applications.

   The actual Offer/Answer semantics presented here do not differ
   fundamentally from those proposed by Plan A and Plan B.  The main
   differentiation point of this approach is the fact that the exact
   protocol mechanism is left to WebRTC applications.  Such
   applications or lightweight signalling gateways can then implement
   either Plan A, or Plan B, or an entirely different signalling
   protocol, depending on what best matches their use cases and
   topology.

3. Reliance on Offer/Answer

   The model presented in this specification relies on use of SDP and
   Offer/Answer in quite the same way as many of the pre-WebRTC (and
   most of the legacy) endpoints do: negotiating formats, establishing
   transport channels and exchanging, in a declarative way, media and
   transport parameters that are then used for the initialization of
   the corresponding stacks.

   The following is an example presenting what this specification
   views as a typical offer sent by a WebRTC endpoint:

   v=0
   o=- 0 0 IN IP4 198.51.100.33
   s=
   t=0 0

   a=group:BUNDLE audio video          // declaring BUNDLE support
   c=IN IP4 198.51.100.33
   a=ice-ufrag:Qq8o/jZwknkmXpIh        // initializing ICE
   a=ice-pwd:gTMACiJcZv1xdPrjfbTHL5qo
   a=ice-options:trickle
   a=fingerprint:sha-1                 // DTLS-SRTP keying
    a4:b1:97:ab:c7:12:9b:02:12:b8:47:45:df:d8:3a:97:54:08:3f:16

   m=audio 5000 RTP/SAVPF 96 0 8
   a=mid:audio
   a=rtcp-mux

   a=rtpmap:96 opus/48000/2            // PT mappings
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000

   a=extmap:1 urn:ietf:params:rtp-hdrext:csrc-audio-level // 5285 header
   a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level // extensions

   [ICE Candidates]

   m=video 5002 RTP/SAVPF 97 98
   a=mid:video
   a=rtcp-mux

   a=rtpmap:97 VP8/90000    // PT mappings and resolution capabilities
   a=imageattr:97 \
       send [x=[480:16:800],y=[320:16:640],par=[1.2-1.3],q=0.6] \
            [x=[176:8:208],y=[144:8:176],par=[1.2-1.3]] \
       recv *
   a=rtpmap:98 H264/90000
   a=imageattr:98 send [x=800,y=640,sar=1.1,q=0.6] [x=480,y=320] \
                  recv [x=330,y=250]

   a=extmap:3 urn:ietf:params:rtp-hdrext:fec-source-ssrc  // 5285 header
   a=extmap:4 urn:ietf:params:rtp-hdrext:rtx-source-ssrc  // extensions

   a=max-send-ssrc:{*:1}               // declaring maximum
   a=max-recv-ssrc:{*:4}               // number of SSRCs

   [ICE Candidates]

   The answer to the offer above would have roughly the same structure
   and content.  The most important aspects here are:

   o  Preserves interoperability with most kinds of legacy or
      non-WebRTC endpoints.

   o  Allows the negotiation of most parameters that concern the
      media/RTP stack (typically the browser).

   o  Only a single Offer/Answer exchange is required for session
      establishment and, in most cases, for the entire duration of a
      session.

   o  Leaves complete freedom to applications as to the way that they
      are going to signal any other information such as SSRC
      identification information or the addition or removal of RTP
      streams.

3.1. Interoperability with Legacy

   Interoperating with "widely deployed legacy endpoints" is one of
   the main reasons for the RTCWEB working group to choose the SDP
   Offer/Answer model as the basis for media negotiation.  It is hence
   important to clarify the compatibility claims that this
   specification makes.

   A "widely deployed legacy endpoint" is considered to have the
   following characteristics:

   o  Likely to use the SIP protocol.

   o  Capability to gracefully handle one audio and potentially one
      video m= line in an SDP Offer.

   o  Capability to render one SSRC per m= line at any given moment
      but multiple, consecutive SSRCs over a period of time.  This
      would be the case with session replacements following a
      transfer, for example.  While the capability to handle multiple
      SSRCs simultaneously is not uncommon, it cannot be relied upon
      and should first be confirmed by signalling.

   o  Possibly have features such as ICE, BUNDLE, RTCP-MUX, etc.  Just
      as likely not to.

   o  Very unlikely to announce in SDP the SSRCs that they intend to
      use for a given session.

   o  Exact set of features and capabilities: Guaranteed to be wildly
      and widely diverse.

   While it is relatively simple for RTCWEB to accommodate some of the
   above, it is obviously impossible to design a model that could
   simply be labeled as "compatible with legacy".  It is reasonable to
   assume that use cases involving the use of such endpoints will be
   designed for a relatively specific set of devices and applications.
   The role of the WebRTC framework is hence to provide a least-
   common-denominator model that can then be extended by applications.

   It is just as important not to make choices or assumptions that
   will render interoperability for some applications or topologies
   difficult or even impossible.

   This is exactly what the use of Offer/Answer discussed here strives
   to achieve.  Audio/Video offers originating from WebRTC endpoints
   will always have a maximum of one audio and one video m= line.  It
   will be up to applications to determine exactly how many streams
   they can afford to send once such a session has been established.
   The exact mechanism to do this is outside the scope of this
   document (or WebRTC in general).

   Note that it is still possible for WebRTC endpoints to indicate
   support for a maximum number of incoming or outgoing streams for
   reasons such as processing constraints.  Use of the "max-send-ssrc"
   and "max-recv-ssrc" attributes [MAX-SSRC] could be one way of doing
   this, although that mechanism would need to be extended to provide
   ways of distinguishing between independent flows and complementary
   ones such as layered FEC and RTX.  Even with this in mind it is
   still important not to rely on the presence of that indication in
   incoming descriptions, as well as to provide applications with a
   way of retrieving such capabilities from the WebRTC stack (e.g. the
   browser).

   Determining whether a peer has the ability to seamlessly switch
   from one SSRC to another is also left to application-specific
   signalling.  It is worth noting that protocols such as SIP often
   accompany SSRC replacements with extra signalling (re-INVITEs with
   a "Replaces" header) that can easily be reused by applications or
   mapped to something that they deem more convenient.
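
   A WebRTC application that does not reuse SIP could convey the same
   information with a trivial message of its own.  The following
   TypeScript sketch is purely illustrative; the message name and its
   fields are hypothetical and not defined by this document:

      // Hypothetical notification announcing that newSsrc will replace
      // oldSsrc, mirroring what SIP achieves with a re-INVITE carrying
      // a "Replaces" header.
      interface SsrcReplaced {
        type: "ssrc-replaced";
        oldSsrc: number;
        newSsrc: number;
        reason?: "transfer" | "camera-switch" | "other";
      }

      function announceSsrcReplacement(channel: WebSocket,
                                       oldSsrc: number,
                                       newSsrc: number): void {
        const msg: SsrcReplaced = {
          type: "ssrc-replaced",
          oldSsrc: oldSsrc,
          newSsrc: newSsrc,
          reason: "camera-switch"
        };
        // Sent over the application's own channel, not through SDP.
        channel.send(JSON.stringify(msg));
      }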

   For the sake of interoperability this specification strongly
   advises against the use of multiple m= lines for a single media
   type.  Not only would such use be meaningless to a large number of
   legacy endpoints but it is also likely to be mishandled by many of
   them and to cause unexpected behaviour.

   Finally, it is also worth pointing out that there is a significant
   number of feature rich non-WebRTC applications and devices that
   have relatively advanced, modern sets of capabilities.  Such
   endpoints hardly fit the "legacy" qualification.  Yet, as is often
   the case with novel and/or proprietary applications, they too have
   adopted diverse signalling mechanisms and the requirements
   described in this section fully apply when it comes to
   interoperating with them.

4. Additional Session Control and Signalling

   This specification leaves semantics such as the following entirely
   to application-specific signalling:

   o  Adding RTP streams to and removing them from an existing
      session.

   o  Accepting and refusing some of them.

   o  Identifying SSRCs and obtaining additional metadata for them
      (e.g. the user corresponding to a specific SSRC).

   All of the above semantics are best handled by and hence should be
   left to applications.  There are numerous existing or emerging
   solutions, some of them developed by the IETF, that already cover
   this.  These include CLUE channels [CLUE], the SIP Event Package
   for Conference State [RFC4575] and its XMPP variant [COIN].
   Additional mechanisms, undoubtedly many based on JSON, are very
   likely to emerge in the future as WebRTC applications address
   varying use cases, scenarios and topologies.

   The most important part of this specification is hence to prevent
   certain assumptions or topologies from being imposed on
   applications.  One example of this is the need to know and include
   in the Offer/Answer exchange all the SSRCs that can show up in a
   session.  This can be particularly problematic for scenarios that
   involve non-WebRTC endpoints.

   Large scale conference calls, potentially federated through RTP
   translator-like bridges, would be another problematic scenario.

   Being able to always pre-announce SSRCs in such situations could of
   course be made to work but it would come at a price.  It would
   either require a very high number of Offer/Answer updates that
   propagate the information through the entire topology, or the use
   of tricks such as pre-allocating a range of "fake" SSRCs,
   announcing them to participants and then overwriting the actual
   SSRCs with them.  Depending on the scenario both options could
   prove inappropriate or inefficient while some applications may not
   even need such information.  Others could be retrieving it through
   simplistic means such as access to a centralized resource (e.g. a
   URL pointing to a JSON description of the conference).
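
   As a purely hypothetical illustration (neither the resource nor the
   JSON structure below is defined by this document), such a
   centralized description and its retrieval could be as simple as:

      // Hypothetical shape of a conference description served over
      // HTTP instead of being propagated through Offer/Answer updates.
      interface ConferenceInfo {
        focus: string;                   // e.g. a conference URI
        participants: {
          name: string;                  // "Alice", "Bob", ...
          audioSsrcs: number[];
          videoSsrcs: number[];
        }[];
      }

      async function fetchConferenceInfo(url: string):
          Promise<ConferenceInfo> {
        const response = await fetch(url);
        return (await response.json()) as ConferenceInfo;
      }

   An application could poll or subscribe to such a resource and match
   the advertised SSRCs against incoming RTP, without any of those
   SSRCs ever appearing in SDP.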

5. Demultiplexing and Identifying Streams (Use of Bundle)

   This document assumes use of BUNDLE in WebRTC endpoints.  This
   implies that all RTP streams are likely to end up being received on
   the same port.  A demuxing mechanism is therefore necessary in
   order for these packets to then be fed into the appropriate
   processing chain (i.e. matched to an m= line).

      Note: it is important to distinguish between the demultiplexing
      and the identification of incoming flows.  Throughout this
      specification the former is used to refer to the process of
      selecting a depacketizing/decoding/processing chain to feed
      incoming packets to.  Such decisions depend solely on the format
      that is used to encode the content of incoming packets.

      The above is not to be confused with the process of making
      rendering decisions about a processed flow.  Such decisions
      include showing a "current speaker" flow at a specific location,
      window or video tag, while choosing a different one for a
      second, "slides" flow.  Another example would be the possibility
      to attach "Alice", "Bob" and "Carol" labels on top of the
      appropriate UI components.  This specification leaves such
      rendering choices entirely to application-specific signalling as
      described in Section 4.

   This specification uses demuxing based on RTP payload types.  When
   creating offers and answers WebRTC applications MUST therefore
   allocate RTP payload types only once per bundle group.  In cases
   where rtcp-mux is in use this would mean a maximum of 96 payload
   types per bundle [RFC5761].  It has been pointed out that some
   legacy devices may have unpredictable behaviour with payload types
   that are outside the 96-127 range reserved by [RFC3551] for dynamic
   use.  Some applications or implementations may therefore choose not
   to use values outside this range.  Whatever the reason, offerers
   that find they need more than the available payload type numbers
   will simply need to either use a second bundle group or not use
   BUNDLE at all (which in the case of a single audio and a single
   video m= line amounts to roughly the same thing).  This would also
   imply building a dynamic table, mapping SSRCs to PTs and m= lines,
   in order to then also allow for RTCP demuxing.

   While not desirable, the implications of such a decision would be
   relatively limited.  Use of trickle ICE [TRICKLE-ICE] is going to
   lessen the impact on call establishment latency.  Also, the fact
   that this would only occur in a limited number of cases makes it
   unlikely to have a significant effect on port consumption.

   An additional requirement that has been expressed toward demuxing
   is the ability to assign incoming packets with the same payload
   type to different processing chains depending on their SSRCs.  A
   possible example for this is a scenario where two video streams are
   being rendered on different video screens that each have their own
   decoding hardware.

   While the above may appear as a demuxing and a decoding related
   problem it is really mostly a rendering policy specific to an
   application.  As such it should be handled by application-specific
   signalling that could involve custom-formatted, per-SSRC
   information that accompanies SDP offers and answers.
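
   Returning to the payload type based rule above, the demultiplexing
   step itself boils down to a simple lookup.  The following sketch
   uses illustrative data structures only; in practice this logic
   lives inside the browser's RTP stack rather than in the web
   application:

      // Payload-type based demultiplexing: because each payload type
      // appears in at most one m= line of the bundle, the PT field of
      // an incoming packet is enough to pick a processing chain.
      interface ProcessingChain {
        mid: string;                        // "audio" or "video"
        handle(packet: Uint8Array): void;   // depacketize/decode/...
      }

      // Built from the local and remote descriptions, e.g.
      // { 96: audioChain, 0: audioChain, 8: audioChain,
      //   97: videoChain, 98: videoChain }
      const chainsByPayloadType = new Map<number, ProcessingChain>();

      function demux(packet: Uint8Array): void {
        const payloadType = packet[1] & 0x7f;  // low 7 bits of byte 1
        const chain = chainsByPayloadType.get(payloadType);
        if (chain !== undefined) {
          chain.handle(packet);
        }
      }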

6. Simulcasting, FEC, Layering and RTX (Open Issue)

   From a WebRTC perspective, repair flows such as layering, FEC, RTX
   and, to some extent, simulcasting present an interesting challenge,
   which is why they are considered an open issue by this
   specification.

   On the one hand they are transport utilities that need to be
   understood, supported and used by browsers in a way that is mostly
   transparent to applications.  On the other, some applications may
   need to be made aware of them and given the option to control their
   use.  This could be necessary in cases where their use needs to be
   signalled to non-WebRTC endpoints in an application-specific way.
   Another example is the possibility for an application to choose to
   disable some or all repair flows because it has been made aware by
   application-specific signalling that they are temporarily not being
   used/rendered by the remote end (e.g. because it is only displaying
   a thumbnail or because a corresponding video tag is not currently
   visible).

   One way of handling such flows would be to advertise them in the
   way suggested by [RFC5956] and to then control them through
   application-specific signalling.  This option has the merit of
   already existing but it also implies the pre-announcement and
   propagation of SSRCs and the bloated signalling that this incurs.
   Also, relying solely on Offer/Answer here would expose an offerer
   to the typical race condition of repair SSRCs arriving before the
   answer and the processing ambiguity that this would imply.

   Another approach could be a combination of RTCP and RTP header
   extensions [RFC5285] in a way similar to the one employed by the
   Rapid Synchronisation of RTP Flows [RFC6051].  While such a
   mechanism is not currently defined by the IETF, specifying it could
   be relatively straightforward:

   Every packet belonging to a repair flow could carry an RTP header
   extension [RFC5285] that points to the source stream (or source
   layer in case of layered mechanisms).  The following shows one
   possible way of signalling this:

   a=extmap:3 urn:ietf:params:rtp-hdrext:fec-source-ssrc

   In this case the actual RTP packet and header extension could look
   like this:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|1|  CC   |M|     PT      |        sequence number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+R
   |                           timestamp                           |T
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+P
   |           synchronisation source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |      0xBE     |      0xDE     |           length=3            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+E
   | ID-3  |  L=3  |      SSRC of the source RTP flow ...          |x
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+t
   |   ... SSRC    |    0 (pad)    |    0 (pad)    |    0 (pad)    |n
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                          payload data                         |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note that the above is just a stub.  It is an example that is meant
   to show one possible solution with some mechanisms (e.g. 1-D
   Interleaved Parity [RFC6015]).  Other mechanisms may and probably
   will require different extensions or signalling ([SRCNAME] will
   likely be an option for some).  In some cases, where layering
   information is provided by the codec, an extension is not going to
   be necessary at all.

   In cases where FEC or simulcast relations are not immediately
   needed by the recipient, the above information could also be
   delayed until the reception of the first RTCP packet.
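
   Were a header extension based mechanism like the stub above to be
   adopted, recovering the relation on the receiving side would amount
   to a few lines of parsing.  The sketch below assumes the one-byte
   header extension format from [RFC5285] and the hypothetical
   "fec-source-ssrc" element with ID 3 shown earlier:

      // Extract the source SSRC carried in a one-byte RTP header
      // extension element with the given ID.  This is what a
      // receiving RTP stack, not the web application, would do.
      function findSourceSsrc(packet: Uint8Array,
                              extId: number): number | null {
        if ((packet[0] & 0x10) === 0) return null;   // no extension

        const csrcCount = packet[0] & 0x0f;
        const offset = 12 + 4 * csrcCount;    // fixed header + CSRCs

        // One-byte extension header: 0xBEDE + length in 32-bit words.
        if (packet[offset] !== 0xbe || packet[offset + 1] !== 0xde) {
          return null;
        }
        const lengthWords = (packet[offset + 2] << 8) | packet[offset + 3];
        let pos = offset + 4;
        const end = pos + lengthWords * 4;

        while (pos < end) {
          const byte = packet[pos];
          if (byte === 0) { pos++; continue; }        // padding byte
          const id = byte >> 4;
          if (id === 15) break;                       // reserved: stop
          const len = (byte & 0x0f) + 1;              // L is length - 1
          if (id === extId && len === 4) {
            return ((packet[pos + 1] << 24) | (packet[pos + 2] << 16) |
                    (packet[pos + 3] << 8) | packet[pos + 4]) >>> 0;
          }
          pos += 1 + len;
        }
        return null;
      }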

7. WebRTC API Requirements

   One of the main characteristics of this specification is the use of
   SDP for transport channel setup and media stack initialisation
   only.  In order for applications to be able to cover everything
   else it is important that WebRTC APIs actually allow for it.  Given
   the initial directions taken by early implementations and
   specification work, this is currently almost, but not entirely,
   possible.

   The following is a list of requirements that the WebRTC APIs would
   need to satisfy in order for this specification to be usable.  (A
   hypothetical sketch of what such extensions might look like is
   given after the list.  Note: some of the items are already possible
   and are only included for the sake of completeness.)

   1.  Expose the SSRCs of all local MediaStreamTrack-s that the
       application may want to attach to a PeerConnection.

   2.  Expose the SSRCs of all remote MediaStreamTrack-s that are
       received on a PeerConnection.

   3.  Expose to applications all locally generated repair flows that
       exist for a source (e.g. FEC and RTX flows that will be
       generated for a webcam), their types, relations and SSRCs.

   4.  Expose information about the maximum number of incoming streams
       that can be decoded and rendered.

   5.  Applications should be able to pause and resume (disable and
       enable) any MediaStreamTrack.  This should also include the
       possibility to do so for specific repair flows.

   6.  Information about how certain MediaStreamTrack-s relate to each
       other (e.g. a given audio flow is related to a specific video
       flow) may be exchanged by applications after media has started
       arriving.  At that point the corresponding MediaStreamTrack-s
       may have been announced to the application within independent
       MediaStream-s.  It should therefore be possible for
       applications to join such tracks within a single MediaStream.
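
   The following TypeScript sketch is entirely hypothetical: it merely
   illustrates the shape that extensions satisfying requirements 1-4
   might take and does not describe any existing or proposed browser
   API.

      // Hypothetical extensions; none of these members exist in the
      // actual WebRTC API.
      interface RepairFlowInfo {
        kind: "fec" | "rtx" | "layer";
        ssrc: number;
        sourceSsrc: number;
      }

      interface ExtendedMediaStreamTrack extends MediaStreamTrack {
        ssrcs?: number[];                 // requirements 1 and 2
        repairFlows?: RepairFlowInfo[];   // requirement 3
      }

      interface ExtendedPeerConnection extends RTCPeerConnection {
        maxDecodableStreams?: number;     // requirement 4
      }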

8. IANA Considerations

   None.

9. Informative References

   [CLUE]     Duckworth, M., Pepperell, A., and S. Wenger, "Framework
              for Telepresence Multi-Streams", draft-ietf-clue-
              framework (work in progress), May 2013.

   [COIN]     Ivov, E. and E. Marocco, "XEP-0298: Delivering Conference
              Information to Jingle Participants (Coin)", XSF XEP 0298,
              June 2011.

   [MAX-SSRC] Westerlund, M., Burman, B., and F. Jansson, "Multiple
              Synchronization sources (SSRC) in RTP Session Signaling",
              draft-westerlund-avtcore-max-ssrc (work in progress),
              July 2012.

   [MSID]     Alvestrand, H., "Cross Session Stream Identification in
              the Session Description Protocol", draft-ietf-mmusic-msid
              (work in progress), February 2013.

   [PlanA]    Roach, A. B. and M. Thomson, "Using SDP with Large
              Numbers of Media Flows", draft-roach-rtcweb-plan-a (work
              in progress), May 2013.

   [PlanB]    Uberti, J., "Plan B: a proposal for signaling multiple
              media sources in WebRTC", draft-uberti-rtcweb-plan (work
              in progress), May 2013.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264, June
              2002.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC
              3551, July 2003.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
              Initiation Protocol (SIP) Event Package for Conference
              State", RFC 4575, August 2006.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, July 2008.

   [RFC5761]  Perkins, C. and M. Westerlund, "Multiplexing RTP Data and
              Control Packets on a Single Port", RFC 5761, April 2010.

   [RFC5956]  Begen, A., "Forward Error Correction Grouping Semantics
              in the Session Description Protocol", RFC 5956,
              September 2010.

   [RFC6015]  Begen, A., "RTP Payload Format for 1-D Interleaved Parity
              Forward Error Correction (FEC)", RFC 6015, October 2010.

   [RFC6051]  Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
              Flows", RFC 6051, November 2010.

   [ROACH-GLARELESS-ADD]
              Roach, A. B., "An Approach for Adding RTCWEB Media
              Streams without Glare", draft-roach-rtcweb-glareless-add
              (work in progress), May 2013.

   [SRCNAME]  Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES
              Item SRCNAME to Label Individual Sources", draft-
              westerlund-avtext-rtcp-sdes-srcname (work in progress),
              October 2012.

   [TRICKLE-ICE]
              Ivov, E., Rescorla, E.K., and J. Uberti, "Trickle ICE:
              Incremental Provisioning of Candidates for the
              Interactive Connectivity Establishment (ICE) Protocol",
              draft-ivov-mmusic-trickle-ice (work in progress), March
              2013.

Appendix A. Acknowledgements

   Many thanks to Enrico Marocco, Bernard Aboba and Peter Thatcher for
   reviewing this document and providing numerous comments and
   substantial input.

Author's Address

   Emil Ivov
   Jitsi
   Strasbourg  67000
   France

   Phone: +33-177-624-330
   Email: emcho@jitsi.org