idnits 2.17.1 

draft-jennings-rtcweb-plan-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  -- The document has an IETF Trust Provisions (28 Dec 2009) Section 6.c(ii)
     Publication Limitation clause.  If this document is intended for
     submission to the IESG for publication, this constitutes an error.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC4566], [RFC3550]), which
     it shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  == There are 13 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they should
     be changed.

  == There are 19 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.

  -- The document has examples using IPv4 documentation addresses according
     to RFC6890, but does not use any IPv6 documentation addresses.  Maybe
     there should be IPv6 examples, too?


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'SHOULD not' in this paragraph:
     
     The reoccurring theme of this draft is that SDP [RFC4566] already
     has a way of solving many of the problems being discussed at the RTCWeb
     WG and we SHOULD not try to invent something new but rather re-use the
     existing methods for describing RTP [RFC3550] media flows.

  -- The document date (February 25, 2013) is 4079 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  -- Obsolete informational reference (is this intentional?): RFC 4583
     (Obsoleted by RFC 8856)

  -- Obsolete informational reference (is this intentional?): RFC 4756
     (Obsoleted by RFC 5956)

  -- Obsolete informational reference (is this intentional?): RFC 5245
     (Obsoleted by RFC 8445, RFC 8839)


     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                        C. Jennings
3	Internet-Draft                                                     Cisco
4	Intended status: Informational                         February 25, 2013
5	Expires: August 29, 2013

7	                 Proposed Plan for Usage of SDP and RTP
8	                     draft-jennings-rtcweb-plan-01

10	Abstract

12	   This draft outlines a bunch of the remaining issues in RTCWeb related
13	   to how the the W3C APIs map to various usages of RTP and the
14	   associated SDP.  It proposes one possible solution to that problem
15	   and outlines several chunks of work that would need to be put into
16	   other drafts or result in new drafts being written.  The underlying
17	   design guideline is to, as much as possible, re-use what is already
18	   defined in existing SDP [RFC4566] and RTP [RFC3550] specifications.

20	   This draft is not intended to become an specification but is meant
21	   for working group discussion to help build the specifications.  It is
22	   being discussed on the rtcweb@ietf.org mailing list though it has
23	   topics relating to the CLUE WG, MMUSIC WG, AVT* WG, and WebRTC WG at
24	   W3C.

26	Status of this Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.  This document may not be modified,
30	   and derivative works of it may not be created, and it may not be
31	   published except as an Internet-Draft.

33	   Internet-Drafts are working documents of the Internet Engineering
34	   Task Force (IETF).  Note that other groups may also distribute
35	   working documents as Internet-Drafts.  The list of current Internet-
36	   Drafts is at http://datatracker.ietf.org/drafts/current/.

38	   Internet-Drafts are draft documents valid for a maximum of six months
39	   and may be updated, replaced, or obsoleted by other documents at any
40	   time.  It is inappropriate to use Internet-Drafts as reference
41	   material or to cite them other than as "work in progress."

43	   This Internet-Draft will expire on August 29, 2013.

45	Copyright Notice

47	   Copyright (c) 2013 IETF Trust and the persons identified as the
48	   document authors.  All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents
52	   (http://trustee.ietf.org/license-info) in effect on the date of
53	   publication of this document.  Please review these documents
54	   carefully, as they describe your rights and restrictions with respect
55	   to this document.  Code Components extracted from this document must
56	   include Simplified BSD License text as described in Section 4.e of
57	   the Trust Legal Provisions and are provided without warranty as
58	   described in the Simplified BSD License.

60	Table of Contents

62	   1.  Overview . . . . . . . . . . . . . . . . . . . . . . . . . . .  4
63	   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  6
64	   3.  Requirements . . . . . . . . . . . . . . . . . . . . . . . . .  7
65	   4.  Background/Solution Overview . . . . . . . . . . . . . . . . .  9
66	   5.  Overall Design . . . . . . . . . . . . . . . . . . . . . . . . 11
67	   6.  Example Mappings . . . . . . . . . . . . . . . . . . . . . . . 12
68	     6.1.   One Audio, One Video, No bundle/multiplexing  . . . . . . 12
69	     6.2.   One Audio, One Video, Bundle/multiplexing . . . . . . . . 12
70	     6.3.   One Audio, One Video, Simulcast, Bundle/multiplexing  . . 12
71	     6.4.   One Audio, One Video, Bundle/multiplexing, Lip-Sync . . . 12
72	     6.5.   One Audio, One Active Video, 5 Thumbnails,
73	            Bundle/multiplexing . . . . . . . . . . . . . . . . . . . 12
74	     6.6.   One Audio, One Active Video,  5 Thumbnails, Main
75	            Speaker Lip-Sync, Bundle/multiplexing . . . . . . . . . . 13
76	   7.  Solutions  . . . . . . . . . . . . . . . . . . . . . . . . . . 14
77	     7.1.   Correlation and Multiplexing  . . . . . . . . . . . . . . 15
78	     7.2.   Multiple Render . . . . . . . . . . . . . . . . . . . . . 18
79	       7.2.1.  Complex Multi Render Example . . . . . . . . . . . . . 18
80	     7.3.   Dirty Little Secrets  . . . . . . . . . . . . . . . . . . 22
81	     7.4.   Open Issues . . . . . . . . . . . . . . . . . . . . . . . 22
82	     7.5.   Confusions  . . . . . . . . . . . . . . . . . . . . . . . 22
83	   8.  Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
84	   9.  Tasks  . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
85	   10. Security Considerations  . . . . . . . . . . . . . . . . . . . 28
86	   11. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 29
87	   12. Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 30
88	   13. Open Issues  . . . . . . . . . . . . . . . . . . . . . . . . . 31
89	   14. Existing SDP . . . . . . . . . . . . . . . . . . . . . . . . . 32
90	     14.1.  Multiple Encodings  . . . . . . . . . . . . . . . . . . . 32
91	     14.2.  Forward Error Correction  . . . . . . . . . . . . . . . . 33
92	     14.3.  Same Video Codec With Different Settings  . . . . . . . . 33
93	     14.4.  Different Video Codecs With Different Resolutions
94	            Formats . . . . . . . . . . . . . . . . . . . . . . . . . 34
95	     14.5.  Lip Sync Group  . . . . . . . . . . . . . . . . . . . . . 34
96	     14.6.  BFCP  . . . . . . . . . . . . . . . . . . . . . . . . . . 34
97	     14.7.  Retransmission  . . . . . . . . . . . . . . . . . . . . . 35
98	     14.8.  Layered coding dependency . . . . . . . . . . . . . . . . 37
99	     14.9.  SSRC Signaling  . . . . . . . . . . . . . . . . . . . . . 38
100	     14.10. Content Signaling . . . . . . . . . . . . . . . . . . . . 39
101	   15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 40
102	     15.1.  Normative References  . . . . . . . . . . . . . . . . . . 40
103	     15.2.  Informative References  . . . . . . . . . . . . . . . . . 40
104	   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 42

106	1.  Overview

108	   The reoccurring theme of this draft is that SDP [RFC4566] already has
109	   a way of solving many of the problems being discussed at the RTCWeb
110	   WG and we SHOULD not try to invent something new but rather re-use
111	   the existing methods for describing RTP [RFC3550] media flows.

113	   The general theory is that, roughly speaking, the m-line corresponds
114	   to flow of packets that can be handled by the application in the same
115	   way.  This often results in more m-lines than there are media sources
116	   such as microphones or cameras.  Forward Error Correction (FEC) is
117	   done with multiple M-lines as shown in [RFC4756].  Retransmission
118	   (RTX) is done with multiple m-lines as shown in [RFC4588].  Layered
119	   coding is done with multiple m-lines as shown in [RFC5583].
120	   Simulcast, which is really just multiple video stream from the same
121	   camera, much like layered coding but with no inter m-line dependency,
122	   is done with multiple m-lines modeled after the Layered coding
123	   defined in in [RFC5583].

125	   The significant addition to SDP semantics is an multi-render media
126	   level attribute that allows a device to indicate that it makes sense
127	   to simultaneously use multiple stream of video that will be
128	   simultaneously displayed but share the same SDP characteristics and
129	   semantics such that they can all be negotiated under a single m-line.
130	   When using features like RTX, FEC, and Simulcast in a multi-render
131	   situation, there needs to be a way to correlate a given related media
132	   flow with the correct "base" media-flow.  This is accomplished by
133	   having the related flows carry, in the CSRC, the SSRC of their base
134	   flow.  An example SDP might look like as provided in the example
135	   Section 7.2.1.

137	   This draft also propose that advanced usages, including WebRTC to
138	   WebRTC scenarios, uses a Media Stream Identifier (MSID) that is
139	   signaled in SDP and also attempts to negotiate the usage of a RTP
140	   header extension to include the MSID in the RTP packet.  This
141	   resolves many long term issues.

143	   This does results in lots of m lines but all the alternatives designs
144	   resulted in an roughly equivalent number of SSRC lines with a
145	   possibility of redefining most of the media level attributes.  So
146	   it's really hard to see the big benefits defining something new over
147	   what we have.  One of the concerns about this approach is the time to
148	   collect all the ICE candidates needed for the initial offer.
149	   Section 7.2.1 provides mitigations to reduce the number of ports
150	   needed to be the same as an alternative SSRC based design.  This
151	   assumes that it is perfectly feasible to transport SDP that much
152	   larger than a single MTU.  The SIP [RFC3261] usage of SDP has
153	   successfully passed over this long ago.  In the cases where the SDP
154	   is passed over web mechanisms, it is easy to use compression and the
155	   size of SDP is more of an optimization criteria than a limiting
156	   issue.

158	2.  Terminology

160	   The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT",
161	   "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
162	   interpreted as described in [RFC2119].

164	   This draft uses the API and terminology described in [webrtc-api].

166	   Transport-Flow: An transport 5 Tuple representing the UDP source and
167	   destination IP address and port over which RTP is flowing.

169	   5-tuple: A collection of the following values: source IP address,
170	   source transport port, destination IP address, destination transport
171	   port and transport protocol.

173	   PC-Track: A source of media (audio and/or video) that is contained in
174	   a PC-Stream.  A PC-Track represents content comprising one or more
175	   PC-Channels.

177	   PC-Stream: Represents stream of data of audio and/or video added to a
178	   Peer Connection by local or remote media source(s).  A PC-Stream is
179	   made up of zero or more PC-Tracks.

181	   m-line: An SDP [RFC4566] media description identifier that starts
182	   with "m=" field and conveys following values:media type,transport
183	   port,transport protocol and media format descriptions.

185	   m-block: An SDP [RFC4566] media description that starts with an
186	   m-line and is terminated by either the next m-line or by the end of
187	   the session description.

189	   Offer: An [RFC3264] SDP message generated by the participant who
190	   wishes to initiate a multimedia communication session.  An Offer
191	   describes participants capabilities for engaging in a multimedia
192	   session.

194	   Answer: An [RFC3264] SDP message generated by the participant in
195	   response to an Offer.  An Answer describes participants capabilities
196	   in continuing with the multimedia session with in the constraints of
197	   the Offer.

199	   This draft avoids using terms that implementors do not have a clear
200	   idea of exactly what they are - for example RTP Session.

202	3.  Requirements

204	   The requirements listed here are a collection of requirements that
205	   have come from WebRTC, CLUE, and the general community that uses RTP
206	   for interactive communications based on Offer/Answer.  It does not
207	   try to meet the needs of streaming usages or usages involving
208	   multicast.  This list also does not try to list every possible
209	   requirement but instead outlines the ones that might influence the
210	   design.

212	   o  Devices with multiple audio and/or video sources

214	   o  Devices that display multiple streams of video and/or render
215	      multiple streams of audio

217	   o  Simulcast, wherein a video from a single camera is sent in a few
218	      independent video streams typically at different resolutions and
219	      frame rates.

221	   o  Layered Codec such as H.264 SVC

223	   o  One way media flows and bi-directional media flows

225	   o  Support asymmetry, i.e. to send a different number of type of
226	      media streams that you receive.

228	   o  Mapping W3C PeerConnection (PC) aspects into SDP and RTP.  It is
229	      important that the SDP be descriptive enough that both sides can
230	      get the same view of various identifiers for PC-Tracks, PC-Streams
231	      and their relationships.

233	   o  Support of Interactive Connectivity Establishment (ICE) [RFC5245]

235	   o  Support of multiplexing multiple media flows, possible of
236	      different media types, on same 5-tuple.

238	   o  Synchronization - It needs to be clear how implementations deal
239	      with synchronization, in particular usages of both CNAME and LS
240	      group.  The sender needs be able to indicate which Media Flows are
241	      intended to be synchronized and which are not.

243	   o  Redundant codings - The ability to send some media, such as the
244	      audio from a microphone, multiple times.  For example it may be
245	      sent with a high quality wideband codec and a low bandwidth codec.
246	      If packets are lost from the high bandwidth steam, the low
247	      bandwidth stream can be used to fill in the missing gaps of audio.
248	      This is very similar to simulcast.

250	   o  Forward Error Correction - Support for various RTP FEC schemes.

252	   o  RSVP QoS - Ability to signal various QoS mechanism such Single
253	      Reservation Flow (SRF) group

255	   o  Desegregated Media (FID group) - There is a growing desire to deal
256	      with endpoints that are distributed - for example a video phone
257	      where the incoming video is displayed on the an IP TV but the
258	      outgoing video comes from a tablet computer.  This results in
259	      situations where the SDP sets up a session with not all the media
260	      transmitted to a single IP address.

262	   o  In flight change of codec: Support for system that can negotiate
263	      the uses of more than one codec for a given media flow and then
264	      the sender can arbitrarily switch between them when they are
265	      sending but they only send with one codec as at time.

267	   o  Distinguish simulcast (e.g. multiple encoding of same source) from
268	      multiple different sources

270	   o  Support for Sequential and Parallel forking at the SIP level

272	   o  Support for Early Media

274	   o  Conferencing environments with Transcoding MCU that decodes/mixes/
275	      recodes the media

277	   o  Conferencing environments with Switching MCU where the MCU mucks
278	      the header information of the media and do not decode/recode all
279	      the media

281	4.  Background/Solution Overview

283	   The basic unit of media description in SDP is the m-line/m-block.
284	   This allows any entity defined by a single m-block to be individually
285	   negotiated.  This negotiation applies not only to individual sources
286	   (e.g., cameras) but also to individual components that come from a
287	   single source, such as layers in SVC.

289	   For example, consider negotiation of FEC as defined in [RFC4756].

291	   Offer

293	          v=0
294	          o=adam 289083124 289083124 IN IP4 host.example.com
295	          s=ULP FEC Seminar
296	          t=0 0
297	          c=IN IP4 192.0.2.0
298	          a=group:FEC 1 2
299	          a=group:FEC 3 4

301	          m=audio 30000 RTP/AVP 0
302	          a=mid:1

304	          m=audio 30002 RTP/AVP 100
305	          a=rtpmap:100 ulpfec/8000
306	          a=mid:2

308	          m=video 30004 RTP/AVP 31
309	          a=mid:3

311	          m=video 30004 RTP/AVP 101
312	          c=IN IP4 192.0.2.1
313	          a=rtpmap:101 ulpfec/8000
314	          a=mid:4

316	   When FEC is expressed this way, the answerer can selectively accept
317	   or reject the various streams by setting the port in the m-line to
318	   zero.  RTX [RFC4588], layered coding [RFC5583], and Simulcast are all
319	   handled the same way.  Note that while it is also possible to
320	   represent FEC and SVC using source-specific attributes [RFC5576],
321	   that mechanism is less flexible because it does not permit selective
322	   acceptance and rejection as described in [RFC 5576; Section 8].  Most
323	   deployed systems which implement FEC, layered coding, etc. do so with
324	   each component on a separate m-line.

326	   Unfortunately, this strategy runs into problems when combined with
327	   two new features that are desired for WebRTC:

329	   m-line multiplexing (bundle):

331	      The ability to send media described in multiple media over the
332	      same 5-tuple.

334	   multi-render:

336	      The ability to have large numbers of multiple similar media flows
337	      (e.g., multiple cameras).  The paradigmatic case here is multiple
338	      video thumbnails.

340	   Obviously, this strategy does not scale to large numbers For
341	   instance, consider the case where we want to be able to transmit 35
342	   video thumbnails (this is large, but not insane).  In the model
343	   described above, each of these flows would need its own m-line and
344	   its own set of codecs.  If each side supports three separate codecs
345	   (e.g., H.261, H.263, and VP8), then we have just consumed 105 payload
346	   types, which exceeds the available dynamic payload space.

348	   In order to resolve this issue, it is necessary to have multiple
349	   flows (e.g., multiple thumbnails) indicated by the same m-line and
350	   using the same set of payload types (see Section XXX for proposed
351	   syntax for this.)  Because each source has its own SSRC, it is
352	   possible to divide the RTP packets into individual flows.  However,
353	   this solution still leaves us with two problems:

355	   o  How to individually address specific RTP flows in order to, for
356	      instance, order them on a page or display flow-specific captions.

358	   o  How to determine the relationship between multiple variants of the
359	      same stream.  For instance, if we have multiple cameras each of
360	      which is present in a layered encoding, we need to be able to
361	      determine which layers go together.

363	   For reasons described in Section 5, the SSRC learned visa SDP is not
364	   suitable for individually addressing RTP flows.  Instead, we
365	   introduce a new identifier, the MSID, which can be carried both in
366	   the SDP and the RTP and therefore can be used to correlate SDP
367	   elements to RTP elements.  See Section 7.1

369	   By contrast, we can use RTP-only mechanisms to express the
370	   correlation between RTP flows: while all the flows associated with a
371	   given camera have distinct SSIDs, we can use the CSRC to indicate
372	   which flows belong together.  This is described in Section 7.2

374	5.  Overall Design

376	   The basic unit of media description in SDP is the m-line/m-block and
377	   this document continues with that assumption.  In general, different
378	   cameras, microphones, etc. are carried on different m-lines.  The
379	   exceptions to this rule is when using the multi-render extension in
380	   which case:

382	   o  Multiple sources which are semantically equivalent and multiplexed
383	      on a time-wise basis.  For instance, if an MCU mixes multiple
384	      camera feeds but only some subset is displayed at a time, they can
385	      all appear on the same m-line.

387	   By contrast, multiple sources which are semantically distinct cannot
388	   appear on the same m-line because that does not allow for clear
389	   negotiation of which sources are acceptable, or which sets of RTP
390	   SSRCs correspond to which flow.

392	   The second basic assumption is that SSRCs cannot always be safely
393	   used to associate RTP flows with information in the SDP.  There are
394	   two reasons for this.  First, in an offer/answer setting, RTP can
395	   appear at the offerer before the answer is received; if SSRC
396	   information from the offerer is required, then these RTP packets
397	   cannot be interpreted.  The second reason is that RTP permits SSRCs
398	   to be changed at any time.

400	   This assumption makes clear why the two exceptions to the "one flow
401	   per m-line" rule work.  In the case of time-based multiplexing (multi
402	   render) of camera sources, all the cameras are equivalent from the
403	   receiver's perspective; he merely needs to know which ones to display
404	   now and he does that based on which ones have been most recently
405	   received.  In the case of multiple versions of the same content,
406	   payload types or payload types plus SSRC can be used to distinguish
407	   the different versions.

409	6.  Example Mappings

411	   This section shows a number of sample mappings in abstract form.

413	6.1.  One Audio, One Video, No bundle/multiplexing

415	   Microphone -->    m=audio    -->    Speaker      >  5-Tuple

417	   Camera     -->    m=video    -->    Window       >  5-Tuple

419	6.2.  One Audio, One Video, Bundle/multiplexing

421	   Microphone -->    m=audio    -->    Speaker       \
422	                                                      > 5-Tuple
423	   Camera     -->    m=video    -->    Window       /

425	6.3.  One Audio, One Video, Simulcast, Bundle/multiplexing

427	   Microphone -->    m=audio    -->    Speaker       \
428	                                                      |
429	   Camera     +->    m=video    -\                     > 5-Tuple
430	              |                   ?->  Window         |
431	              +->    m=video    -/                   /

433	6.4.  One Audio, One Video, Bundle/multiplexing, Lip-Sync

435	   Microphone -->    m=audio    -->     Speaker     \
436	                                                     > 5-Tuple, Lip-Sync
437	   Camera     -->    m=video    -->     Window      /           group

439	6.5.  One Audio, One Active Video, 5 Thumbnails, Bundle/multiplexing

441	   Microphone -->    m=audio    -->       Speaker          \
442	                                                            |
443	   Camera     -->    m=video    -->       Window            |
444	                                                             >  5-Tuple
445	   Camera     -->    m=video    -->       5 Small Windows   |
446	   Camera            a=multi-render:5                    |
447	   ...                                                      /

449	   Note that in this case the payload types must be distinct between the
450	   two video m-lines, because that is what is used to demultiplex.

452	6.6.  One Audio, One Active Video,  5 Thumbnails, Main Speaker Lip-Sync,
453	      Bundle/multiplexing

455	   Microphone -->    m=audio    -->       Speaker        \   \  Lip-sync
456	                                                          |   > group
457	   Camera     -->    m=video    -->       Window          |  /
458	                                                           >  5-Tuple
459	   Camera     -->    m=video    -->       5 Small Windows |
460	   Camera            a=multi-render:5                  |
461	   ...                                                   /

463	7.  Solutions

465	   This section outlines a set of rules for the usage of SDP and RTP
466	   that seems to deal with the various problems and issues that have
467	   been discussed.  Most of these are not new and are pretty much how
468	   many systems do it today.  Some of them are new, but all the items
469	   requiring new standardization work are called out in the Section 9.

471	   Approach:

473	   1.   If a system wants to offer to send two sources, such as two
474	        camera, it MUST use a separate m-block for each source.  The
475	        means that each PC-Track corresponds to one or more m-blocks.

477	   2.   In cases such as FEC, simulcast, SVC, each repair stream, layer,
478	        or simulcast media flow will get an m-block per media flow.

480	   3.   If a systems wants to receive two streams of video to display in
481	        two different windows or screens, it MUST use separate m-blocks
482	        for each unless explicitly signaled to be otherwise (see
483	        Section 7.2).

485	   4.   Unless explicitly signaled otherwise (see Section 7.2), if a
486	        given m-line receives media from multiple SSRCs, only media from
487	        the most recently received SSRC SHOULD be rendered and other
488	        SSRC SHOULD NOT and if it is video it SHOULD be rendered in the
489	        same window or screen.

491	   5.   If a camera is sending simulcast video and three resolutions,
492	        each resolution MUST get its own m-block and all the three
493	        m-blocks will be grouped.  A new SDP group will be defined for
494	        this.

496	   6.   If a camera is using a layered codec with three layers, there
497	        MUST be an m-block for each, and they will be grouped using
498	        standard SDP for grouping layers.

500	   7.   To aid in synchronized playback, there is exactly one, and only
501	        one, LS group for each PC-Stream.  All the m-blocks for all the
502	        PC-Tracks in a given PC-Stream are synchronized so they are all
503	        put in one LS group.  All the PC-Tracks in a given PC-Stream
504	        have the same CNAME.  If a PC-Track appears in more than one PC-
505	        Stream, then all the PC-Streams with that PC-Track MUST have the
506	        same CNAME.

508	   8.   One way media MUST use the sendonly or recvonly attributes.

510	   9.   Media lines that are not currently in use but may be used later,
511	        so that the resources need to be kept allocated, SHOULD use the
512	        inactive attribute.

514	   10.  If an m-line will not be used, or it is rejected, it MUST have
515	        its port set to zero.

517	   11.  If a video switching MCU produces a virtual "active speaker"
518	        media flow, that media flow should have its own SSRC but include
519	        the SSRC of the current speaker's video in the CSRC packets it
520	        produces.

522	   12.  For each PC-Track, the W3C API MUST provide a way to set and
523	        read the CSRC list, set and read the content RFC 4574 "label",
524	        and read the SSRC of last packet received on a PC-Track.

526	   13.  The W3C API should have a constraint or API method to allow a
527	        PC-Stream to indicate the number of multi-render video streams
528	        it can accept.  Each time a new stream is received up to the
529	        maximum, a new PC-Track will be created.

531	   14.  Applications MAY signal all the SSRC they intend to send using
532	        RFC 5576, but receivers need to be careful in their usage of the
533	        SSRC in signaling, as the SSRC can change when there is a
534	        collision and it takes time before that will be updated in
535	        signaling.

537	   15.  Applications can get out of band "roster information" that maps
538	        the names of various speakers or other information to the MSID
539	        and/or SSRCs that a user is using

541	   16.  Applications MAY use RFC 4574 content labels to indicate the
542	        purpose of the video.  The additional content types, main-left
543	        and main-right, need to be added to support two- and three-
544	        screen systems.

546	   17.  The CLUE WG might want to consider SDP to signal the 3D location
547	        and field of view parameters for captures and renderers.

549	   18.  The W3C API allows a "label" to be set for the PC-Track.  This
550	        MUST be mapped to the SDP label attribute.

552	7.1.  Correlation and Multiplexing

554	   The port number that RTP is received on provides the primary
555	   mechanism for correlating it to the correct m-line.  However, when
556	   the port does not uniquely male the RTP packet to the correct m-block
557	   (such as in multiplexing and other cases), the next thing that can be
558	   looked at is the PT number.  Finally there are cases where SSRC can
559	   be used if that was signaled.

561	   There are some complications when using SSRC for correlation with
562	   signaling.  First, the offerer may end up receiving RTP packets
563	   before receiving the signaling with the SSRC correlation information.
564	   This is because the sender of the RTP chooses the SSRC; there is no
565	   way for the receiver to signal how some of the bits in the SSRC
566	   should be set.  Numerous attempts to provide a way to do this have
567	   been made, but they have all been rejected for various reasons, so
568	   this situation is unlikely to change.  The second issue is that the
569	   signaled SSRC can change, particularly in collision cases, and there
570	   is no good way to know when SSRC are changing, such that the
571	   currently signaled SSRC usage maps to the actual RTP SSRC usage.
572	   Finally SSRC does not always provide correlation information between
573	   media flows - take for example trying to look at SSRC to tell that an
574	   audio media flow and video media flow came from the same camera.  The
575	   nice thing about SSRC is that they are also included in the RTP.

577	   The proposal here is to extend the MSID draft to meet these needs:
578	   each media flow would have a unique MSID and the MSID would have some
579	   level of internal structure, which would allow various forms of
580	   correlation, including what WebRTC needs to be able to recreate the
581	   MS-Stream / MS-Track hierarchy to be the same on both sides.  In
582	   addition, this work proposes creating an optional RTP header
583	   extension that could be used to carry the MSID for a media flow in
584	   the RTP packets.  This is not absolutely needed for the WebRTC use
585	   cases but it helps in the case where media arrives before signaling
586	   and it helps resolve a broader category of web conferencing use
587	   cases.

589	   The MSID consists of three things and can be extended to have more.
590	   It has a device identifier, which corresponds to a unique identifier
591	   of the device that created the offer; one or more synchronization
592	   context identifiers, which is a number that helps correlate different
593	   synchronized media flows; and a media flow identifier.  The
594	   synchronization identifier and flow identifier are scoped within the
595	   context of the device identifier, but the device identifier is
596	   globally unique.  The suggested device identifier is a 64-bit random
597	   number.  The synchronization group is an integer that is the same for
598	   all media flows that have this device identifier and are meant to be
599	   synchronized.  Right now there can be more than one synchronization
600	   identifier, but the open issues suggest that one would be preferable.
601	   The flow identifier is an integer that uniquely identifies this media
602	   flow within the context of the device identifier.

604	   Open Issues: how to know if the MSID RTP Header Extension should be
605	   included in the RTP?
606	   An example MSID for a device identifier of 12345123451234512345,
607	   synchronization group of 1, and a media flow id of 3 would be:

609	      a=msid:12345123451234512345 s:1 f:3

611	   When the MSID is used in an answer, the MSID also has the remote
612	   device identifier included.  In the case where the device ID of the
613	   device sending the answer was 22222333334444455555, the MSID would
614	   look like:

616	      a=msid:22222333334444455555 s:1 f:3 r:12345123451234512345

618	   Note: The 64 bit size for the device identifier was chosen as it
619	   allows less than a one in a million chance of collision with greater
620	   than 10,000 flows (actually it allows this probability with more like
621	   6 million flows).  Much smaller numbers could be used but 32 bits is
622	   probably too small.  More discussion on the size of this and the
623	   color of the bike shed is needed.

625	   When used in the WebRTC context, each PeerConnection should generate
626	   a unique device identifier.  Each PC-Stream in the PeerConnection
627	   will get a a unique synchronization group identifier, and each PC-
628	   Track in the Peer Connection will get a unique flow identifier.
629	   Together these will be used to form the MSID.  The MSID MUST be
630	   included in the SDP offer or answer so that the WebRTC connection on
631	   the remote side can form the correct structure of remote PC-Streams
632	   and PC-Tracks.  If a WebRTC client receives an Offer with no MSID
633	   information and no LS group information, it MUST put all the remote
634	   PC-Tracks into a single PC-Stream.  If there is LS group information
635	   but no MSID, a PC-Stream for each LS group MUST be created and the
636	   PC-Tracks put in the appropriate PC-Stream.

638	   The W3C specs should be updated to have the ID attribute of the MS-
639	   Stream be the MSID with no flow identifier, and the ID attribute of
640	   the MS-Track be the MSID.

642	   In addition, the SDP will attempt to negotiate sending the MSID in
643	   the RTP using a RTP Header Extension.  WebRTC clients SHOULD also
644	   include the a=ssrc attributes if they know which SSRC they plan to
645	   send but they can not rely on this not changing, being compete, or
646	   existing in all offers or answers they receive - particularly when
647	   working with SIP endpoints.

649	   When using multiplexing, the SDP MUST be distinct enough where the
650	   combination of payload type number and SSRC allows for unique
651	   demultiplexing of all the media on the same transport flow without
652	   use of MSID though the MSID can help in several use cases.

654	7.2.  Multiple Render

656	   There are cases - such as a grid of security cameras or thumbnails in
657	   a video conference - where a receiver is willing to receive and
658	   display several media flows of video.  The proposal here is to create
659	   a new media level attribute called multi-render that includes an
660	   integer that indicates how many streams can be rendered at the same
661	   time.

663	   As an example of a m-block, a system that could display 16 thumbnails
664	   at the same time and was willing to receive H261 or H264 might offer

666	   Offer

668	    m=video 52886 RTP/AVP 98 99
669	    a=multi-render:16
670	    a=rtpmap:98 H261/90000
671	    a=rtpmap:99 H264/90000
672	    a=fmtp:99 profile-level-id=4de00a;
673	           packetization-mode=0; mst-mode=NI-T;
674	           sprop-parameter-sets={sps0},{pps0};

676	   When combining this multi-render feature with multiplexing, the
677	   answer will might not know all the SSRCs that will be send to this
678	   m-block so it is best to use payload type (PT) numbers that are
679	   unique for the SDP: the demultiplexing may have to only use the PT if
680	   the SSRCs are unknown.

682	   The intention is that the most recently sent SSRC are the ones that
683	   are rendered.  Some switching MCU will likely only send the correct
684	   number of SSRC and not change the SSRC but will instead update the
685	   CSRC as the switching MCU select a different participant to include
686	   in the particular video stream.

688	   The receiver displays, in different windows, the video from the most
689	   recent 16 SSRC to send video to m-block.

691	   This allows a switching MCU to know how many thumbnail type streams
692	   would be appropriate to send to this endpoint.

694	7.2.1.  Complex Multi Render Example

696	   The following shows a single multi render m-line that can display up
697	   to three video streams, and send 3 streams, and support 2 layers of
698	   simulcast with FEC on the high resolution layer and bundle.  Note
699	   that only host candidates are provided for the FEC and lower
700	   resolution simulcast so if the device is behind a NAT, those streams
701	   will not be used.

703	   Offer

705	   v=0
706	   o=alice 20519 0 IN IP4 0.0.0.0
707	   s=ULP FEC
708	   t=0 0
709	   a=ice-ufrag:074c6550
710	   a=ice-pwd:a28a397a4c3f31747d1ee3474af08a068
711	   a=fingerprint:sha-1 99:41:49:83:4a:97:0e:1f:ef:6d:f7:
712	                       c9:c7:70:9d:1f:66:79:a8:07
713	   c= IN IP4 24.23.204.141
714	   a=group:BUNDLE vid1 vid2 vid3
715	   a=group:FEC vid1 vid2
716	   a=group:SIMULCAST vid1 vid3

718	   m=video 62537 RTP/SAVPF 96
719	   a=mid:vid1
720	   a=multi-render:3
721	   a=rtcp-mux
722	   a=msid:12345123451234512345 s:1 f:1
723	   a=rtpmap:96 VP8/90000
724	   a=fmtp:96 max-fr=30;max-fs=3600;
725	   a=imageattr:96 [x=1280,y=720]
726	   a=candidate:0 1 UDP  2113667327 192.168.1.4 62537 typ host
727	   a=candidate:1 1 UDP  694302207 24.23.204.141 62537
728	                        typ srflx raddr 192.168.1.4 rport 62537
729	   a=candidate:0 2 UDP 2113667326 192.168.1.4 64678 typ host
730	   a=candidate:1 2 UDP  1694302206 24.23.204.141 64678
731	                        typ rflx raddr 192.168.1.4 rport 64678

733	   m=video 62541 RTP/SAVPF 97
734	   a=mid:vid2
735	   a=multi-render:3
736	   a=rtcp-mux
737	   a=msid:34567345673456734567 s:1 f:2
738	   a=rtpmap:97 uplfec/90000
739	   a=candidate:0 1 UDP 2113667327 192.168.1.4 62541 typ host

741	   m=video 62545 RTP/SAVPF 98
742	   a=mid:vid3
743	   a=multi-render:3
744	   a=rtcp-mux
745	   a=msid:333444558899000991122 s:1 f:3
746	   a=rtpmap:98 VP8/90000
747	   a=fmtp:98 max-fr=15;max-fs=300;
748	   a=imageattr:96 [x=320,y=240]
749	   a=candidate:0 1 UDP 2113667327 192.168.1.4 62545 typ host
750	   The following shows an answer to the above offer that accepts
751	   everything and plans to send video from five different cameras in to
752	   this m-line (but only three at a time).

754	   Answer

756	   v=0
757	   o=Bob 20519 0 IN IP4 0.0.0.0
758	   s=ULP FEC
759	   t=0 0
760	   a=ice-ufrag:c300d85b
761	   a=ice-pwd:de4e99bd291c325921d5d47efbabd9a2
762	   a=fingerprint:sha-1 99:41:49:83:4a:97:0e:1f:ef:6d:f7:
763	                       c9:c7:70:9d:1f:66:79:a8:07
764	   c= IN IP4 98.248.92.77
765	   a=group:BUNDLE vid1 vid2 vid3
766	   a=group:FEC vid1 vid2
767	   a=group:SIMULCAST vid1 vid3

769	   m=video 42537 RTP/SAVPF 96
770	   a=mid:vid1
771	   a=multi-render:3
772	   a=rtcp-mux
773	   a=msid:54321543215432154321 s:1 f:1 r:12345123451234512345
774	   a=rtpmap:96 VP8/90000
775	   a=fmtp:96 max-fr=30;max-fs=3600;
776	   a=imageattr:96 [x=1280,y=720]
777	   a=candidate:0 1 UDP 2113667327 192.168.1.7 42537 typ host
778	   a=candidate:1 1 UDP 1694302207 98.248.92.77 42537
779	                       typ srflx raddr 192.168.1.7 rport 42537
780	   a=candidate:0 2 UDP 2113667326 192.168.1.7 60065 typ host
781	   a=candidate:1 2 UDP 1694302206 98.248.92.77 60065
782	                       typ srflx raddr 192.168.1.7 rport 60065

784	   m=video 42539 RTP/SAVPF 97
785	   a=mid:vid2
786	   a=multi-render:3
787	   a=rtcp-mux
788	   a=msid:11111122222233333444444 s:1 f:2 r:34567345673456734567
789	   a=rtpmap:97 uplfec/90000
790	   a=candidate:0 1 UDP 2113667327 192.168.1.7 42539 typ host

792	   m=video 42537 RTP/SAVPF 98
793	   a=mid:vid3
794	   a=multi-render:3
795	   a=rtcp-mux
796	   a=msid:777777888888999999111111 s:1 f:3 r:333444558899000991122
797	   a=rtpmap:98 VP8/90000
798	   a=fmtp:98 max-fr=15;max-fs=300;
799	   a=imageattr:98 [x=320,y=240]
800	   a=candidate:0 1 UDP 2113667327 192.168.1.7 42537 typ host
801	   a=candidate:1 1 UDP 1694302207 98.248.92.77 42537
802	                       typ srflx raddr 192.168.1.7 rport 42537
803	   a=candidate:0 2 UDP 2113667326 192.168.1.7 60065 typ host
804	   a=candidate:1 2 UDP 1694302206 98.248.92.77 60065
805	                       typ srflx raddr 192.168.1.7 rport 60065

807	   The following shows an answer to the above by a client that does not
808	   support simulcast, FEC, bundle, or msid.

810	   Answer

812	   v=0
813	   o=Bob 20519 0 IN IP4 0.0.0.0
814	   s=ULP FEC
815	   t=0 0
816	   a=ice-ufrag:c300d85b
817	   a=ice-pwd:de4e99bd291c325921d5d47efbabd9a2
818	   a=fingerprint:sha-1 99:41:49:83:4a:97:0e:1f:ef:6d:f7:
819	                       c9:c7:70:9d:1f:66:79:a8:07
820	   c= IN IP4 98.248.92.77

822	   m=video 42537 RTP/SAVPF 96
823	   a=mid:vid1
824	   a=rtcp-mux
825	   a=recvonly
826	   a=rtpmap:96 VP8/90000
827	   a=fmtp:96 max-fr=30;max-fs=3600;
828	   a=candidate:0 1 UDP 2113667327 192.168.1.7 42537 typ host
829	   a=candidate:1 1 UDP 1694302207 98.248.92.77 42537
830	                       typ srflx raddr 192.168.1.7 rport 42537
831	   a=candidate:0 2 UDP 2113667326 192.168.1.7 60065 typ host
832	   a=candidate:1 2 UDP 1694302206 98.248.92.77 60065
833	                       typ srflx raddr 192.168.1.7 rport 60065

835	   m=video 0 RTP/SAVPF 97
836	   a=mid:vid2
837	   a=rtcp-mux
838	   a=rtpmap:97 uplfec/90000

840	   m=video 0 RTP/SAVPF 98
841	   a=mid:vid3
842	   a=rtcp-mux
843	   a=rtpmap:98 H264/90000
844	   a=fmtp:98 profile-level-id=428014;
845	          max-fs=3600; max-mbps=108000; max-br=14000

847	7.3.  Dirty Little Secrets

849	   If SDP offer/answers are of type AVP or AVPF but contain a crypto of
850	   fingerprint attribute, they should be treated as if they were SAVP or
851	   SAVPF respectively.  The Answer should have the same type as the
852	   offer but for all practical purposes the implementation should treat
853	   it as the secure variant.

855	   If SDP offer/answers are of type AVP or SAVP, but contain an
856	   a=rtcp-fb attribute, they should be treated as if they were AVPF or
857	   SAVPF respectively.  The SDP Answer should have the same type as the
858	   Offer but for all practical purposes the implementation should treat
859	   it as the feedback variant.

861	   If an SDP Offer has both a fingerprint and a crypto attribute, it
862	   means the Offerer supports both DTLS-SRTP and SDES and the answer
863	   should select one and return an Answer with only an attribute for the
864	   selected keying mechanism.

866	   These may not look appealing but the alternative is to make cap-neg
867	   mandatory to implement in WebRTC.

869	7.4.  Open Issues

871	   What do do with unrecognized media received at W3C PerrConnection
872	   level?  Suggestion is it creates a new track in whatever stream the
873	   MSID would indicate if present and the default stream if no MSID
874	   header extension in the RTP.

876	7.5.  Confusions

878	   You can decrypt DTLS-SRTP media before receiving an answer, you can't
879	   determine if it is secure or not till you have the fingerprint and
880	   have verified it

882	   You can use RTCP-FB to do things like PLI without signaling the SSRC.
883	   The PLI packets gets the sender SSRC from the incoming media that is
884	   trying to signal the PLI for.

886	8.  Examples

888	   Example of a video client joining a video conference.  The client can
889	   produce and receive two streams of video, one from the slides and the
890	   other of the person.  The video of the person is synchronized with
891	   the audio.  In addition, the client can display up to 10 thumbnails
892	   of video.  The main video is simulcast at HD size and a thumbnail
893	   size.

895	   Offer

897	        v=0
898	        o=alice 2890844526 2890844527 IN IP4 host.example.com
899	        s=
900	        c=IN IP4 host.atlanta.example.com
901	        t=0 0
902	        a=group:LS 1,2,3
903	        a=group:SIMULCAST 2,3

905	        m=audio 49170 RTP/AVP 96      <- This is the Audio
906	        a=mid:1
907	        a=rtpmap:96 iLBC/8000
908	        a=content:main

910	        m=video 51372 RTP/AVP 97      <- This is the main video
911	        a=mid:2
912	        a=rtpmap:97 VP8/90000
913	        a=fmtp:97 max-fr=30;max-fs=3600;
914	        a=imageattr:97 [x=1080,y=720]
915	        a=content:main

917	        m=video 51372 RTP/AVP 98       <- This is the slides
918	        a=mid:2
919	        a=rtpmap:98 VP8/90000
920	        a=fmtp:98 max-fr=30;max-fs=3600;
921	        a=imageattr:98 [x=1080,y=720]
922	        a=content:slides

924	        m=video 51372 RTP/AVP 99       <- This is the simulcast of main
925	        a=mid:3
926	        a=rtpmap:99 VP8/90000
927	        a=fmtp:99 max-fr=15;max-fs=300;
928	        a=imageattr:99 [x=320,y=240]

930	        m=video 51372 RTP/AVP 100      <- This is the 10 thumbnails
931	        a=mid:4
932	        a=multi-render:10
933	        a=recvonly
934	        a=rtpmap:100 VP8/90000
935	        a=fmtp:100 max-fr=15;max-fs=300;
936	        a=imageattr:100 [x=320,y=240]

938	   Example of a three-screen video endpoint connecting to a two-screen
939	   system which ends up selecting the left and middle screens.

941	   Offer

943	        v=0
944	        o=alice 2890844526 2890844527 IN IP4 host.atlanta.example.com
945	        s=
946	        c=IN IP4 host.atlanta.example.com
947	        t=0 0
948	        a=rtcp-fb

950	        m=audio 49100 RTP/SAVPF 96
951	        a=rtpmap:96 iLBC/8000

953	        m=video 49102 RTP/SAVPF 97
954	        a=content:main
955	        a=rtpmap:97 H261/90000

957	        m=video 49104 RTP/SAVPF 98
958	        a=content:left
959	        a=rtpmap:98 H261/90000

961	        m=video 49106 RTP/SAVPF 99
962	        a=content:right
963	        a=rtpmap:99 H261/90000

965	   Answer

967	        v=0
968	        o=bob 2808844564 2808844565 IN IP4 host.biloxi.example.com
969	        s=
970	        c=IN IP4 host.biloxi.example.com
971	        t= 0 0
972	        a=rtcp-fb

974	        m=audio 50100 RTP/SAVPF 96
975	        a=rtpmap:96 iLBC/8000

977	        m=video 50102 RTP/SAVPF 97
978	        a=content:main
979	        a=rtpmap:97 H261/90000

981	        m=video 50104 RTP/SAVPF 98
982	        a=content:left
983	        a=rtpmap:98 H261/90000

985	        m=video 0 RTP/SAVPF 99
986	        a=content:right
987	        a=rtpmap:99 H261/90000

989	   Example of a client that supports SRTP-DTLS and SDES connecting to a
990	   client that supports SRTP-DTLS.

992	   Offer

994	        v=0
995	        o=alice 2890844526 2890844527 IN IP4 host.atlanta.example.com
996	        s=
997	        c=IN IP4 host.atlanta.example.com
998	        t=0 0

1000	        m=audio 49170 RTP/AVP 99
1001	        a=fingerprint:sha-1 99:41:49:83:4a:97:0e:1f:ef:6d
1002	                            :f7:c9:c7:70:9d:1f:66:79:a8:07
1003	        a=crypto:1 AES_CM_128_HMAC_SHA1_80
1004	          inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
1005	        a=rtpmap:99 iLBC/8000

1007	        m=video 51372 RTP/AVP 96
1008	        a=fingerprint:sha-1 92:81:49:83:4a:23:0a:0f:1f:9d:f7:
1009	                             c0:c7:70:9d:1f:66:79:a8:07
1010	        a=crypto:1 AES_CM_128_HMAC_SHA1_32
1011	          inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
1012	        a=rtpmap:96 H261/90000

1014	   Answer

1016	        v=0
1017	        o=bob 2808844564 2808844565 IN IP4 host.biloxi.example.com
1018	        s=
1019	        c=IN IP4 host.biloxi.example.com
1020	        t=0 0

1022	        m=audio 49172 RTP/AVP 99
1023	        a=crypto:1 AES_CM_128_HMAC_SHA1_80
1024	          inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
1025	        a=rtpmap:99 iLBC/8000

1027	        m=video 51374 RTP/AVP 96
1028	        a=crypto:1 AES_CM_128_HMAC_SHA1_80
1029	          inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
1030	        a=rtpmap:96 H261/90000

1032	9.  Tasks

1034	   This section outlines work that needs to be done in various
1035	   specifications to make the proposal here actually happen.

1037	   Tasks:

1039	   1.   Extend the W3C API to be able to set and read the CSRC list for
1040	        a PC-Track.

1042	   2.   Extend the W3C API to be able to read SSRC of last RTP packed
1043	        received.

1045	   3.   Write an RTP Header Extension draft to cary the MSID.

1047	   4.   Fix up MSID draft to align with this proposal.

1049	   5.   Write a draft to add left, right to the SDP content attribute.
1050	        Add the stuff to the W3C API to read and write this on a track.

1052	   6.   Write a draft on SDP "SIMULCAST" group to signal multiple
1053	        m-block as are simulcast of same video content.

1055	   7.   Complete the bundle draft.

1057	   8.   Provide guidance for ways to use SDP for reduced glare when
1058	        adding of one way media streams.

1060	   9.   Write a draft defining the multi render attribute.

1062	   10.  Change W3C API to say that a PC-Track can be in only one
1063	        PeerConnection or make an object inside the PeerConnection for
1064	        each track in the PC that can be used to set constraints and
1065	        settings and get information related to the RTP flow.

1067	   11.  Sort out how to tell a PC-Track, particularly one meant for
1068	        receiving information, that it can do simulcast, layered coding,
1069	        RTX, FEC, etc.

1071	10.  Security Considerations

1073	   TBD

1075	11.  IANA Considerations

1077	   This document requires no actions from IANA.

1079	12.  Acknowledgments

1081	   I would like to thank Suhas Nandakumar, Eric Rescorla, Charles Eckel,
1082	   Mo Zanaty, and Lyndsay Campbell for help with this draft.

1084	13.  Open Issues

1086	   The overall solution is complicated considerably by the fact that
1087	   WebRTC allows a PC-Track to be used in more than one PC-Stream but
1088	   requires only one copy of the RTP data for the track to be sent.  I
1089	   am not aware of any use case for this and think it should be removed.
1090	   If a PC-Track needs to be synchronized with two different things,
1091	   they should all go in one PC-Stream instead of two.

1093	14.  Existing SDP

1095	   The following shows some examples of SDP today that any new system
1096	   needs to be able to receive and work with in a backwards compatible
1097	   way.

1099	14.1.  Multiple Encodings

1101	   Multiple codecs accepted on same m-line [RFC4566].

1103	   Offer

1105	        v=0
1106	        o=alice 2890844526 2890844527 IN IP4 host.atlanta.example.com
1107	        s=
1108	        c=IN IP4 host.atlanta.example.com
1109	        t=0 0

1111	        m=audio 49170 RTP/AVP 99
1112	        a=rtpmap:99 iLBC/8000

1114	        m=video 51372 RTP/AVP 31 32
1115	        a=rtpmap:31 H261/90000
1116	        a=rtpmap:32 MPV/90000

1118	   Answer

1120	        v=0
1121	        o=bob 2808844564 2808844565 IN IP4 host.biloxi.example.com
1122	        s=
1123	        c=IN IP4 host.biloxi.example.com
1124	        t=0 0

1126	        m=audio 49172 RTP/AVP 99
1127	        a=rtpmap:99 iLBC/8000

1129	        m=video 51374 RTP/AVP 31 32
1130	        a=rtpmap:31 H261/90000
1131	        a=rtpmap:32 MPV/90000

1133	   This means that a sender can switch back and forth between H261 and
1134	   MVP without any further signaling.  The receiver MUST be capable of
1135	   receiving both formats.  At any point in time, only one video format
1136	   is sent, thus implying that only one video is meant to be displayed.

1138	14.2.  Forward Error Correction

1140	   Multiple m-blocks identified with respective "mid" grouped to
1141	   indicate FEC operation using FEC-FR semantics defined in [RFC5956].

1143	   Offer

1145	       v=0
1146	       o=ali 1122334455 1122334466 IN IP4 fec.example.com
1147	       s=Raptor RTP FEC Example
1148	       t=0 0
1149	       a=group:FEC-FR S1 R1

1151	       m=video 30000 RTP/AVP 100
1152	       c=IN IP4 233.252.0.1/127
1153	       a=rtpmap:100 MP2T/90000
1154	       a=fec-source-flow: id=0
1155	       a=mid:S1

1157	       m=application 30000 RTP/AVP 110
1158	       c=IN IP4 233.252.0.2/127
1159	       a=rtpmap:110 raptorfec/90000
1160	       a=fmtp:110 raptor-scheme-id=1; Kmax=8192; T=128;
1161	       P=A; repair-window=200000
1162	       a=mid:R1

1164	14.3.  Same Video Codec With Different Settings

1166	   This example shows a single codec,say H.264, signaled with different
1167	   settings [RFC4566].

1169	   Offer

1171	       v=0

1173	       m=video 49170 RTP/AVP 100 99 98
1174	       a=rtpmap:98 H264/90000
1175	       a=fmtp:98 profile-level-id=42A01E; packetization-mode=0;
1176	       sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==
1177	       a=rtpmap:99 H264/90000
1178	       a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
1179	       sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==
1180	       a=rtpmap:100 H264/90000
1181	       a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
1182	       sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==;
1183	       sprop-interleaving-depth=45; sprop-deint-buf-req=64000;
1184	       sprop-init-buf-time=102478; deint-buf-cap=128000

1186	14.4.  Different Video Codecs With Different Resolutions Formats

1188	   The SDP below shows some m-blocks with various ways to specify
1189	   resolutions for video codecs signaled [RFC4566].

1191	   Offer

1193	       m=video 49170 RTP/AVP 31
1194	       a=rtpmap:31 H261/90000
1195	       a=fmtp:31 CIF=2;QCIF=1;D=1

1197	       m=video 49172 RTP/AVP 99
1198	       a=rtpmap:99 jpeg2000/90000
1199	       a=fmtp:99 sampling=YCbCr-4:2:0;width=128;height=128

1201	       m=video 49174 RTP/AVP 96
1202	       a=rtpmap:96 VP8/90000
1203	       a=fmtp:96 max-fr=30;max-fs=3600;
1204	       a=imageattr:96 [x=1280,y=720]

1206	14.5.  Lip Sync Group

1208	   [RFC5888] grouping semantics for Lip Synchronization between audio
1209	   and video

1211	   Offer

1213	       v=0
1214	       o=Laura 289083124 289083124 IN IP4 one.example.com
1215	       c=IN IP4 192.0.2.1
1216	       t=0 0
1217	       a=group:LS 1 2

1219	       m=audio 30000 RTP/AVP 0
1220	       a=mid:1

1222	       m=video 30002 RTP/AVP 31
1223	       a=mid:2

1225	14.6.  BFCP

1227	   [RFC4583] defines SDP format for Binary Floor Control Protocol (BFCP)
1228	   as shown below
1229	   Offer

1231	       m=application 50000 TCP/TLS/BFCP *
1232	       a=setup:passive
1233	       a=connection:new
1234	       a=fingerprint:SHA-1 \
1235	       4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
1236	       a=floorctrl:s-only
1237	       a=confid:4321
1238	       a=userid:1234
1239	       a=floorid:1 m-stream:10
1240	       a=floorid:2 m-stream:11

1242	       m=audio 50002 RTP/AVP 0
1243	       a=label:10

1245	       m=video 50004 RTP/AVP 31
1246	       a=label:11

1248	   Answer

1250	       m=application 50000 TCP/TLS/BFCP *
1251	       a=setup:passive
1252	       a=connection:new
1253	       a=fingerprint:SHA-1 \
1254	       4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
1255	       a=floorctrl:s-only
1256	       a=confid:4321
1257	       a=userid:1234
1258	       a=floorid:1 m-stream:10
1259	       a=floorid:2 m-stream:11

1261	       m=audio 50002 RTP/AVP 0
1262	       a=label:10

1264	       m=video 50004 RTP/AVP 31
1265	       a=label:11

1267	14.7.  Retransmission

1269	   The SDP given below shows SDP signaling for retransmission of the
1270	   original media stream(s) as defined in [RFC4756]
1271	   Offer

1273	      v=0
1274	      o=mascha 2980675221 2980675778 IN IP4 host.example.net
1275	      c=IN IP4 192.0.2.0
1276	      a=group:FID 1 2
1277	      a=group:FID 3 4

1279	      m=audio 49170 RTP/AVPF 96
1280	      a=rtpmap:96 AMR/8000
1281	      a=fmtp:96 octet-align=1
1282	      a=rtcp-fb:96 nack
1283	      a=mid:1

1285	      m=audio 49172 RTP/AVPF 97
1286	      a=rtpmap:97 rtx/8000
1287	      a=fmtp:97 apt=96;rtx-time=3000
1288	      a=mid:2

1290	      m=video 49174 RTP/AVPF 98
1291	      a=rtpmap:98 MP4V-ES/90000
1292	      a=rtcp-fb:98 nack
1293	      a=fmtp:98 profile-level-id=8;config=01010000012000884006682C209\
1294	         0A21F
1295	      a=mid:3

1297	      m=video 49176 RTP/AVPF 99
1298	      a=rtpmap:99 rtx/90000
1299	      a=fmtp:99 apt=98;rtx-time=3000
1300	      a=mid:4

1302	   Note that RTX RFC also has the following SSRC multiplexing example
1303	   but this is meant for declarative use of SDP as there was no way in
1304	   this RFC to accept, reject, or otherwise negotiate this in a an offer
1305	   / answer SDP usage.

1307	   SDP

1309	      v=0
1310	      o=mascha 2980675221 2980675778 IN IP4 host.example.net
1311	      c=IN IP4 192.0.2.0

1313	      m=video 49170 RTP/AVPF 96 97
1314	      a=rtpmap:96 MP4V-ES/90000
1315	      a=rtcp-fb:96 nack
1316	      a=fmtp:96 profile-level-id=8;config=01010000012000884006682C209\
1317	        0A21F
1318	      a=rtpmap:97 rtx/90000
1319	      a=fmtp:97 apt=96;rtx-time=3000

1321	14.8.  Layered coding dependency

1323	   [RFC5583] "depend" attribute is shown here to indicate dependency
1324	   between layers represented by the individual m-blocks
1325	   Offer

1327	       a=group:DDP L1 L2 L3

1329	       m=video 20000 RTP/AVP 96 97 98
1330	       a=rtpmap:96 H264/90000
1331	       a=fmtp:96 profile-level-id=4de00a; packetization-mode=0;
1332	         mst-mode=NI-T; sprop-parameter-sets={sps0},{pps0};
1333	       a=rtpmap:97 H264/90000
1334	       a=fmtp:97 profile-level-id=4de00a; packetization-mode=1;
1335	         mst-mode=NI-TC; sprop-parameter-sets={sps0},{pps0};
1336	       a=rtpmap:98 H264/90000
1337	       a=fmtp:98 profile-level-id=4de00a; packetization-mode=2;
1338	         mst-mode=I-C; init-buf-time=156320;
1339	         sprop-parameter-sets={sps0},{pps0};
1340	       a=mid:L1

1342	       m=video 20002 RTP/AVP 99 100
1343	       a=rtpmap:99 H264-SVC/90000
1344	       a=fmtp:99 profile-level-id=53000c; packetization-mode=1;
1345	         mst-mode=NI-T; sprop-parameter-sets={sps1},{pps1};
1346	       a=rtpmap:100 H264-SVC/90000
1347	       a=fmtp:100 profile-level-id=53000c; packetization-mode=2;
1348	         mst-mode=I-C; sprop-parameter-sets={sps1},{pps1};
1349	       a=mid:L2
1350	       a=depend:99 lay L1:96,97; 100 lay L1:98

1352	       m=video 20004 RTP/AVP 101
1353	       a=rtpmap:101 H264-SVC/90000
1354	       a=fmtp:101 profile-level-id=53001F; packetization-mode=1;
1355	         mst-mode=NI-T; sprop-parameter-sets={sps2},{pps2};
1356	       a=mid:L3
1357	       a=depend:101 lay L1:96,97 L2:99

1359	14.9.  SSRC Signaling

1361	   [RFC5576] "ssrc" attribute is shown here to signal synchronization
1362	   sources in a given RTP Session

1364	   Offer

1366	       m=video 49170 RTP/AVP 96
1367	       a=rtpmap:96 H264/90000
1368	       a=ssrc:12345 cname:user@example.com
1369	       a=ssrc:67890 cname:user@example.com

1371	   This indicates what the sender will send.  It's at best a guess
1372	   because in the case of SSRC collision, it's all wrong.  It does not
1373	   allow one to reject a stream.  It does not mean that both streams are
1374	   displayed at the same time.

1376	14.10.  Content Signaling

1378	   [RFC4796] "content" attribute is used to specify the semantics of
1379	   content represented by the video streams.

1381	   Offer

1383	       v=0
1384	       o=Alice 292742730 29277831 IN IP4 131.163.72.4
1385	       s=Second lecture from information technology
1386	       c=IN IP4 131.164.74.2
1387	       t=0 0

1389	       m=video 52886 RTP/AVP 31
1390	       a=rtpmap:31 H261/9000
1391	       a=content:slides

1393	       m=video 53334 RTP/AVP 31
1394	       a=rtpmap:31 H261/9000
1395	       a=content:speaker

1397	       m=video 54132 RTP/AVP 31
1398	       a=rtpmap:31 H261/9000
1399	       a=content:sl

1401	15.  References

1403	15.1.  Normative References

1405	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1406	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1408	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
1409	              with Session Description Protocol (SDP)", RFC 3264,
1410	              June 2002.

1412	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1413	              Description Protocol", RFC 4566, July 2006.

1415	15.2.  Informative References

1417	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
1418	              A., Peterson, J., Sparks, R., Handley, M., and E.
1419	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
1420	              June 2002.

1422	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1423	              Jacobson, "RTP: A Transport Protocol for Real-Time
1424	              Applications", STD 64, RFC 3550, July 2003.

1426	   [RFC4583]  Camarillo, G., "Session Description Protocol (SDP) Format
1427	              for Binary Floor Control Protocol (BFCP) Streams",
1428	              RFC 4583, November 2006.

1430	   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
1431	              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
1432	              July 2006.

1434	   [RFC4756]  Li, A., "Forward Error Correction Grouping Semantics in
1435	              Session Description Protocol", RFC 4756, November 2006.

1437	   [RFC4796]  Hautakorpi, J. and G. Camarillo, "The Session Description
1438	              Protocol (SDP) Content Attribute", RFC 4796,
1439	              February 2007.

1441	   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
1442	              (ICE): A Protocol for Network Address Translator (NAT)
1443	              Traversal for Offer/Answer Protocols", RFC 5245,
1444	              April 2010.

1446	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
1447	              Media Attributes in the Session Description Protocol
1448	              (SDP)", RFC 5576, June 2009.

1450	   [RFC5583]  Schierl, T. and S. Wenger, "Signaling Media Decoding
1451	              Dependency in the Session Description Protocol (SDP)",
1452	              RFC 5583, July 2009.

1454	   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
1455	              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

1457	   [RFC5956]  Begen, A., "Forward Error Correction Grouping Semantics in
1458	              the Session Description Protocol", RFC 5956,
1459	              September 2010.

1461	   [webrtc-api]
1462	              Bergkvist, Burnett, Jennings, Narayanan, "WebRTC 1.0:
1463	              Real-time Communication Between Browsers", October 2011.

1465	              Available at
1466	              http://dev.w3.org/2011/webrtc/editor/webrtc.html

1468	Author's Address

1470	   Cullen Jennings
1471	   Cisco
1472	   400 3rd Avenue SW, Suite 350
1473	   Calgary, AB  T2P 4H2
1474	   Canada

1476	   Email: fluffy@iii.ca