CLUE WG                                                          R. Even
Internet-Draft                                       Huawei Technologies
Intended status: Standards Track                               J. Lennox
Expires: January 23, 2015                                          Vidyo
                                                           July 22, 2014

              Mapping RTP streams to CLUE media captures
                   draft-ietf-clue-rtp-mapping-02.txt

Abstract

This document describes mechanisms and recommended practice for mapping RTP media streams defined in SDP to CLUE media captures.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 23, 2015.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  RTP topologies for CLUE
   4.  Mapping CLUE Capture Encodings to RTP streams
       4.1.  Review of current directions in MMUSIC, AVText and AVTcore
       4.2.  Requirements of a solution
       4.3.  Static Mapping
       4.4.  Dynamic mapping
       4.5.  Recommendations
   5.  Application to CLUE Media Requirements
   6.  Examples
       6.1.  Static mapping
       6.2.  Dynamic Mapping
   7.  Acknowledgements
   8.  IANA Considerations
   9.  Security Considerations
   10. References
       10.1.  Normative References
       10.2.  Informative References
   Authors' Addresses

1. Introduction

Telepresence systems can send and receive multiple media streams.  The CLUE framework [I-D.ietf-clue-framework] defines Media Captures as sources of media, such as from one or more capture devices.  A Media Capture (MC) may be the source of one or more media streams.  A Media Capture may also be constructed from other media streams.  A middlebox can express conceptual Media Captures that it constructs from media streams it receives.

SIP offer/answer [RFC3264] uses SDP [RFC4566] to describe the RTP [RFC3550] media streams.  Each RTP stream has a unique SSRC within its RTP session.  The content of the RTP stream is created by an encoder in the endpoint.  This may be original content from a camera or content created by an intermediary device such as an MCU.

This document makes recommendations, for the telepresence architecture, about how RTP and RTCP streams should be encoded and transmitted, and how their relation to CLUE Media Captures should be communicated.  The proposed solution supports multiple RTP topologies.
2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] and indicate requirement levels for compliant RTP implementations.

3. RTP topologies for CLUE

The typical RTP topologies used by telepresence systems specify different behaviors for RTP and RTCP distribution.  A number of RTP topologies are described in [I-D.westerlund-avtcore-rtp-topologies-update].  For telepresence, the relevant topologies include point-to-point, as well as media mixers, media-switching mixers, and source-projection middleboxes.

In the point-to-point topology, one peer communicates directly with a single peer over unicast.  There can be one or more RTP sessions, and each RTP session can carry multiple RTP streams identified by their SSRC.  All SSRCs will be recognized by the peers based on the information in the RTCP SDES report, which includes the CNAME and SSRC of the sent RTP streams.  There are different point-to-point use cases, as specified in the CLUE use cases [RFC7205], and there may be a difference between the symmetric and asymmetric ones.  While in the symmetric use case the typical mapping will be from a media capture device to a render device (e.g., camera to monitor), in the asymmetric case the render device may receive different capture information (an RTP stream from a different camera) if it has fewer rendering devices (monitors).  In some cases, a CLUE session which, at a high level, is point-to-point may nonetheless have RTP which is best described by one of the mixer topologies below.  For example, a CLUE endpoint can produce composited or switched captures for use by a receiving system with fewer displays than the sender has cameras.

In the Media Mixer topology, the peers communicate only with the mixer.  The mixer provides mixed or composited media streams, using its own SSRC for the sent streams.  There are two cases here.  In the first case, the mixer may have separate RTP sessions with each peer (similar to the point-to-point topology), terminating the RTCP sessions on the mixer; this is known as Topo-RTCP-terminating MCU in [RFC5117].  In the second case, the mixer can use a conference-wide RTP session, similar to RFC 5117's Topo-Mixer or Topo-Video-switching.  The major difference is that in the second case the mixer uses conference-wide RTP sessions and distributes the RTCP reports to all the RTP session participants, enabling them to learn all the CNAMEs and SSRCs of the participants and to know the contributing source or sources (CSRCs) of the original streams from the RTP header.  In the first case, the mixer terminates the RTCP, and the participants cannot learn all the available sources from the RTCP information.  The conference roster information, including conference participants, endpoints, media, and media-id (SSRC), can be made available using the conference event package [RFC4575].

In the Media-Switching Mixer topology, the peer-to-mixer communication is unicast, with mixer RTCP feedback.  It is conceptually similar to a compositing mixer as described in the previous paragraph, except that rather than compositing or mixing multiple sources, the mixer provides one or more conceptual sources, selecting one source at a time from the original sources.  The mixer creates a conference-wide RTP session by sharing remote SSRC values as CSRCs with all conference participants.

In the Source-Projection middlebox topology, the peer-to-mixer communication is unicast, with mixer RTCP feedback.  Every potential sender in the conference has a source which is "projected" by the mixer into every other session in the conference; thus, every original source is maintained with an independent RTP identity to every receiver, maintaining separate decoding state and its original RTCP SDES information.  However, RTCP is terminated at the mixer, which might also perform reliability, repair, rate adaptation, or transcoding on the stream.  Senders' SSRCs may be renumbered by the mixer.  The sender may turn the projected sources on and off at any time, depending on which sources it thinks are most relevant for the receiver; this is the primary reason why this topology must act as an RTP mixer rather than as a translator, as otherwise these disabled sources would appear to have enormous packet loss.  Source switching is accomplished through this process of enabling and disabling projected sources, with the higher-level semantic assignment (the reason for sending each RTP stream) handled externally.

The above topologies demonstrate two major RTP/RTCP behaviors:

   1.  The mixer may either use the source SSRC when forwarding RTP packets, or use its own created SSRC.  In either case the mixer distributes all RTCP information to all participants, creating conference-wide RTP session(s).  This allows the participants to learn the available RTP sources in each RTP session.  The original source information will be in the SSRC or in the CSRC, depending on the topology.  The point-to-point case behaves like this.

   2.  The mixer terminates the RTCP from the source, creating separate RTP sessions with the peers.  In this case the participants will not receive the source SSRC in the CSRC.  Since this is usually a mixer topology, the source information is available from the SIP conference event package [RFC4575].  Subscribing to the conference event package allows each participant to know the SSRCs of all sources in the conference.

4. Mapping CLUE Capture Encodings to RTP streams

The different topologies described in Section 3 support different SSRC distribution models and RTP stream multiplexing points.

Most video conferencing systems today can separate multiple RTP sources by placing them into separate RTP sessions using the SDP description.  For example, main and slides video sources are separated into separate RTP sessions based on the content attribute [RFC4796].  This solution works straightforwardly if the multiplexing point is at the UDP transport level, where each RTP stream uses a separate RTP session.  This will also be true for mapping the RTP streams to Media Capture Encodings if each media capture encoding uses a separate RTP session and the consumer can identify it based on the receiving RTP port.  In this case, SDP only needs to label the RTP session with an identifier that identifies the media capture in the CLUE description.  The mapping then does not change even if the RTP stream within the session is switched, using the same or a different SSRC (the multiplexing is not at the SSRC level).
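As an illustrative sketch of such per-session labeling, the main and slides video could be offered on separate m-lines, distinguished by the content attribute [RFC4796] and tied to the corresponding media captures by a label attribute (one possible choice of identifier; the label values below are arbitrary and chosen only for this example):

   m=video 49200 RTP/AVP 96
   a=rtpmap:96 H264/90000
   a=content:main
   a=label:VC0

   m=video 49202 RTP/AVP 96
   a=rtpmap:96 H264/90000
   a=content:slides
   a=label:VC6

A consumer can then map incoming RTP streams to captures purely by the receiving transport address, regardless of which SSRCs appear within each session.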
Even though session multiplexing is supported by CLUE, for scaling reasons CLUE recommends using SSRC multiplexing in a single session or in multiple sessions.  So we need to look at how to map RTP streams to Media Capture Encodings when SSRC multiplexing is used.

When looking at SSRC multiplexing, we can see that the SSRC behavior may differ between topologies:

   1.  The SSRCs are static (assigned by the MCU/mixer), and there is an SSRC for each media capture encoding defined in the CLUE protocol.  Source information may be conveyed using the CSRC or, in the case of a Topo-RTCP-terminating MCU, is not conveyed.

   2.  The SSRCs are dynamic, representing the original source, and are relayed by the mixer/MCU to the participants.

In the above two cases the MCU/mixer may create an advertisement with a virtual room capture scene.

Another case we can envision is that the MCU/mixer relays all the capture scenes from all advertisements to all consumers.  This means that the advertisement will include multiple capture scenes, each representing a separate telepresence room with its own coordinate system.

4.1. Review of current directions in MMUSIC, AVText and AVTcore

Editor's note: This section provides an overview of the RFCs and drafts that can be used as a base for a mapping solution.  This section is for information only, and if the WG thinks that this is the right direction, the authors will bring the required work to the relevant WGs.

The solution also needs to support the simulcast case, where more than one RTP session may be advertised for a Media Capture.  Support of such simulcast is out of scope for CLUE.

When looking at the available tools, based on current work in MMUSIC, AVTcore, and AVText, for supporting SSRC multiplexing, the following documents are considered relevant.

The SDP source attribute [RFC5576] provides mechanisms to describe specific attributes of RTP sources based on their SSRC.

Negotiation of generic image attributes in SDP [RFC6236] provides the means to negotiate the image size.  The image attribute can be used to offer different image parameters, such as size, but in order to offer multiple RTP streams with different resolutions it uses a separate RTP session for each image option.

[I-D.westerlund-avtcore-max-ssrc] proposes a signaling solution for how to use multiple SSRCs within one RTP session.

[I-D.westerlund-avtext-rtcp-sdes-srcname] provides an extension that may be sent in SDP, as RTCP SDES information, or as an RTP header extension, and that uniquely identifies a single media source.  It defines a hierarchical order of the SRCNAME parameter that can be used, for example, to describe multiple resolutions from the same source (see Section 5.1 of [I-D.westerlund-avtcore-rtp-simulcast]).  Still, all the examples use RTP session multiplexing.

Other documents reviewed by the authors, but currently not used in a proposed solution, include:

[I-D.lennox-mmusic-sdp-source-selection] specifies how participants in a multimedia session can request a specific source from a remote party.

[I-D.westerlund-avtext-codec-operation-point] (expired) extends the codec control messages by specifying messages that let participants communicate a set of codec configuration parameters.

Using the above documents it is possible to negotiate the maximum number of received and sent RTP streams inside an RTP session (m-line or bundled m-line).  This also allows offering allowed combinations of codec configurations using different payload type numbers.

Examples: max-recv-ssrc:{96:2 & 97:3}, where 96 and 97 are different payload type numbers, or max-send-ssrc:{*:4}.

In the next sections, this document proposes mechanisms for mapping the RTP streams to media captures.

4.2. Requirements of a solution

This section lists, more briefly, the requirements a media architecture for CLUE telepresence needs to achieve, summarizing the discussion of previous sections.  In this section, RFC 2119 [RFC2119] language refers to requirements on a solution, not on an implementation; thus, requirements keywords are not written in capital letters.

Media-1: It must not be necessary for a CLUE session to use more than a single transport flow for transport of a given media type (video or audio).

Media-2: It must, however, be possible for a CLUE session to use multiple transport flows for a given media type where this is considered valuable (for example, for distributed media, or differential quality-of-service).

Media-3: It must be possible for a CLUE endpoint or MCU to simultaneously send sources corresponding to static, to composited, and to switched captures, in the same transport flow.  (Any given device might not necessarily be able to send all of these source types; but for those that can, it must be possible for them to be sent simultaneously.)

Media-4: It must be possible for an original source to move among switched captures (i.e., at one time be sent for one switched capture, and at a later time be sent for another one).

Media-5: It must be possible for a source to be placed into a switched capture even if the source is a "late joiner", i.e., was added to the conference after the receiver requested the switched source.

Media-6: Whenever a given source is assigned to a switched capture, it must be immediately possible for a receiver to determine the switched capture it corresponds to, and thus that any previous source is no longer being mapped to that switched capture.

Media-7: It must be possible for a receiver to identify the actual source that is currently being mapped to a switched capture, and correlate it with out-of-band (non-CLUE) information such as rosters.

Media-8: It must be possible for a source to move among switched captures without requiring a refresh of decoder state (e.g., for video, a fresh I-frame) when this is unnecessary.  However, it must also be possible for a receiver to indicate when a refresh of decoder state is in fact necessary.

Media-9: If a given source is being sent on the same transport flow for more than one reason (e.g., if it corresponds to more than one switched capture at once, or to a static capture), it should be possible for a sender to send only one copy of the source.
Media-10: On the network, media flows should, as much as possible, look and behave like currently defined usages of existing protocols; established semantics of existing protocols must not be redefined.

Media-11: The solution should seek to minimize the processing burden for boxes that distribute media to decoding hardware.

Media-12: If multiple sources from a single synchronization context are being sent simultaneously, it must be possible for a receiver to associate and synchronize them properly, even for sources that are mapped to switched captures.

4.3. Static Mapping

Static mapping is widely used in current MCU implementations.  It is also common for the point-to-point symmetric use case, when both endpoints have the same capabilities.  For capture encodings with static SSRCs, it is most straightforward to indicate this mapping outside the media stream, in the CLUE or SDP signaling.  An SDP source attribute [RFC5576] can be used to associate CLUE capture encodings with SSRCs in SDP, using appIds [I-D.even-mmusic-application-token].  Each SSRC will have an appId value that will also be specified in the CLUE media capture as an attribute.  The provider advertisement could, if it wished, use the same SSRC for media capture encodings that are mutually exclusive.  (This would be natural, for example, if two advertised captures are implemented as different configurations of the same physical camera, zoomed in or out.)  Section 6 provides an example of an SDP offer and CLUE advertisement.

4.4. Dynamic mapping

Dynamic mapping is done by tagging each media packet with the appId.  This means that a receiver immediately knows how to interpret received media, even when an unknown SSRC is seen.  As long as the media carries a known appId, it can be assumed that this media stream will replace the stream currently being received with that appId.

This gives significant advantages for switching latency, as a switch between sources can be achieved without any form of negotiation with the receiver.

However, the disadvantage of using an appId in the stream is that it introduces additional processing costs for every media packet, as appIds are scoped only within one hop (i.e., within a cascaded conference an appId that is used from the source to the first MCU is not meaningful between two MCUs, or between an MCU and a receiver), and so they may need to be added or modified at every stage.

If the appIds are chosen by the media sender, then offering a particular capture encoding to multiple recipients with the same ID requires the sender to produce only one version of the stream (assuming outgoing payload type numbers match).  This reduces the cost in the multicast case, although it does not necessarily help in the switching case.

An additional issue with putting appIds in the RTP packets comes from cases where a non-CLUE-aware endpoint is being switched by an MCU to a CLUE endpoint.  In this case, we may require up to an additional 12 bytes in the RTP header, which may push a media packet over the MTU.  However, as the MTU on either side of the switch may not match, it is possible that this could happen even without adding extra data into the RTP packet.  The 12 additional bytes per packet could also be a significant bandwidth increase in the case of very low-bandwidth audio codecs.
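As an illustrative breakdown of where such an overhead can come from, assuming the one-byte-header form of the general RTP header extension mechanism [RFC5285] and, purely for the sake of the arithmetic, a 7-byte appId token:

      4 bytes   extension header (0xBEDE identifier + length)
    + 1 byte    extension element ID and length
    + 7 bytes   appId token
    ---------
     12 bytes   added to each packet (32-bit aligned, so no padding)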
4.5. Recommendations

The recommendation is that endpoints MUST support both the static declaration of capture encoding SSRCs and the appId in every media packet.  For low-bandwidth situations this may be considered excessive overhead, in which case endpoints MAY support an approach where appIds are sent selectively.  The SDP offer MAY specify the mapping of SSRCs to capture encodings.  In the case of static mapping topologies there will be no need to use the header extensions in the media, since the SSRC for the RTP stream will remain the same during the call unless a collision is detected and handled according to [RFC5576].  If the topology uses dynamic mapping, then the appId will be used to indicate the RTP stream switch for the media capture.  In this case the SDP description may be used to negotiate the initial SSRC, but this is left to the implementation.  Note that if the SSRC is defined explicitly in the SDP, SSRC collisions should be handled as in [RFC5576].

5. Application to CLUE Media Requirements

The requirements section (Section 4.2) lists a number of requirements that are believed to be necessary for a CLUE RTP mapping.  The solutions described in this document are believed to meet these requirements, though some of them are only possible for some of the topologies.  (Since the requirements are generally of the form "it must be possible for a sender to do something", this is adequate; a sender which wishes to perform that action needs to choose a topology which allows the behavior it wants.)

In this section we address only those requirements where the topologies or the association mechanisms treat the requirements differently.

Media-4: It must be possible for an original source to move among switched captures (i.e., at one time be sent for one switched capture, and at a later time be sent for another one).

This applies naturally for static sources with a Switched Mixer.  For dynamic sources with a Source-Projecting middlebox, this just requires the appId in the header extension element to be updated appropriately.

Media-6: Whenever a given source is transmitted for a switched capture, it must be immediately possible for a receiver to determine the switched capture it corresponds to, and thus that any previous source is no longer being mapped to that switched capture.

For a Switched Mixer, this applies naturally.  For a Source-Projecting middlebox, this is done based on the appId.

Media-7: It must be possible for a receiver to identify the original source that is currently being mapped to a switched capture, and correlate it with out-of-band (non-CLUE) information such as rosters.

For a Switched Mixer, this is done based on the CSRC, if the mixer is providing CSRCs; for a Source-Projecting middlebox, this is done based on the SSRC.

Media-8: It must be possible for a source to move among switched captures without requiring a refresh of decoder state (e.g., for video, a fresh I-frame) when this is unnecessary.  However, it must also be possible for a receiver to indicate when a refresh of decoder state is in fact necessary.

This can be done by a Source-Projecting middlebox, but not by a Switching Mixer.  The last requirement can be accomplished through an FIR message [RFC5104], though potentially a faster mechanism (not requiring a round-trip time from the receiver) would be preferable.

Media-9: If a given source is being sent on the same transport flow to satisfy more than one capture (e.g., if it corresponds to more than one switched capture at once, or to a static capture as well as a switched capture), it should be possible for a sender to send only one copy of the source.

For a Source-Projecting middlebox, this can be accomplished by sending multiple dynamic appIds for the same source; this can also be done for an environment with a hybrid of mixer topologies and static and dynamic captures, as described below in Section 6.  It is not possible for static captures from a Switched Mixer.

Media-12: If multiple sources from a single synchronization context are being sent simultaneously, it must be possible for a receiver to associate and synchronize them properly, even for sources that are mapped to switched captures.

For a Mixed or Switched Mixer topology, receivers will see only a single synchronization context (CNAME), corresponding to the mixer.  For a Source-Projecting middlebox, separate projected sources keep separate synchronization contexts based on their original CNAMEs, thus allowing independent synchronization of sources from independent rooms without needing global synchronization.  In hybrid cases, however (e.g., if audio is mixed), all sources which need to be synchronized with the mixed audio must get the same CNAME (and thus a mixer-provided timebase) as the mixed audio.

6. Examples

It is possible for a CLUE device to send multiple instances of the topologies in Section 3 simultaneously.  For example, an MCU which uses a traditional audio bridge with switched video would be a Mixer topology for audio, but a Switched Mixer or a Source-Projecting middlebox for video.  In the latter case, the audio could be sent as a static source, whereas the video could be dynamic.

More notably, it is possible for an endpoint to send the same sources both for static and dynamic captures.  Consider the example in [I-D.ietf-clue-framework], where an endpoint can provide both three cameras (VC0, VC1, and VC2) for left, center, and right views, and a switched view (VC3) of the loudest panel.

It is possible for a consumer to request both the (VC0 - VC2) set and VC3.  It is worth noting that the content of VC3 is, at all times, exactly the content of one of VC0, VC1, or VC2.  Thus, if the sender uses the Source-Projecting middlebox topology for VC3, no additional media traffic needs to be sent to a consumer that receives these sources, beyond what is needed for just (VC0 - VC2).

In this case, the advertiser could describe VC0, VC1, and VC2 in its initial advertisement or SDP with static SSRCs, whereas VC3 would need to be dynamic.  The role of VC3 would move among VC0, VC1, and VC2, indicated by the appId RTP header extension on those streams' RTP packets.
6.1. Static mapping

Using the video capture example from the framework document [I-D.ietf-clue-framework] for a three-camera system with four monitors, where one monitor is for the presentation stream:

   o  VC0 - (the camera-left camera stream), purpose=main, switched:no

   o  VC1 - (the center camera stream), purpose=main, switched:no

   o  VC2 - (the camera-right camera stream), purpose=main, switched:no

   o  VC3 - (the loudest panel stream), purpose=main, switched:yes

   o  VC4 - (the loudest panel stream with PiPs), purpose=main, composed=true, switched:yes

   o  VC5 - (the zoomed-out view of all people in the room), purpose=main, composed=no, switched:no

   o  VC6 - (presentation stream), purpose=presentation, switched:no

Where the physical simultaneity information is:

   {VC0, VC1, VC2, VC3, VC4, VC6}

   {VC0, VC2, VC5, VC6}

In this case the provider can send up to six simultaneous streams and receive four, one for each monitor.  This is the maximum case, but it can be further limited by the capture scene entries, which may propose sending only three camera streams and one presentation.  Still, since the consumer can select any media captures that can be sent simultaneously, the offer will specify six streams, where VC5 and VC1 use the same resource and are mutually exclusive.

In the advertisement there may be two capture scenes.

The first capture scene may have four entries:

   {VC0, VC1, VC2}

   {VC3}

   {VC4}

   {VC5}

The second capture scene will have the following single entry:

   {VC6}

We assume that an intermediary will need to look at the CLUE information if it wants to make better decisions on handling specific RTP streams, for example based on them being part of the same capture scene, so the SDP will not group streams by capture scene.

The SIP offer may be (the extmap attribute is included for support of dynamic mapping):

   m=video 49200 RTP/AVP 99
   a=extmap:1 urn:ietf:params:rtp-hdrex:appId
   a=rtpmap:99 H264/90000
   a=max-send-ssrc:{*:6}
   a=max-recv-ssrc:{*:4}
   a=ssrc:11111 appId:1
   a=ssrc:22222 appId:2
   a=ssrc:33333 appId:3
   a=ssrc:44444 appId:4
   a=ssrc:55555 appId:5
   a=ssrc:66666 appId:6

In the above example the provider can send up to five main streams and one presentation stream.

Note that VC1 and VC5 have the same SSRC, since they use the same resource.

   o  VC0 - (the camera-left camera stream), purpose=main, switched:no, appId=1

   o  VC1 - (the center camera stream), purpose=main, switched:no, appId=2

   o  VC2 - (the camera-right camera stream), purpose=main, switched:no, appId=3

   o  VC3 - (the loudest panel stream), purpose=main, switched:yes, appId=4

   o  VC4 - (the loudest panel stream with PiPs), purpose=main, composed=true, switched:yes, appId=5

   o  VC5 - (the zoomed-out view of all people in the room), purpose=main, composed=no, switched:no, appId=2

   o  VC6 - (presentation stream), purpose=presentation, switched:no, appId=6

Note: We could instead allocate an SSRC for each MC, which would not require the indirection of using an appId.  This would, however, require that if a switch to dynamic mapping is done, information be provided about which SSRC is being replaced by the new one.
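For completeness, a corresponding answer from a consumer with four monitors might look roughly as follows; this is only an illustrative sketch, and the port number is arbitrary:

   m=video 49300 RTP/AVP 99
   a=rtpmap:99 H264/90000
   a=extmap:1 urn:ietf:params:rtp-hdrex:appId
   a=max-recv-ssrc:{*:4}
   a=max-send-ssrc:{*:4}

The selection of captures itself is conveyed in CLUE signaling; the appId values from the advertisement then tie the incoming SSRCs to the selected captures.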
6.2. Dynamic Mapping

For topologies that use dynamic mapping there is no need to provide the SSRCs in the offer (they may not be available, if the offers from the sources do not include them when connecting to the mixer or remote endpoint).  In this case the appId will be specified first in the advertisement.

The SIP offer may be:

   m=video 49200 RTP/AVP 99
   a=extmap:1 urn:ietf:params:appId
   a=rtpmap:99 H264/90000
   a=max-send-ssrc:{*:4}
   a=max-recv-ssrc:{*:4}

This will work for SSRC multiplexing.  It is not clear how it will work when RTP streams of the same media type are not multiplexed in a single RTP session: how would one know which encoding will be in which of the different RTP sessions?

7. Acknowledgements

The authors would like to thank Allyn Romanow and Paul Witty for contributing text to this work.

8. IANA Considerations

TBD

9. Security Considerations

TBD.

10. References

10.1. Normative References

   [I-D.even-mmusic-application-token]
              Even, R., Lennox, J., and Q. Wu, "The Session Description
              Protocol (SDP) Application Token Attribute",
              draft-even-mmusic-application-token-03 (work in progress),
              April 2014.

   [I-D.ietf-clue-framework]
              Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
              "Framework for Telepresence Multi-Streams",
              draft-ietf-clue-framework-16 (work in progress), June 2014.

   [I-D.lennox-clue-rtp-usage]
              Lennox, J., Witty, P., and A. Romanow, "Real-Time
              Transport Protocol (RTP) Usage for Telepresence Sessions",
              draft-lennox-clue-rtp-usage-04 (work in progress),
              June 2012.

   [I-D.westerlund-avtcore-max-ssrc]
              Westerlund, M., Burman, B., and F. Jansson, "Multiple
              Synchronization sources (SSRC) in RTP Session Signaling",
              draft-westerlund-avtcore-max-ssrc-02 (work in progress),
              July 2012.

   [I-D.westerlund-avtext-rtcp-sdes-srcname]
              Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES
              Item SRCNAME to Label Individual Sources",
              draft-westerlund-avtext-rtcp-sdes-srcname-01 (work in
              progress), July 2012.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2. Informative References

   [I-D.lennox-mmusic-sdp-source-selection]
              Lennox, J. and H. Schulzrinne, "Mechanisms for Media
              Source Selection in the Session Description Protocol
              (SDP)", draft-lennox-mmusic-sdp-source-selection-04 (work
              in progress), March 2012.

   [I-D.westerlund-avtcore-rtp-simulcast]
              Westerlund, M., Burman, B., Lindqvist, M., and F. Jansson,
              "Using Simulcast in RTP sessions",
              draft-westerlund-avtcore-rtp-simulcast-04 (work in
              progress), July 2014.

   [I-D.westerlund-avtcore-rtp-topologies-update]
              Westerlund, M. and S. Wenger, "RTP Topologies",
              draft-westerlund-avtcore-rtp-topologies-update-01 (work in
              progress), October 2012.

   [I-D.westerlund-avtext-codec-operation-point]
              Westerlund, M., Burman, B., and L. Hamm, "Codec Operation
              Point RTCP Extension",
              draft-westerlund-avtext-codec-operation-point-00 (work in
              progress), March 2012.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
              Initiation Protocol (SIP) Event Package for Conference
              State", RFC 4575, August 2006.

   [RFC4796]  Hautakorpi, J. and G. Camarillo, "The Session Description
              Protocol (SDP) Content Attribute", RFC 4796,
              February 2007.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, February 2008.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, July 2008.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC6236]  Johansson, I. and K. Jung, "Negotiation of Generic Image
              Attributes in the Session Description Protocol (SDP)",
              RFC 6236, May 2011.

   [RFC7205]  Romanow, A., Botzko, S., Duckworth, M., and R. Even, "Use
              Cases for Telepresence Multistreams", RFC 7205,
              April 2014.

Authors' Addresses

   Roni Even
   Huawei Technologies
   Tel Aviv
   Israel

   Email: roni.even@mail01.huawei.com

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   Email: jonathan@vidyo.com