2	CLUE WG                                                          R. Even
3	Internet-Draft                                       Huawei Technologies
4	Intended status: Standards Track                               J. Lennox
5	Expires: January 17, 2013                                          Vidyo
6	                                                           July 16, 2012

8	               Mapping RTP streams to CLUE media captures
9	                   draft-even-clue-rtp-mapping-03.txt

11	Abstract

13	   This document describes mechanisms and recommended practice for
14	   mapping RTP media streams defined in SDP to CLUE media captures.

16	Status of this Memo

18	   This Internet-Draft is submitted in full conformance with the
19	   provisions of BCP 78 and BCP 79.

21	   Internet-Drafts are working documents of the Internet Engineering
22	   Task Force (IETF).  Note that other groups may also distribute
23	   working documents as Internet-Drafts.  The list of current Internet-
24	   Drafts is at http://datatracker.ietf.org/drafts/current/.

26	   Internet-Drafts are draft documents valid for a maximum of six months
27	   and may be updated, replaced, or obsoleted by other documents at any
28	   time.  It is inappropriate to use Internet-Drafts as reference
29	   material or to cite them other than as "work in progress."

31	   This Internet-Draft will expire on January 17, 2013.

33	Copyright Notice

35	   Copyright (c) 2012 IETF Trust and the persons identified as the
36	   document authors.  All rights reserved.

38	   This document is subject to BCP 78 and the IETF Trust's Legal
39	   Provisions Relating to IETF Documents
40	   (http://trustee.ietf.org/license-info) in effect on the date of
41	   publication of this document.  Please review these documents
42	   carefully, as they describe your rights and restrictions with respect
43	   to this document.  Code Components extracted from this document must
44	   include Simplified BSD License text as described in Section 4.e of
45	   the Trust Legal Provisions and are provided without warranty as
46	   described in the Simplified BSD License.

48	Table of Contents

50	   1.  Introduction
51	   2.  Terminology
52	   3.  RTP topologies for CLUE
53	   4.  Mapping CLUE Media Captures to RTP streams
54	     4.1.  Static Mapping
55	     4.2.  Dynamic mapping
56	     4.3.  Recommendations
57	   5.  Application to CLUE Media Requirements
58	   6.  Examples
59	   7.  Acknowledgements
60	   8.  IANA Considerations
61	   9.  Security Considerations
62	   10. References
63	     10.1. Normative References
64	     10.2. Informative References
65	   Authors' Addresses

67	1.  Introduction

69	   Telepresence systems can send and receive multiple media streams.
70	   The CLUE framework [I-D.ietf-clue-framework] defines media captures
71	   as a source of Media, such as from one or more Capture Devices.  A
72	   Media Capture (MC) may be the source of one or more Media streams.  A
73	   Media Capture may also be constructed from other Media streams.  A
74	   middle box can express Media Captures that it constructs from Media
75	   streams it receives.

77	   SIP offer/answer [RFC3264] uses SDP [RFC4566] to describe the
78	   RTP [RFC3550] media streams.  Each RTP stream has a payload type
79	   number and SSRC.  The content of the RTP stream is created by the
80	   encoder in the endpoint.  This may be original content from a
81	   camera or content created by an intermediary device such as an MCU.

83	   This document makes recommendations, for this telepresence
84	   architecture, about how RTP and RTCP streams should be encoded and
85	   transmitted, and how their relation to CLUE Media Captures should be
86	   communicated.  The proposed solution supports multiple RTP topologies.

88	2.  Terminology

90	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
91	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
92	   document are to be interpreted as described in RFC 2119 [RFC2119] and
93	   indicate requirement levels for compliant RTP implementations.

95	3.  RTP topologies for CLUE

97	   The typical RTP topologies used by telepresence systems specify
98	   different behaviors for RTP and RTCP distribution.  The relevant
99	   topologies include point-to-point, as well as media mixers, media-
100	   switching mixers, and source-projection mixers.

102	   In the point-to-point topology, one peer communicates directly with a
103	   single peer over unicast.  There can be one or more RTP sessions, and
104	   each RTP session can carry multiple RTP streams identified by their
105	   SSRC.  All SSRCs will be recognized by the peers based on the
106	   information in the RTCP SDES report that will include the CNAME and
107	   SSRC of the sent RTP streams.  There are different point-to-point
108	   use cases, as specified in the CLUE use cases document
109	   [I-D.ietf-clue-telepresence-use-cases].  There may be a difference
110	   between the symmetric and asymmetric use cases.  In the symmetric
111	   use case, the typical mapping will be from a media capture device to
112	   a render device (e.g., camera to monitor); in the asymmetric case,
113	   the render device may receive different capture information (an RTP
114	   stream from a different camera) if it has fewer rendering devices
115	   (monitors).  In some cases, a CLUE session which, at a high level, is
116	   point-to-point may nonetheless have RTP which is best described by
117	   one of the mixer topologies below.  For example, a CLUE endpoint can
118	   produce composited or switched captures for use by a receiving system
119	   with fewer displays than the sender has cameras.
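
   As an illustration of how a peer could learn the CNAME and SSRC
   pairs from an RTCP SDES report, the following Python sketch parses a
   single SDES packet using the layout given in RFC 3550.  The function
   name is an assumption made for this example; compound RTCP packets
   and malformed input are not handled.

```python
import struct

def parse_sdes_cnames(packet: bytes) -> dict:
    """Return {ssrc: cname} for one RTCP SDES packet (RFC 3550, Section 6.5)."""
    first, packet_type, length_words = struct.unpack_from("!BBH", packet, 0)
    if packet_type != 202:
        raise ValueError("not an RTCP SDES packet")
    chunk_count = first & 0x1F          # SC field: number of SSRC/CSRC chunks
    end = 4 * (length_words + 1)        # length is in 32-bit words minus one
    offset = 4
    cnames = {}
    for _ in range(chunk_count):
        (ssrc,) = struct.unpack_from("!I", packet, offset)
        offset += 4
        # Items are (type, length, value); a zero type octet ends the list.
        while offset < end and packet[offset] != 0:
            item_type, item_len = packet[offset], packet[offset + 1]
            value = packet[offset + 2: offset + 2 + item_len]
            if item_type == 1:          # 1 = CNAME
                cnames[ssrc] = value.decode("utf-8")
            offset += 2 + item_len
        # Skip the terminating null octet(s) up to the next 32-bit boundary.
        offset = (offset // 4 + 1) * 4
    return cnames
```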

121	   In the Media Mixer topology, the peers communicate only with the
122	   mixer.  The mixer provides mixed or composited media streams, using
123	   its own SSRC for the sent streams.  There are two cases here.  In the
124	   first case the mixer may have separate RTP sessions with each peer
125	   (similar to the point to point topology) terminating the RTCP
126	   sessions on the mixer; this is known as Topo-RTCP-Terminating MCU in
127	   [RFC5117].  In the second case, the mixer can use a conference-wide
128	   RTP session similar to RFC 5117's Topo-mixer or Topo-Video-switching.
129	   The major difference is that for the second case, the mixer uses
130	   conference-wide RTP sessions, and distributes the RTCP reports to all
131	   the RTP session participants, enabling them to learn all the CNAMEs
132	   and SSRCs of the participants and know the contributing source or
133	   sources (CSRCs) of the original streams from the RTP header.  In the
134	   first case, the Mixer terminates the RTCP and the participants cannot
135	   know all the available sources based on the RTCP information.  The
136	   conference roster information including conference participants,
137	   endpoints, media and media-id (SSRC) can be available using the
138	   conference event package [RFC4575] element.
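
   In the conference-wide case, a receiver can read the contributing
   sources directly from the RTP fixed header.  The sketch below shows
   one way to extract the SSRC and the CSRC list in Python; the function
   name is hypothetical and the code ignores everything after the CSRC
   list.

```python
import struct

def parse_rtp_sources(packet: bytes):
    """Return (ssrc, [csrc, ...]) from the RTP fixed header (RFC 3550, Section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    first_byte = packet[0]
    if first_byte >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = first_byte & 0x0F      # CC field: number of CSRC entries
    if len(packet) < 12 + 4 * csrc_count:
        raise ValueError("truncated CSRC list")
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    csrcs = list(struct.unpack_from("!%dI" % csrc_count, packet, 12))
    return ssrc, csrcs
```

   With a Topo-RTCP-Terminating MCU the CSRC list would typically be
   empty, so the roster has to come from another channel such as the
   conference event package.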

140	   In the Media-Switching Mixer topology, the peer to mixer
141	   communication is unicast with mixer RTCP feedback.  It is
142	   conceptually similar to a compositing mixer as described in the
143	   previous paragraph, except that rather than compositing or mixing
144	   multiple sources, the mixer provides one or more conceptual sources
145	   selecting one source at a time from the original sources.  The Mixer
146	   creates a conference-wide RTP session by sharing remote SSRC values
147	   as CSRCs to all conference participants.

149	   In the Source-Projection Mixer topology, the peer to mixer
150	   communication is unicast with RTCP mixer feedback.  Every potential
151	   sender in the conference has a source which is "projected" by the
152	   mixer into every other session in the conference; thus, every
153	   original source is maintained with an independent RTP identity to
154	   every receiver, maintaining separate decoding state and its original
155	   RTCP SDES information.  However, RTCP is terminated at the mixer,
156	   which might also perform reliability, repair, rate adaptation, or
157	   transcoding on the stream.  Senders' SSRCs may be renumbered by the
158	   mixer.  The sender may turn the projected sources on and off at any
159	   time, depending on which sources it thinks are most relevant for the
160	   receiver; this is the primary reason why this topology must act as an
161	   RTP mixer rather than as a translator, as otherwise these disabled
162	   sources would appear to have enormous packet loss.  Source switching
163	   is accomplished through this process of enabling and disabling
164	   projected sources, with the higher-level semantic assignment of
165	   reason for the RTP streams assigned externally.

167	   The above topologies demonstrate two major RTP/RTCP behaviors:

169	   1.  The mixer may either use the source SSRC when forwarding RTP
170	       packets, or use its own created SSRC.  Still the mixer will
171	       distribute all RTCP information to all participants creating
172	       conference-wide RTP session/s.  This allows the participants to
173	       learn the available RTP sources in each RTP session.  The
174	       original source information will be in the SSRC or in the CSRC,
175	       depending on the topology.  The point-to-point case behaves like
176	       this.

178	   2.  The mixer terminates the RTCP from the source, creating separate
179	       RTP sessions with the peers.  In this case the participants will
180	       not receive the source SSRC in the CSRC.  Since this is usually a
181	       mixer topology, the source information is available from the SIP
182	       conference event package [RFC4575].  Subscribing to the conference
183	       event package allows each participant to know the SSRCs of all
184	       sources in the conference.

186	4.  Mapping CLUE Media Captures to RTP streams

188	   The different topologies described in Section 3 support different
189	   SSRC distribution models and RTP stream multiplexing points.

191	   Most video conferencing systems today can separate multiple RTP
192	   sources by placing them into separate RTP sessions using the SDP
193	   description.  For example, main and slides video sources are
194	   separated into separate RTP sessions based on the content attribute
195	   [RFC4796].  This solution works straightforwardly if the multiplexing
196	   point is at the UDP transport level, where each RTP stream uses a
197	   separate RTP session.  This will also be true for mapping the RTP
198	   streams to Media Captures if each media capture uses a separate RTP
199	   session, and the consumer can identify it based on the receiving RTP
200	   port.  In this case, SDP only needs to label the RTP session with an
201	   identifier that identifies the media capture in the CLUE description.
202	   In this case, the mapping does not change even if the RTP session
203	   is switched using the same or a different SSRC.  (The multiplexing
204	   is not at the SSRC level.)
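
   A minimal sketch of the session-multiplexed case follows.  It assumes
   that each media description carries the capture identifier in an SDP
   "a=label" attribute; this document does not say which attribute is
   used, so both the attribute choice and the function name are
   assumptions made for illustration.

```python
def captures_by_port(sdp: str) -> dict:
    """Map each m-line's receive port to the capture ID that labels it."""
    mapping = {}
    current_port = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port>[/<count>] <proto> <fmt> ...
            current_port = int(line.split()[1].split("/")[0])
        elif line.startswith("a=label:") and current_port is not None:
            # Assumed convention: the label value is the CLUE capture ID.
            mapping[current_port] = line[len("a=label:"):]
    return mapping
```

   Because the lookup key is the port, the result stays valid when the
   SSRC carried in a session changes.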

206	   Even though Session multiplexing is supported by CLUE, for scaling
207	   reasons, CLUE recommends using SSRC multiplexing in a single or
208	   multiple sessions.  So we need to look at how to map RTP streams to
209	   Media Capture IDs when SSRC multiplexing is used.

211	   When looking at SSRC multiplexing we can see that in various
212	   topologies, the SSRC behavior may be different:

214	   1.  The SSRCs are static (assigned by the MCU/Mixer), and there is an
215	       SSRC for each media capture encoding defined in the CLUE
216	       protocol.  Source information may be conveyed using CSRC, or, in
217	       the case of topo-RTCP-Terminating MCU, is not conveyed.

219	   2.  The SSRCs are dynamic, representing the original source and are
220	       relayed by the Mixer/MCU to the participants.

222	   In the above two cases the MCU/Mixer creates its own advertisement,
223	   with a virtual room capture scene.

225	   Another case we can envision is that the MCU / Mixer relays all the
226	   capture scenes from all advertisements to all consumers.  This means
227	   that the advertisement will include multiple capture scenes, each
228	   representing a separate TP room with its own coordinate system.  A
229	   general tool for distributing roster information is an event
230	   package, for example an extension of the conference event package.

232	4.1.  Static Mapping

234	   Static mapping is widely used in current MCU implementations.  It is
235	   also common for a point to point symmetric use case when both
236	   endpoints have the same capabilities.  For capture encodings with
237	   static SSRCs, it is most straightforward to indicate this mapping
238	   outside the media stream, in the CLUE or SDP signaling.  An SDP
239	   source attribute [RFC5576] could be defined to associate CLUE capture
240	   IDs with SSRCs in SDP.  Each SSRC will have a captureID value that
241	   will be specified also in the CLUE media capture as an attribute.
242	   The provider advertisement could, if it wished, use the same SSRC for
243	   media capture encodings that are mutually exclusive.  (This would be
244	   natural, for example, if two advertised captures are implemented as
245	   different configurations of the same physical camera, zoomed in or
246	   out.)
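
   The following sketch shows how a receiver might build the static
   SSRC to capture ID table from SDP source attributes in the
   "a=ssrc:<ssrc-id> <attribute>:<value>" form of RFC 5576.  The
   attribute name "captureID" is hypothetical, since no such attribute
   has been defined yet, and the function name is likewise an
   assumption.

```python
def static_capture_map(sdp: str, attr_name: str = "captureID") -> dict:
    """Return {ssrc: capture_id} from a=ssrc lines carrying attr_name."""
    mapping = {}
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=ssrc:"):
            continue
        # Split "<ssrc-id> <attribute>:<value>" into its three parts.
        ssrc_text, _, rest = line[len("a=ssrc:"):].partition(" ")
        name, _, value = rest.partition(":")
        if name == attr_name and value:   # attr_name is a hypothetical attribute
            mapping[int(ssrc_text)] = value
    return mapping
```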

248	   Note: there may be more than one RTP session for a media capture like
249	   in simulcast.  We still need to figure out how to describe it in SDP
250	   and CLUE.

252	   Another method for static mapping may be for the provider
253	   advertisement to indicate the intended SSRC directly.  The
254	   advantage of using the SDP SSRC attribute is that RFC 5576 [RFC5576]
255	   discusses the issue of SSRC collision and provides guidelines on how
256	   to address it.

258	4.2.  Dynamic mapping

260	   Dynamic mapping using an RTP header extension is described in
261	   draft-lennox-clue-rtp-usage [I-D.lennox-clue-rtp-usage], Section 10.2.
262	   The draft does not specify what the capture id value is.  As
263	   specified for the static case there should be a capture id attribute
264	   in the CLUE media capture information to enable this mapping.
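
   To make the dynamic case concrete, the sketch below reads the
   elements of an RTP header extension, assuming the general one-byte
   header format of RFC 5285.  Which extension ID carries the capture id
   and how the value is encoded are not specified here, so the caller is
   expected to look up an ID negotiated in signaling; the function name
   is an assumption.

```python
import struct

def read_one_byte_extensions(packet: bytes) -> dict:
    """Return {extension_id: value_bytes} for one-byte header extensions."""
    first = packet[0]
    if not first & 0x10:                 # X bit clear: no header extension
        return {}
    offset = 12 + 4 * (first & 0x0F)     # skip fixed header and CSRC list
    profile, words = struct.unpack_from("!HH", packet, offset)
    if profile != 0xBEDE:                # not the one-byte header form
        return {}
    offset += 4
    end = offset + 4 * words
    elements = {}
    while offset < end:
        header = packet[offset]
        if header == 0:                  # padding octet
            offset += 1
            continue
        ext_id, length = header >> 4, (header & 0x0F) + 1
        if ext_id == 15:                 # reserved ID: stop parsing
            break
        elements[ext_id] = packet[offset + 1: offset + 1 + length]
        offset += 1 + length
    return elements
```

   A receiver would then take the element whose ID was negotiated for
   capture ids, if present, as the capture id for that packet.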

266	4.3.  Recommendations

268	   The recommendation is that endpoints MUST support both the static
269	   declaration of capture encoding SSRCs, and the RTP header extension
270	   method of sharing capture IDs, with the extension in every media
271	   packet.  For low bandwidth situations, this may be considered
272	   excessive overhead; in which case endpoints MAY support the combined
273	   approach from [I-D.lennox-clue-rtp-usage].  The SDP offer MAY specify
274	   the SSRC mapping to media capture.  In the case of static mapping
275	   topologies there will be no need to use the header extensions in the
276	   media, since the SSRC for the RTP stream will remain the same during
277	   the call unless a collision is detected and handled according to
278	   RFC5576 [RFC5576].  If the used topology uses dynamic mapping then
279	   the RTP header extension will be used to indicate the RTP stream
280	   switch for the media capture.  In this case the SDP description may
281	   be used to negotiate the initial SSRC but this will be left for the
282	   implementation.  Note that if the SSRC is defined explicitly in the
283	   SDP the SSRC collision should be handled as in RFC5576.
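
   One possible receiver-side structure that supports both methods is
   sketched below.  It keeps the static table learned from signaling
   and updates dynamic bindings whenever a packet carries capture ids in
   its header extension.  The class and method names are assumptions,
   and SSRC collision handling is left out.

```python
class CaptureResolver:
    """Track which captures each SSRC currently represents."""

    def __init__(self, static_map):
        self.static = dict(static_map)   # SSRC -> capture ID, from signaling
        self.dynamic = {}                # capture ID -> SSRC now carrying it

    def on_packet(self, ssrc, ext_capture_ids=()):
        # A capture ID seen in the header extension is rebound to this SSRC,
        # which implicitly unbinds it from any earlier source.
        for capture_id in ext_capture_ids:
            self.dynamic[capture_id] = ssrc
        captures = {cid for cid, src in self.dynamic.items() if src == ssrc}
        if ssrc in self.static:
            captures.add(self.static[ssrc])
        return captures
```

   Returning a set allows one source to satisfy a static capture and a
   switched capture at the same time.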

285	5.   Application to CLUE Media Requirements

287	   [I-D.lennox-clue-rtp-usage] offers a number of requirements that are
288	   believed to be necessary for a CLUE RTP mapping.  The solutions
289	   described in this document are believed to meet those requirements,
290	   though some of them are only possible for some of the topologies.
291	   (Since the requirements are generally of the form "it must be
292	   possible for a sender to do something", this is adequate; a sender
293	   which wishes to perform that action needs to choose a topology which
294	   allows the behavior it wants.)

296	   In this section we address only those requirements where the
297	   topologies or the association mechanisms treat the requirements
298	   differently.

300	   Media-4: It must be possible for an original source to move among
301	   switched captures (i.e. at one time be sent for one switched capture,
302	   and at a later time be sent for another one).

304	   This applies naturally for static sources with a Switched Mixer.  For
305	   dynamic sources with a Source-Projecting Mixer, this just requires
306	   the capture tag in the header extension element to be updated
307	   appropriately.

309	   Media-6: Whenever a given source is transmitted for a switched
310	   capture, it must be immediately possible for a receiver to determine
311	   the switched capture it corresponds to, and thus that any previous
312	   source is no longer being mapped to that switched capture.

314	   For a Switched Mixer, this applies naturally.  For a Source-
315	   Projecting mixer, this is done based on the header extension.

317	   Media-7: It must be possible for a receiver to identify the original
318	   source that is currently being mapped to a switched capture, and
319	   correlate it with out-of-band (non-CLUE) information such as rosters.

321	   For a Switched Mixer, this is done based on the CSRC, if the mixer is
322	   providing CSRCs; for a Source-Projecting Mixer, this is done based
323	   on the SSRC.

325	   Media-8: It must be possible for a source to move among switched
326	   captures without requiring a refresh of decoder state (e.g., for
327	   video, a fresh I-frame), when this is unnecessary.  However, it must
328	   also be possible for a receiver to indicate when a refresh of decoder
329	   state is in fact necessary.

331	   This can be done by a Source-Projecting Mixer, but not by a Switching
332	   Mixer.  The last requirement can be accomplished through an FIR
333	   message [RFC5104], though potentially a faster mechanism (not
334	   requiring a round-trip time from the receiver) would be preferable.
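
   For reference, a receiver-generated FIR could be assembled as in the
   following sketch, which follows the packet layout in RFC 5104
   (payload-specific feedback, FMT 4, one FCI entry).  The function and
   parameter names are assumptions, and the packet would still need to
   be placed in a valid compound RTCP packet before sending.

```python
import struct

def build_fir(sender_ssrc: int, target_ssrc: int, seq_nr: int) -> bytes:
    """Build one RTCP FIR feedback packet with a single FCI entry."""
    # V=2, P=0, FMT=4 (FIR); PT=206 (PSFB); length = 5 words minus one.
    header = struct.pack("!BBH", (2 << 6) | 4, 206, 4)
    # SSRC of packet sender, then "SSRC of media source", which is unused
    # for FIR and set to 0.
    body = struct.pack("!II", sender_ssrc, 0)
    # FCI: SSRC whose decoder state needs a refresh, 8-bit command
    # sequence number, 24 reserved bits set to zero.
    fci = struct.pack("!IB3x", target_ssrc, seq_nr & 0xFF)
    return header + body + fci
```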

336	   Media-9: If a given source is being sent on the same transport flow
337	   to satisfy more than one capture (e.g. if it corresponds to more than
338	   one switched capture at once, or to a static capture as well as a
339	   switched capture), it should be possible for a sender to send only
340	   one copy of the source.

342	   For a Source-Projecting Mixer, this can be accomplished by sending
343	   multiple dynamic capture IDs for the same source; this can also be
344	   done for an environment with a hybrid of mixer topologies and static
345	   and dynamic captures, described below in Section 6.  It is not
346	   possible for static captures from a Switched Mixer.

348	   Media-12: If multiple sources from a single synchronization context
349	   are being sent simultaneously, it must be possible for a receiver to
350	   associate and synchronize them properly, even for sources that are
351	   mapped to switched captures.

353	   For a Mixed or Switched Mixer topology, receivers will see only a
354	   single synchronization context (CNAME), corresponding to the mixer.
355	   For a Source-Projecting Mixer, separate projecting sources keep
356	   separate synchronization contexts based on their original CNAMEs,
357	   thus allowing independent synchronization of sources from independent
358	   rooms without needing global synchronization.  In hybrid cases,
359	   however (e.g. if audio is mixed), all sources which need to be
360	   synchronized with the mixed audio must get the same CNAME (and thus a
361	   mixer-provided timebase) as the mixed audio.
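
   A receiver can derive the synchronization contexts by grouping
   sources on their CNAME, as in this short sketch.  The input is
   assumed to be a table of SSRC to CNAME built from RTCP SDES; the
   function name is hypothetical.

```python
from collections import defaultdict

def sync_groups(ssrc_to_cname: dict) -> dict:
    """Group SSRCs that share a CNAME, and so a common timebase."""
    groups = defaultdict(set)
    for ssrc, cname in ssrc_to_cname.items():
        groups[cname].add(ssrc)
    return dict(groups)
```

   Sources in different groups are synchronized independently; in the
   hybrid case, video that must stay in sync with mixed audio would
   appear in the mixer's group.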

363	6.  Examples

365	   It is possible for a CLUE device to send multiple instances of the
366	   topologies in Section 3 simultaneously.  For example, an MCU which
367	   uses a traditional audio bridge with switched video would be a Mixer
368	   topology for audio, but a Switched Mixer or a Source-Projecting Mixer
369	   for video.  In the latter case, the audio could be sent as a static
370	   source, whereas the video could be dynamic.

372	   More notably, it is possible for an endpoint to send the same sources
373	   both for static and dynamic captures.  Consider the example in
374	   Section 11.1 of [I-D.ietf-clue-framework], where an endpoint can
375	   provide both three cameras (VC0, VC1, and VC2) for left, center, and
376	   right views, as well as a switched view (VC3) of the loudest panel.

378	   It is possible for a consumer to request both the (VC0 - VC2) set and
379	   VC3.  It is worth noting that the content of VC3 is, at all times,
380	   exactly the content of one of VC0, VC1, or VC2.  Thus, if the sender
381	   uses the Source-Projecting Mixer topology for VC3, it would not need
382	   to send any additional media traffic to a consumer that receives
383	   these three sources, beyond just sending (VC0 - VC2).

385	   In this case, the advertiser could describe VC0, VC1, and VC2 in its
386	   initial advertisement or SDP with static SSRCs, whereas VC3 would
387	   need to be dynamic.  The role of VC3 would move among VC0, VC1, or
388	   VC2, indicated by the RTP header extension on those streams' RTP
389	   packets.
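
   The movement of VC3 in this example can be traced with a small
   sketch.  The SSRC values are made up for illustration, and the code
   assumes that VC3 stays with a stream until its capture id is seen on
   another one; neither detail is specified by this document.

```python
# Assumed static SSRC assignments for the three camera captures.
STATIC = {0x1000: "VC0", 0x1001: "VC1", 0x1002: "VC2"}

def switched_view_trace(packets, switched_id="VC3"):
    """Yield the static capture carrying the switched view after each packet.

    packets is an iterable of (ssrc, capture_ids_in_header_extension).
    """
    carrier = None
    for ssrc, tagged_ids in packets:
        if switched_id in tagged_ids:
            carrier = ssrc               # the switched view moved to this stream
        yield STATIC.get(carrier)
```

   For instance, a sequence in which VC3 is first tagged on the VC1
   stream and later on the VC2 stream yields VC1 and then VC2 as the
   carrier, without any additional media being sent.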

391	7.  Acknowledgements

393	   place holder

395	8.  IANA Considerations

397	   TBD

399	9.  Security Considerations

401	   TBD.

403	10.  References

405	10.1.  Normative References

407	   [I-D.ietf-clue-framework]
408	              Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
409	              "Framework for Telepresence Multi-Streams",
410	              draft-ietf-clue-framework-05 (work in progress),
411	              February 2012.

413	   [I-D.lennox-clue-rtp-usage]
414	              Lennox, J., Witty, P., and A. Romanow, "Real-Time
415	              Transport Protocol (RTP) Usage for Telepresence Sessions",
416	              draft-lennox-clue-rtp-usage-04 (work in progress),
417	              June 2012.

419	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
420	              Requirement Levels", BCP 14, RFC 2119, March 1997.

422	10.2.  Informative References

424	   [I-D.ietf-clue-telepresence-use-cases]
425	              Romanow, A., Botzko, S., Duckworth, M., Even, R., and I.
426	              Communications, "Use Cases for Telepresence Multi-
427	              streams", draft-ietf-clue-telepresence-use-cases-02 (work
428	              in progress), January 2012.

430	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
431	              with Session Description Protocol (SDP)", RFC 3264,
432	              June 2002.

434	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
435	              Jacobson, "RTP: A Transport Protocol for Real-Time
436	              Applications", STD 64, RFC 3550, July 2003.

438	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
439	              Description Protocol", RFC 4566, July 2006.

441	   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
442	              Initiation Protocol (SIP) Event Package for Conference
443	              State", RFC 4575, August 2006.

445	   [RFC4796]  Hautakorpi, J. and G. Camarillo, "The Session Description
446	              Protocol (SDP) Content Attribute", RFC 4796,
447	              February 2007.

449	   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
450	              "Codec Control Messages in the RTP Audio-Visual Profile
451	              with Feedback (AVPF)", RFC 5104, February 2008.

453	   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
454	              January 2008.

456	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
457	              Media Attributes in the Session Description Protocol
458	              (SDP)", RFC 5576, June 2009.

460	   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
461	              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

463	   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP
464	              Payload Format for H.264 Video", RFC 6184, May 2011.

466	   [RFC6236]  Johansson, I. and K. Jung, "Negotiation of Generic Image
467	              Attributes in the Session Description Protocol (SDP)",
468	              RFC 6236, May 2011.

470	Authors' Addresses

472	   Roni Even
473	   Huawei Technologies
474	   Tel Aviv,
475	   Israel

477	   Email: even.roni@huawei.com

479	   Jonathan Lennox
480	   Vidyo, Inc.
481	   433 Hackensack Avenue
482	   Seventh Floor
483	   Hackensack, NJ  07601
484	   US

486	   Email: jonathan@vidyo.com