idnits 2.17.1 

draft-schierl-avt-rtp-multi-session-transmission-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 941.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 952.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 959.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 965.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == Line 829 has weird spacing: '...channel  audio...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 27, 2008) is 5657 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-03) exists of
     draft-ietf-avt-rtp-mps-01

  == Outdated reference: A later version (-27) exists of
     draft-ietf-avt-rtp-svc-14

  == Outdated reference: A later version (-08) exists of
     draft-ietf-mmusic-decoding-dependency-04

  == Outdated reference: A later version (-02) exists of
     draft-ietf-mmusic-sdp-source-attributes-01

  == Outdated reference: A later version (-05) exists of
     draft-wang-avt-rtp-mvc-02

  -- Obsolete informational reference (is this intentional?): RFC 3388
     (Obsoleted by RFC 5888)

  -- Obsolete informational reference (is this intentional?): RFC 3984
     (Obsoleted by RFC 6184)

  -- Obsolete informational reference (is this intentional?): RFC 4566
     (Obsoleted by RFC 8866)

  -- Obsolete informational reference (is this intentional?): RFC 5117
     (Obsoleted by RFC 7667)


     Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	AVT                                                           T. Schierl
3	Internet-Draft                                            Fraunhofer HHI
4	Intended status: Informational                                 J. Lennox
5	Expires: April 30, 2009                                            Vidyo
6	                                                        October 27, 2008

8	 Multi-Session and Multi-Source Transmission in the Real-Time Transport
9	                             Protocol (RTP)
10	          draft-schierl-avt-rtp-multi-session-transmission-00

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt.

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html.

35	   This Internet-Draft will expire on April 30, 2009.

37	Abstract

39	   In this draft, we discuss problems related to multi-session and
40	   multi-source transmission using the Real-Time Transport Protocol
41	   (RTP).  Most of the input to this draft is taken from email
42	   discussion.  Multi-session and multi-source transmission is motivated
43	   by media data which allows for different transport layer treatment of
44	   parts of the media.  This is typically the case for layered media.
45	   Multi-session transmission is when media data from a single media
46	   source is split over multiple RTP sessions.  Single-session multi-
47	   source transmission (from now on just called "multi-source
48	   transmission") is when data from a single media source is sent as
49	   several RTP streams in the same RTP session.  The main problems
50	   discussed are the mechanisms used for data alignment and source
51	   correlation.  This draft further gives an overview of payload formats
52	   using multi-session/multi-source transmission and highlights other
53	   transport related issues.  The draft concludes with recommendations
54	   for the discussed problems.

56	Table of Contents

58	   1.  Introduction
59	   2.  Definitions
60	   3.  Terminology
61	   4.  Existing Users of Multi-Session and Multi-Source
62	       Transmission
63	     4.1.  Progressive Video with Hybrid (PVH)
64	     4.2.  H.264 Scalable Video Coding (SVC)
65	     4.3.  H.264 Multi-View Coding (MVC)
66	     4.4.  G.718: Embedded Variable Bit-Rate (EV-VBR)
67	           Speech/Audio Codec
68	     4.5.  MPEG Surround
69	     4.6.  RTP Forward Error Correction
70	     4.7.  RTP Retransmission
71	   5.  Topology Overview
72	   6.  Requirements for multi-session transmission
73	     6.1.  Requirements on Data Alignment
74	     6.2.  Requirements on Source Correlation
75	   7.  Review of techniques for Data Alignment
76	     7.1.  NTP Timestamp Alignment using RTCP Sender Report (SR)
77	           Packets
78	       7.1.1.  Identified problems
79	     7.2.  Review of other potential techniques for Data Alignment
80	       7.2.1.  RTP Timestamp Alignment
81	       7.2.2.  Initial RTP Timestamp or RTP Timestamp Offset
82	               Signaling
83	       7.2.3.  CCM message - need NTP update
84	       7.2.4.  Multiple early RTCP SRs
85	       7.2.5.  Codec-Specific Mechanisms
86	       7.2.6.  RTP header extension
87	   8.  Review of techniques for Source Correlation
88	     8.1.  Source Correlation using CNAME in SDES
89	     8.2.  Review of other potential techniques for Source
90	           Correlation
91	       8.2.1.  Single SSRC Space
92	       8.2.2.  SSRC Groups
93	       8.2.3.  CNAME in Source Attributes
94	       8.2.4.  Application-specific Inference of Association
95	   9.  Summary of RTP solution for Data Alignment and Source
96	       Correlation
97	     9.1.  Data Alignment in RTP
98	     9.2.  Source Correlation in RTP
99	     9.3.  Dependency signaling
100	   10. Recommendations
101	   11. Other transport related issues for multi-session
102	       transmission
103	     11.1. Inter-session Jitter
104	     11.2. Inter-session Interleaving
105	   12. Security Considerations
106	   13. IANA Considerations
107	   14. References
108	     14.1. Normative References
109	     14.2. Informative References
110	   Appendix A.  Acknowledgements
111	   Authors' Addresses
112	   Intellectual Property and Copyright Statements

114	1.  Introduction

116	   Multi-session transmission is when media data from a single media
117	   source is split over multiple Real-Time Transport Protocol (RTP)
118	   [RFC3550] sessions.  This is usually done because different transport
119	   layer treatment is desired for different aspects of the media source,
120	   e.g., different multicast groups or different traffic classes.  If
121	   the traffic is being sent using multicast routing, this is often
122	   known as "layered multicast."

124	   Single-session multi-source transmission (from now on just called
125	   "multi-source transmission") is when data from a single media source
126	   is sent as several RTP streams in the same RTP session.  In this
127	   case, the streams need to be treated differently by RTP (e.g. with
128	   separate RTCP statistics, or selective forwarding by RTP translators)
129	   but do not need different transport characteristics.  This is often
130	   referred to as "SSRC multiplexing", after the synchronization source
131	   identifier (SSRC) which distinguishes sources in an RTP session.

133	   Such techniques are often used for "layered" or "embedded" codecs
134	   (the former term is typically used for video, the latter for audio).
135	   A lower-bitrate, and often lower-complexity, stream (known as the
136	   "base"), often backward-compatible with older codecs, provides basic
137	   media quality, while one or more additional streams (known as
138	   "enhancements") provide richer media or otherwise provide an enhanced
139	   user experience.  Various layered and embedded codecs are discussed
140	   in Section 4.

142	   Multi-session and multi-source transmission are also used for stream
143	   robustness.  Both RTP Forward Error Correction [RFC5109] and RTP
144	   Retransmission [RFC4588] use multi-session transmission, and the
145	   latter can optionally use multi-source transmission as well.

147	   For both multi-session and multi-source transmission, two issues
148	   arise: how streams are correlated, i.e. how receivers determine which
149	   base and enhancement streams carry data for the same media source;
150	   and how streams are aligned, i.e. how receivers determine which
151	   packets of the base stream are associated with which packets of the
152	   enhancement stream.

154	2.  Definitions

156	   multi-session transmission:  In multi-session transmission, media
157	      data from a single media source is split over multiple RTP
158	      sessions.  The term "layered multicast" is equivalent to multi-
159	      session transmission for sessions using multicast addresses.

161	   multi-source transmission:  In multi-source transmission, data from a
162	      single media source is sent as several RTP streams in the same RTP
163	      session.  The sources contained in an RTP session are identified
164	      by their synchronization source identifiers (SSRCs) or, if
165	      combined by a RTP mixer, by their contributing source identifiers
166	      (CSRCs), as defined in RTP [RFC3550].
167	   associated multimedia streams:  Associated multimedia streams are
168	      independent media sources from the same session participant, e.g.
169	      audio and video sources, or multiple cameras from a single
170	      participant.  Each source can have an independent media clock,
171	      reflecting the device that captured the media.  For live media,
172	      these clocks will often drift relative to each other, over and
173	      above their often inherently-different clock rates.  In RTP, each
174	      stream has separate initial RTP timestamps and sequence numbers.
175	      Related sources are associated using the RTCP Canonical Name
176	      (CNAME) Source Description (SDES) field.  A common time base may
177	      be computed using NTP timestamps, based on information carried in
178	      RTCP Sender Report (SR) packets.  The sources are typically
179	      synchronized ("lip-synced") by receivers when rendered, based on
180	      the computed NTP timestamps.
181	   Data Alignment:  Assembling data of the same media frame which is
182	      transferred in different sessions or as different sources in the
183	      same session as part of a layered media.  The assembly of the
184	      media frame must be achieved before decoding, otherwise the
185	      decoding process typically fails or may be only possible at a
186	      reduced quality.
187	   Source Correlation:  The logical association of RTP streams
188	      transferred as multiple separate sessions or as multiple sources
189	      in the same session to one layered media.

191	3.  Terminology

193	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
194	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
195	   document are to be interpreted as described in RFC 2119 [RFC2119].

197	4.  Existing Users of Multi-Session and Multi-Source Transmission

199	4.1.  Progressive Video with Hybrid (PVH)

201	   Progressive Video with Hybrid transform (PVH) [McCa96] was used in
202	   the initial demonstration of multi-session transmission.  PVH was the
203	   initial driver for adding text on layered multicast to the Real-Time
204	   Transport Protocol (RTP) [RFC3550].  Data Alignment was done using
205	   packets' RTP timestamps.

207	4.2.  H.264 Scalable Video Coding (SVC)

209	   H.264 Scalable Video Coding (SVC) [I-D.ietf-avt-rtp-svc] extends the
210	   H.264 [RFC3984] video standard to provide spatial, temporal, and
211	   quality (signal-to-noise) enhancements.  The base layer of SVC is
212	   backward-compatible with existing H.264 decoders.  A base layer sent
213	   separately using the H.264 [RFC3984] payload format can be received
214	   and processed by existing devices.  The Payload Format for SVC uses
215	   the multi-session transmission approach.  Currently two basic modes
216	   are defined in the SVC Payload Format for decoding order recovery of
217	   media data received from multiple sessions:
218	   Data Alignment based on NTP timestamps:  This method is used in the
219	      NI-T and NI-TC mode defined in [I-D.ietf-avt-rtp-svc].  These
220	      modes currently rely on exact NTP timestamp alignment in order to
221	      recover the decoding order.
222	   Cross-Session Decoding Order Number (CS-DON):  This method is used in
223	      the NI-C, NI-TC and I-C modes defined in [I-D.ietf-avt-rtp-svc].
224	      These modes rely on a number (CS-DON) which is associated to
225	      packets indicating the decoding order across sessions.
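
   The CS-DON idea can be illustrated with a minimal sketch.  It assumes
   that each received packet has already been reduced to a hypothetical
   (cs_don, payload) pair, that packets within one session are already
   sorted by CS-DON, and that the counter does not wrap around within
   the window being merged.  How CS-DON is actually derived from the
   payload is defined by the payload format and is not modeled here.

```python
import heapq
from operator import itemgetter


def merge_by_cs_don(*session_packets):
    """Merge per-session packet lists into one cross-session decoding order.

    Each argument is a list of (cs_don, payload) tuples, already sorted by
    cs_don within its session (an assumption of this sketch).
    """
    return list(heapq.merge(*session_packets, key=itemgetter(0)))


# Hypothetical base and enhancement sessions carrying interleaved CS-DONs.
base_session = [(0, "B0"), (2, "B1")]
enh_session = [(1, "E0"), (3, "E1")]
decoding_order = merge_by_cs_don(base_session, enh_session)
```

   Because the order is carried in the packets themselves, this kind of
   merge does not depend on any timestamp comparison between sessions.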

227	4.3.  H.264 Multi-View Coding (MVC)

229	   H.264 Multi-View Coding (MVC) [I-D.wang-avt-rtp-mvc] extends the
230	   H.264 [RFC3984] video standard to provide multiple views of a video
231	   stream, for multi-view and 3D applications.  Like SVC, MVC is an
232	   extension of H.264 and has a backward-compatible base view, which
233	   can also be decoded by existing H.264 receivers.  Thus it is possible
234	   to provide the base view of a multi-session transmission in a
235	   compatible way using H.264 [RFC3984] as the Payload Format.  Since
236	   the new coding approach is mainly based on exploiting temporal
237	   references to other frames of the same view or different views, there
238	   is not always the need to receive the base view in order to decode a
239	   desired view.  The payload format will rely on the same approaches as
240	   defined in the RTP Payload Format for SVC video
241	   [I-D.ietf-avt-rtp-svc] for decoding order recovery when receiving
242	   data from multiple sessions.

244	4.4.  G.718: Embedded Variable Bit-Rate (EV-VBR) Speech/Audio Codec

246	   G.718, the Embedded Variable Bit-Rate (EV-VBR) speech/audio codec
247	   [I-D.lakaniemi-avt-rtp-evbr] provides an embedded speech-rate
248	   encoder.  This codec also allows for multi-session transmission.  The
249	   current draft mandates RTCP SR for Data Alignment in multi-session
250	   transmission.

252	4.5.  MPEG Surround

254	   MPEG Surround (Spatial Audio Coding, SAC) [I-D.ietf-avt-rtp-mps]
255	   enhances MPEG two-channel audio with multi-channel surround sound
256	   while maintaining backward compatibility with two-channel receivers.
257	   The payload relies on NTP timestamp alignment for multi-session
258	   transmission.  The audio codec typically has different sampling rates
259	   for base and enhancements.

261	4.6.  RTP Forward Error Correction

263	   RTP Generic Forward Error Correction [RFC5109] allows a supplemental
264	   stream to provide additional data for recovery from packet loss using
265	   a separate session for transmitting the FEC stream.  The repair
266	   stream is typically sent as a separate RTP session.  A special case
267	   is when the FEC stream is being sent as a secondary codec in the
268	   redundant encoding format.  In this case the FEC stream is sent as a
269	   separate source in the same session as the redundant codec.  Data
270	   Alignment is achieved using sequence numbers of the FEC protected
271	   packets.

273	   FEC Grouping Issues in Session Description Protocol
274	   [I-D.begen-mmusic-fec-grouping-issues] describes a grouping framework
275	   for FEC and media streams based on the Grouping of Media Lines in the
276	   Session Description Protocol (SDP) [RFC3388] framework.  The
277	   framework relies on transmitting the FEC streams in separate
278	   sessions.  Data Alignment is achieved by the FEC Framework and relies
279	   on the used FEC scheme, i.e. there is a specific solution for
280	   associating data of the protected and the protecting packet stream.

282	4.7.  RTP Retransmission

284	   RTP Retransmission [RFC4588] allows senders to retransmit RTP packets
285	   indicated by the receiver as lost.  The re-sent packets are
286	   transported in a separate stream and may be transmitted within a
287	   separate RTP session or may be transmitted as a separate source in
288	   the same session as the media stream.

290	   If multi-source (i.e., single-session) transmission is being used,
291	   retransmitted packets are sent with a different SSRC.  Source
292	   association in this case is done by sources' CNAMEs, with the further
293	   requirement that a receiver MUST NOT have two outstanding requests
294	   for the same packet sequence number in two different original streams
295	   before the association is resolved.
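
   As a rough illustration of CNAME-based association, the sketch below
   groups the SSRCs seen in one session by the CNAME reported for them.
   The function name and the (ssrc, cname) input shape are assumptions
   made for this example; parsing of RTCP SDES packets is not shown.

```python
from collections import defaultdict


def group_ssrcs_by_cname(sdes_items):
    """Return a dict mapping each CNAME to the sorted SSRCs that reported it.

    sdes_items is an iterable of (ssrc, cname) pairs (hypothetical input,
    assumed to have been extracted from RTCP SDES packets elsewhere).
    """
    groups = defaultdict(set)
    for ssrc, cname in sdes_items:
        groups[cname].add(ssrc)
    return {cname: sorted(ssrcs) for cname, ssrcs in groups.items()}


# Example: one participant sends two streams, another sends one.
observed = [
    (0x1111, "alice@example.com"),
    (0x2222, "alice@example.com"),
    (0x3333, "bob@example.com"),
]
associations = group_ssrcs_by_cname(observed)
```

   Grouping by CNAME only narrows the candidates.  If one participant
   sends several original streams, the CNAME alone does not say which
   retransmission stream belongs to which original stream, which is the
   situation the restriction on outstanding requests addresses.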

297	5.  Topology Overview

299	   A number of different RTP Topologies [RFC5117] are relevant for
300	   consideration for multi-source and multi-session transmission.

302	   [Ed.  TBD: more text on the relation between the approaches presented
303	   in the memo and the mentioned topologies.]

305	   o  Point-to-point - Two endpoints communicating using unicast.
306	   o  Point-to-multipoint via multicast - Using a multicast transport
307	      mechanism to send packets of one participant to all the other
308	      participants in the multicast group.
309	   o  Point-to-multipoint via RTP translator - Using [RFC3550]
310	      translators to send packets of one participant to other
311	      participants of a group.  Packets of one or more participants may
312	      be forwarded to the group.
313	   o  Point-to-multipoint via RTP mixer - Using [RFC3550] mixers to send
314	      packets of one participant to other participants of a group.
315	      Packets of one or more participants may be forwarded to the group.
316	   o  Point-to-multipoint via Video Switching MCUs - Allows for sending
317	      packets from one participant to the other participants in a group.
318	      But typically only one participant's video data is forwarded at a
319	      time to the other participants.
320	   o  Point-to-multipoint via RTCP-terminating MCUs - Each participant
321	      is running a point-to-point session with the MCU.  Typically, only
322	      one participant's video data is forwarded at a time to the other
323	      participants.
324	   o  Point-to-multipoint without a feedback channel - These channels
325	      typically provide IP multicast over a broadcast transmission
326	      medium, which naturally does not provide a bi-directional channel.
327	      This is the case, e.g. for DVB channels using IP over MPE over
328	      MPEG-2 Transport Stream as for DVB-H or the emerging DVB-SH.

330	6.  Requirements for multi-session transmission

332	6.1.  Requirements on Data Alignment

334	   Synchronization of media streams received from multiple sessions is
335	   typically used for lip-synchronization of audio and video data.  For
336	   this case, RTP provides a strong tool, which is the presence of (RTP)
337	   timestamps for each media frame, generated from individual clocks for
338	   each session.  Additionally, RTCP Sender Report packets are sent
339	   periodically in each session containing (NTP) timestamps from a
340	   wallclock common across all of the sessions, plus a reference to the
341	   corresponding (RTP) timestamp that would be generated for a media
342	   frame with the signaled wallclock time.  The interval between
343	   transmission of RTCP SRs is typically in the range of multiple
344	   seconds.  For a more detailed review of RTP synchronization
345	   techniques, see Section 7.1.

347	   For the reception of layered media, either on multiple sessions or as
348	   multiple sources, it is absolutely essential to allow for immediate
349	   Data Alignment.  That is, the Data Alignment must be applied before
350	   the decoding process of the layered media.  If Data Alignment is not
351	   applied before decoding, the decoder may not be able to decode the
352	   media at all, or may only be able to produce a media representation
353	   at reduced quality.

355	6.2.  Requirements on Source Correlation

357	   For the reception of layered media, whether on multiple sessions or
358	   as multiple sources, it is absolutely essential to find out prior to
359	   decoding which sessions and sources are correlated.  That is, the
360	   receiver needs to know, prior to Data Alignment and decoding, the
361	   inter-session and the inter-source dependency.  Notably, for cases in
362	   which multiple independent media sources are transmitted as layered
363	   media in the same session or set of sessions, miscorrelation of
364	   sources could lead to a decoder attempting to use one source's base
365	   layer with another source's enhancement layer.

367	7.  Review of techniques for Data Alignment

369	7.1.  NTP Timestamp Alignment using RTCP Sender Report (SR) Packets

371	   The inter-media synchronization mechanism defined in [RFC3550] uses
372	   RTP timestamps in the RTP packets and a combination of RTP timestamp
373	   and NTP wallclock carried in the RTCP Sender Report (SR) packets.
374	   The RTCP SR packet contains an RTP timestamp in the media timescale
375	   and, as a reference to absolute wallclock time, the NTP timestamp.
376	   The definitions for timestamp generation and synchronization in
377	   Sections 5.1 and 6.4.1 of [RFC3550] are summarized in the following
378	   list:

380	   o  The timestamp reflects the sampling instant of the first octet in
381	      the RTP data packet.
382	   o  The sampling instant MUST be derived from a clock that increments
383	      monotonically and linearly in time to allow synchronization and
384	      jitter calculations (see Section 6.4.1).
385	   o  The resolution of the clock MUST be sufficient for the desired
386	      synchronization accuracy and for measuring packet arrival jitter
387	      (one tick per video frame is typically not sufficient).
388	   o  If RTP packets are generated periodically, the nominal sampling
389	      instant as determined from the sampling clock is to be used, not a
390	      reading of the system clock.

392	   o  RTP timestamps from different media streams may advance at
393	      different rates and usually have independent, random offsets.
394	      Therefore, although these timestamps are sufficient to reconstruct
395	      the timing of a single stream, directly comparing RTP timestamps
396	      from different media is not effective for synchronization.
397	      Instead, for each medium the RTP timestamp is related to the
398	      sampling instant by pairing it with a timestamp from a reference
399	      clock (wallclock) that represents the time when the data
400	      corresponding to the RTP timestamp was sampled.
401	   o  Receivers should expect that the measurement accuracy of the
402	      timestamp may be limited to far less than the resolution of the
403	      NTP timestamp.
404	   o  On a system that has no notion of wallclock time but does have
405	      some system-specific clock such as "system uptime", a sender MAY
406	      use that clock as a reference to calculate relative NTP
407	      timestamps.
408	   o  It is important to choose a commonly used clock so that if
409	      separate implementations are used to produce the individual
410	      streams of a multimedia session, all implementations will use the
411	      same clock.
412	   o  [Ed. : The RTP timestamp in the SR] corresponds to the same time
413	      as the NTP timestamp (above), but in the same units and with the
414	      same random offset as the RTP timestamps in data packets.
415	   o  This correspondence may be used for intra- and inter-media
416	      synchronization for sources whose NTP timestamps are synchronized,
417	      and may be used by media-independent receivers to estimate the
418	      nominal RTP clock frequency.
419	   o  [Ed. : The RTP timestamp in the SR] MUST be calculated from the
420	      corresponding NTP timestamp using the relationship between the
421	      RTP timestamp counter and real time as maintained by
422	      periodically checking the wallclock time at a sampling instant.

424	   To summarize the definitions in [RFC3550]: the RTCP SR is used for
425	   deriving the media timestamp using the RTP timestamp and the NTP
426	   wallclock.  If this synchronization mechanism is correctly
427	   implemented and there is no clock jitter in either the media clock
428	   or the NTP wallclock, so that an RTP timestamp and its NTP wallclock
429	   timestamp can always be guaranteed to be perfectly aligned, the RTP
430	   approach should work fine for Data Alignment.  [Ed. : need more
431	   text for summary / review of text above ]
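
   The mapping from an RTP timestamp to wallclock time described above
   can be sketched as follows.  This is only an illustration under
   simplifying assumptions: the NTP timestamp of the SR is taken to be
   already converted to seconds, and the function and parameter names
   are hypothetical.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32 bits wide and wrap around


def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to wallclock seconds using one Sender Report.

    sr_rtp_ts and sr_ntp_seconds are the RTP and NTP timestamps carried in
    the same SR; clock_rate is the nominal RTP clock rate in Hz.
    """
    # Signed 32-bit difference: handles wrap-around and packets that were
    # sampled slightly before the SR reference point.
    delta = (rtp_ts - sr_rtp_ts) % RTP_TS_MODULUS
    if delta >= RTP_TS_MODULUS // 2:
        delta -= RTP_TS_MODULUS
    return sr_ntp_seconds + delta / clock_rate


# Two sessions with different random RTP offsets but a shared wallclock.
base_time = rtp_to_wallclock(1_000_000 + 180_000, 1_000_000, 100.0, 90_000)
enh_time = rtp_to_wallclock(5_000 + 180_000, 5_000, 100.0, 90_000)
```

   With ideal clocks, packets of the same media frame map to the same
   wallclock value in both sessions.  The problems identified below are
   the ways in which real senders deviate from this ideal.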

433	7.1.1.  Identified problems

435	7.1.1.1.  Synchronization Delay

437	   Since [RFC3550] mandates RTCP SRs to be sent in intervals of multiple
438	   seconds, Data Alignment based on this information may introduce a
439	   delay to this process, which may lead to delayed tune-in for the
440	   decoding process.  This is typically not the case for decoding media
441	   transferred in exactly one session and source, since synchronization
442	   is not required for decoding, but only for playout.  A delay for
443	   playout or lip synchronization does not usually pose a fundamental
444	   problem.

446	7.1.1.2.  Losing synchronization information

448	   The loss of RTCP SR packets may introduce additional delay to the
449	   Data Alignment process; thus, a more robust mechanism would be
450	   desirable.

452	7.1.1.3.  Clock Skew

454	   Clock skew between the NTP/system clock and the media clock will
455	   affect the NTP media timestamp generation derived from RTCP SRs and
456	   RTP timestamps.  That typically results in different NTP timestamps
457	   for packets of the same media frame transmitted in the different
458	   sessions or transferred as different sources, and leads to
459	   misalignment for the Data Alignment.  As far as we know, there is no
460	   way to always guarantee the presence of perfect clocks for media and
461	   NTP/system clock.  From the standardization point of view this may
462	   seem to be an implementation issue.  However, if this implementation
463	   issue puts a burden on the senders, such as requiring perfect
464	   clocks for generating timestamps, this issue needs to be solved in an
465	   easy and general way.

467	   Following the RTP philosophy, clock skew can be estimated by
468	   observing several RTCP SRs.  The receiver may use the observation to
469	   compensate for the clock skew.  However, this is only possible if
470	   there is no requirement for immediate synchronization of the sort
471	   which is essential for Data Alignment of layered codecs.
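
   A minimal sketch of such an estimate is given below.  It assumes the
   receiver has collected (NTP seconds, RTP timestamp) pairs from
   successive SRs and has already unwrapped the RTP timestamps; the
   function name and the parts-per-million output are choices made for
   this example only.

```python
def estimate_skew_ppm(sr_reports, nominal_rate):
    """Estimate media-clock skew relative to the NTP clock, in ppm.

    sr_reports is a list of (ntp_seconds, rtp_timestamp) pairs taken from
    successive Sender Reports, with RTP timestamps already unwrapped.
    A positive result means the media clock runs fast.
    """
    if len(sr_reports) < 2:
        raise ValueError("at least two Sender Reports are required")
    ntp_first, rtp_first = sr_reports[0]
    ntp_last, rtp_last = sr_reports[-1]
    elapsed = ntp_last - ntp_first
    if elapsed <= 0:
        raise ValueError("NTP timestamps must be increasing")
    observed_rate = (rtp_last - rtp_first) / elapsed
    return (observed_rate / nominal_rate - 1.0) * 1e6


# A 90 kHz media clock that gains 9 ticks per second against NTP time.
reports = [(0.0, 0), (5.0, 450_045), (10.0, 900_090)]
skew = estimate_skew_ppm(reports, 90_000)  # about 100 ppm
```

   The estimate needs at least two SRs, so it only becomes available
   after at least one full SR interval has passed.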

473	   The case of clock skew between media and NTP/system clocks may be
474	   overcome by using the same clock instance, e.g. the system clock, for
475	   RTP as well as NTP timestamp generation.  However, this is not
476	   compliant with RTP, since [RFC3550] mandates the use of a media clock
477	   which is different from the system clock (see definitions in RTP as
478	   cited above in Section 7.1).  Indeed, for many codecs, notably audio,
479	   correct decoding requires that the timestamp difference between
480	   subsequent frames exactly correspond to the amount of data sent in
481	   each frame.

483	7.1.1.4.  Accuracy of clocks

485	   Assuming that we have clocks without skew, there is still the
486	   question of accuracy of the clock used for generating the timestamps.
487	   Notably, the Windows system clock is only updated on each system
488	   clock tick, typically every 10 or 15 milliseconds on Windows XP and
489	   Vista.  RTP says that a receiver should not make any assumption on
490	   this, but an implementation which may have to cope with rounding done
491	   in the low-order microsecond cannot simply compare two NTP timestamps
492	   for being identical.  An application may have to compare "ranges" of
493	   timestamps in order to get rid of rounding problems.  However, in
494	   some cases the ranges of NTP timestamps required may indeed be
495	   greater than the time interval between consecutive media frames.
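
   The range comparison mentioned above can be sketched as follows.
   This is an illustrative Python fragment under stated assumptions:
   timestamps are 64-bit NTP values, frames are evenly spaced, and the
   function names are hypothetical.  The second helper expresses the
   problem noted in the last sentence: once the comparison window is as
   wide as the frame interval, a timestamp can match two consecutive
   frames.

```python
NTP_UNITS_PER_SECOND = 2 ** 32  # a 64-bit NTP timestamp has 32 fractional bits


def ntp_close(ntp_a, ntp_b, tolerance_s):
    """True if two 64-bit NTP timestamps differ by at most tolerance_s."""
    tolerance_units = int(tolerance_s * NTP_UNITS_PER_SECOND)
    return abs(ntp_a - ntp_b) <= tolerance_units


def range_is_ambiguous(tolerance_s, frame_interval_s):
    """True if a +/- tolerance window can cover two consecutive frames.

    Assumption of this sketch: frames are evenly spaced, so a window of
    total width 2 * tolerance_s reaches a neighbouring frame as soon as
    it is at least one frame interval wide.
    """
    return 2 * tolerance_s >= frame_interval_s
```

   For example, a tolerance of 15 milliseconds (one clock tick in the
   case described above) is ambiguous for video at 60 frames per
   second, but not for a frame interval of 100 milliseconds.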

497	7.1.1.5.  Existing RTCP SR implementations

499	   As far as we know, existing RTCP SR implementations show a wide range
500	   of alignment problems for generating exact NTP media timestamps for
501	   Data Alignment.  NTP alignment issues can be modeled for existing
502	   RTCP senders by capturing the NTP and RTP timestamps in consecutive SR
503	   packets, projecting the NTP timestamp in one SR packet based on the
504	   RTP timestamp in that SR packet, the NTP and RTP timestamps in the
505	   previous SR packet, and the codec's nominal clock rate.  Initial
506	   experiments have shown NTP timestamp alignment problems on the order
507	   of 40-50 milliseconds for several implementations.
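
   The projection model described above can be written down as a short
   Python sketch.  It is only an illustration: NTP timestamps are
   assumed to be already converted to seconds, each SR is represented
   as a hypothetical (ntp_seconds, rtp_timestamp) pair, and the RTP
   timestamp is assumed to wrap at most once between the two SRs.

```python
RTP_TIMESTAMP_MODULUS = 2 ** 32  # RTP timestamps are 32-bit and wrap around


def projected_ntp_seconds(prev_ntp_s, prev_rtp, rtp, clock_rate):
    """Project the NTP time of an SR from the previous SR and the RTP delta."""
    elapsed_ticks = (rtp - prev_rtp) % RTP_TIMESTAMP_MODULUS
    return prev_ntp_s + elapsed_ticks / clock_rate


def alignment_error_ms(prev_sr, sr, clock_rate):
    """Reported NTP time minus projected NTP time, in milliseconds."""
    prev_ntp_s, prev_rtp = prev_sr
    ntp_s, rtp = sr
    projected = projected_ntp_seconds(prev_ntp_s, prev_rtp, rtp, clock_rate)
    return (ntp_s - projected) * 1000.0
```

   A result far from zero indicates that the sender's NTP and RTP
   timestamps do not advance consistently with the codec's nominal
   clock rate.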

509	7.2.  Review of other potential techniques for Data Alignment

511	7.2.1.  RTP Timestamp Alignment

513	   The idea here is to signal the same RTP timestamp for packets
514	   containing data of the same media time instance in the different
515	   sessions.  That is, the same clock would have to be used for the
516	   multiple sessions and the same RTP random offset would have to be
517	   used.  This method is backward compatible with using NTP timestamps
518	   for inter-media synchronization as well as for jitter calculation.
519	   Furthermore, to our knowledge, this is the only alternative in use
520	   (see Section 4.1) for layered transmission of media.
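
   On the receiver side, RTP timestamp alignment reduces to grouping
   packets by an identical timestamp value.  The following Python
   sketch illustrates this under simplifying assumptions: packets are
   hypothetical (rtp_timestamp, payload) pairs, session identifiers are
   arbitrary labels, and loss, reordering and timestamp wrap-around are
   not handled.

```python
from collections import defaultdict


def align_by_rtp_timestamp(sessions):
    """Group payloads from several sessions by identical RTP timestamp.

    `sessions` maps a session label to a list of (rtp_timestamp, payload)
    pairs.  The result maps each timestamp to {session label: [payloads]}.
    """
    units = defaultdict(dict)
    for session_id, packets in sessions.items():
        for rtp_timestamp, payload in packets:
            units[rtp_timestamp].setdefault(session_id, []).append(payload)
    return dict(units)


def complete_timestamps(units, session_ids):
    """Timestamps for which every listed session has delivered data."""
    return [ts for ts, per_session in units.items()
            if all(s in per_session for s in session_ids)]
```

   No NTP timestamp is consulted here, which is why this method does
   not depend on the arrival of RTCP SRs.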

522	7.2.1.1.  Identified problems

524	   Using the same RTP timestamp random offset may lead to getting weak
525	   initialization vectors for the encryption method defined in [RFC3550]
526	   if keys are shared across the sessions or streams.  Additionally,
527	   it may be unnatural for some codecs to use the same clock rate
528	   for the multiple sessions, for example an audio wideband enhancement
529	   layer enhancing a narrow-band base layer.

531	7.2.2.  Initial RTP Timestamp or RTP Timestamp Offset Signaling

533	   The idea here is to signal the initial RTP timestamp or the initial
534	   offsets as a media- or source-level attribute in SDP associated with
535	   each stream.  This could be done, e.g., using

537	   [I-D.ietf-mmusic-sdp-source-attributes].

539	7.2.2.1.  Identified problems

541	   This may have an implication for implementations, since one needs to
542	   know packet stream related information, such as the initial RTP
543	   timestamp or the offset between RTP timestamps, while offering a
544	   session.  This may be a problem for sessions where multiple senders
545	   are present: it may not always be possible for an SDP creator to
546	   include all initial offsets / timestamps for all participants for
547	   sessions with multiple sending parties.

549	7.2.3.  CCM message - need NTP update

551	   In this case, a receiver would request immediate synchronization
552	   information.  This method may reduce the initial delay, but works
553	   only for topologies with bi-directional channels.

555	7.2.3.1.  Identified problems

557	   This method is only feasible for topologies with bidirectional and
558	   reasonably rapid communication channels, i.e. unicast or small-group
559	   multicast.  This method also assumes that the NTP timestamp alignment
560	   always works.

562	7.2.4.  Multiple early RTCP SRs

564	   In this case, the sender would generate more RTCP SRs than typically
565	   required and send them at an early point in the session.  This method
566	   also works for topologies with uni-directional communication
567	   channels.

569	7.2.4.1.  Identified problems

571	   This method may overflow the RTCP bandwidth.  Enhancing the RTCP
572	   sender bandwidth may be achieved using SDP bandwidth parameters.
573	   This method may require an adjustment of the RTCP bandwidth of the
574	   session depending on the number of participants and senders.
575	   Further, this approach does not solve the problem for receivers
576	   tuning in to the session after it begins ("random entry").  This
577	   method also assumes that the NTP timestamp alignment always works.

579	7.2.5.  Codec-Specific Mechanisms

581	   This mechanism exploits signaling contained within the payload's data
582	   sections in order to allow Data Alignment.  An example is the Cross
583	   Session Decoding Order Number (CS-DON) as defined in
584	   [I-D.ietf-avt-rtp-svc] or as proposed in

586	   [I-D.hannuksela-avt-rtp-svc], where a timestamp or a timestamp delta
587	   of the RTP packet to be aligned is carried by payload specific means.

589	7.2.5.1.  Identified problems

591	   A payload independent solution for the basic functionality of Data
592	   Alignment is desirable.

594	7.2.6.  RTP header extension

596	   The RTP header extension may be used to add generic signaling about
597	   Data Alignment to RTP packets.

599	7.2.6.1.  Identified problems

601	   RTP header extensions are required to be ancillary information which
602	   can safely be discarded by receivers which do not understand them.
603	   Data alignment mechanisms do not satisfy this requirement.

605	8.  Review of techniques for Source Correlation

607	8.1.  Source Correlation using CNAME in SDES

609	   In RTP, associated multimedia streams (e.g., audio and video sources
610	   from a single participant) have different SSRCs, and are associated
611	   using SDES CNAME fields.  While in principle the same technique can
612	   be used to associate streams for multi-session or multi-source
613	   transmission, several issues arise.

615	   Startup latency: while slow lipsync convergence of multimedia streams
616	   is often tolerable, layered sources have to be associated from the
617	   start in order to be decodable, particularly for codec types such as
618	   video with inter-frame decoding dependencies.

620	   If multiple sources are sent from the same participant on the same
621	   session or family of sessions, e.g. multiple video cameras, they will
622	   have the same CNAME, because they are synchronized with each other
623	   and with any other sources for the session.  This makes it impossible
624	   to definitively associate base and enhancement sources, as there may
625	   be more than one of each with the same CNAME.  This potential for
626	   confusion is the reason for RTP retransmission's restriction on
627	   multiple outstanding RTP NACKs before stream association has
628	   completed, as described in Section 4.7.

630	8.2.  Review of other potential techniques for Source Correlation

632	8.2.1.  Single SSRC Space

634	   Motivated by the problems with CNAME association, RTP [RFC3550]
635	   specifies instead a single SSRC space for layered multicast
636	   (multiple-session transmission).  Furthermore, as described in
637	   Section 9.2, it specifies that SSRC collision detection is performed
638	   only in the base layer.

640	   Applying SSRC collision detection in just the base layer in case of
641	   using multi-session transmission seems to work for current codec
642	   implementations.

644	   By definition, one of the multiple views possible in MVC media (see
645	   Section 4.3) is the base view, and this view is backward compatible
646	   with H.264.  Decoding a view other than the base view may not require
647	   the presence of the base view.  Although MVC is by its nature a
648	   layered codec, it may not always be reasonable to require the
649	   reception of the base layer for collision detection, even when it is
650	   not required for decoding.

652	   Currently, we do not see major relevance for the MVC codec format,
653	   due to its lack of coding efficiency; thus we tend not to take MVC as
654	   the killer application for new Source Correlation functionalities.
655	   This means without taking MVC into account, the current solution of
656	   using the base layer for SSRC collision detection seems to be still
657	   appropriate.

659	   If needed, collision detection could instead be performed across all,
660	   or a subset of, the sessions used for multi-session transmission.
661	   However, it is not entirely clear how this would work for senders or
662	   receivers that are only participating in a subset of the sessions,
663	   and this would require further study.

665	8.2.2.  SSRC Groups

667	   The Internet-Draft [I-D.ietf-mmusic-sdp-source-attributes] specifies
668	   a mechanism by which related sources can be described as grouped in
669	   SDP.  For multi-source (single-session) transmission, this can
670	   provide an alternative way to provide source association.

672	   Clearly, this will only be effective in topologies and signaling
673	   architectures in which the SDP author can know about every source in
674	   the session that will be used for multi-source transmission, and the
675	   SDP can be updated on the addition of new sources or SSRC
676	   collisions.

678	8.2.3.  CNAME in Source Attributes

680	   The draft [I-D.ietf-mmusic-sdp-source-attributes] also provides a
681	   mechanism for sources' SSRCs to be associated to their CNAMEs in SDP.
682	   This can eliminate the startup latency of stream association for the
683	   mechanism described in Section 8.1, though it does not solve the
684	   problem of multiple sources for a session.  It also has the same
685	   architectural limitations as Section 8.2.2 in terms of using SDP.

687	8.2.4.  Application-specific Inference of Association

689	   As described in Section 4.7, it is in some cases possible to use
690	   mechanisms specific to a particular codec or mechanism to determine
691	   stream associations.  For retransmission, for instance, a NACK of a
692	   packet with sequence N with SSRC A, followed by a retransmission of a
693	   packet with sequence N on SSRC B, indicates that SSRC B is the
694	   retransmission stream for SSRC A. Such techniques are mechanism-
695	   specific and cannot easily be generalized.
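
   The retransmission example above can be sketched in Python as
   follows.  This is an illustration only: the class and method names
   are hypothetical, and `original_seq` stands for the original
   sequence number that the receiver is assumed to recover from the
   retransmission packet.  The sketch allows only one outstanding NACK
   until the association is made, mirroring the restriction discussed
   in Section 8.1.

```python
class RetransmissionAssociator:
    """Associates retransmission SSRCs with original SSRCs (sketch)."""

    def __init__(self):
        self.pending = None      # (original_ssrc, seq) of the one open NACK
        self.associations = {}   # retransmission SSRC -> original SSRC

    def on_nack_sent(self, original_ssrc, seq):
        """Record a NACK; refuse a second one while the first is open."""
        if self.pending is not None:
            return False
        self.pending = (original_ssrc, seq)
        return True

    def on_unassociated_packet(self, ssrc, original_seq):
        """Try to bind an unknown SSRC using the outstanding NACK."""
        if self.pending is None or self.pending[1] != original_seq:
            return None
        original_ssrc, _ = self.pending
        self.associations[ssrc] = original_ssrc
        self.pending = None
        return original_ssrc
```

   The inference relies entirely on retransmission semantics, which
   illustrates why it does not carry over to layered media in general.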

697	9.  Summary of RTP solution for Data Alignment and Source Correlation

699	9.1.  Data Alignment in RTP

701	   The text on layered multicast in [RFC3550] does not discuss Data
702	   Alignment among the media data carried in the different RTP sessions.
703	   We assume that the intention of the RTP specification was to use NTP
704	   timestamp alignment.  However, Vic, the demonstration code for
705	   layered multicast using PVH, used RTP timestamp alignment for this
706	   purpose.

708	9.2.  Source Correlation in RTP

710	   The text in section 8.3 of [RFC3550] mandates a single SSRC to be
711	   used for multiple sessions containing data of the same layered media
712	   source.  Further, the text mandates the detection of SSRC collisions
713	   using the CNAME item in SDES packets carried in the base layer:

715	      For layered encodings transmitted on separate RTP sessions (see
716	      Section 2.4), a single SSRC identifier space SHOULD be used across
717	      the sessions of all layers and the core (base) layer SHOULD be
718	      used for SSRC identifier allocation and collision resolution.
719	      When a source discovers that it has collided, it transmits an RTCP
720	      BYE packet on only the base layer but changes the SSRC identifier
721	      to the new value in all layers. ...

723	9.3.  Dependency signaling

725	   For signaling the dependency of data transmitted using layered
726	   multicast, SDP [RFC4566] contains rudimentary support, in that it
727	   allows for signaling a range of transport addresses in a certain
728	   media description.  By definition, a higher transport address
729	   identifies a higher layer in the one-dimensional hierarchy.  A
730	   receiver needs only to decode data conveyed over this transport
731	   address and lower transport addresses to decode this Operation Point.
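
   As an illustration, the following Python sketch expands an IPv4
   multicast connection line of the form used in [RFC4566], e.g.
   "c=IN IP4 224.2.1.1/127/3", into its contiguous addresses and picks
   the ones a receiver would join for a given Operation Point.  The
   function names are hypothetical, only the IPv4 form with a TTL is
   handled, and the mapping of address order to layer order follows
   the definition given above.

```python
import ipaddress


def expand_layer_addresses(connection_line):
    """Expand 'c=IN IP4 <base>/<ttl>/<count>' into a list of addresses."""
    fields = connection_line.strip().split()
    if len(fields) != 3 or fields[0] != "c=IN" or fields[1] != "IP4":
        raise ValueError("only 'c=IN IP4 ...' lines are handled here")
    parts = fields[2].split("/")
    base = ipaddress.IPv4Address(parts[0])
    # Without an explicit count, the line describes a single address.
    count = int(parts[2]) if len(parts) == 3 else 1
    return [str(base + i) for i in range(count)]


def addresses_for_operation_point(addresses, layer_index):
    """Addresses to receive for layer `layer_index` (0 is the base layer)."""
    if not 0 <= layer_index < len(addresses):
        raise ValueError("no such layer")
    return addresses[:layer_index + 1]
```

   The one-dimensional nature of this scheme is visible in the second
   helper: a layer can only depend on all layers below it.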

733	   When the media data of one source is transmitted in multiple RTP
734	   sessions, the mechanism defined in Signaling media decoding
735	   dependency in Session Description Protocol (SDP)
736	   [I-D.ietf-mmusic-decoding-dependency] can also be used to indicate
737	   the relationship between the multiple sessions of the same media
738	   type.  Currently, this mechanism is inherited by the new Payload
739	   Formats allowing multi-session transmission: [I-D.ietf-avt-rtp-svc],
740	   [I-D.wang-avt-rtp-mvc], [I-D.ietf-avt-rtp-mps], and
741	   [I-D.lakaniemi-avt-rtp-evbr].  By definition the base layer is
742	   signaled as the RTP session which does not depend on any other
743	   session.

745	   Since [RFC3550] mandates the correlation of one layered media with
746	   the same source, there is no mechanism to indicate dependencies of
747	   multiple sources.

749	10.  Recommendations

751	   We recommend for Data Alignment of media data from the same source,
752	   that the same RTP timestamp is used for packets of the same time
753	   instance as defined in
754	   [I-D.lennox-avt-rtp-layered-encoding-timestamps].  This method comes
755	   for free and can be implemented in a backward compatible way, since
756	   NTP timing for synchronizing different types of media is not
757	   affected.  This further requires the use of the same timescale of the
758	   sessions of a multi-session or multi-source transmission, which is
759	   anyway the case if the layered media is identified as a unique
760	   source.  Mandating the same timescale for each of the sessions in a
761	   multi-session transmission may need to be discussed with respect to
762	   the audio codec described in Section 4.5.

764	   For Source Correlation, we suggest keeping the mechanism defined in
765	   [RFC3550], i.e. all layers of a layered media source have the same
766	   SSRC and the base layer is used for SSRC collision detection.
767	   Further, it may be useful to have a signaling mechanism, which
768	   indicates the RTP session to be used for SSRC collision detection.

770	11.  Other transport related issues for multi-session transmission

772	11.1.  Inter-session Jitter

774	   The transport of media of the same source in different sessions may
775	   introduce different jitter behaviors in the different sessions.  We
776	   call this issue inter-session jitter.  Inter-session jitter may be
777	   caused by sessions taking different network paths or by any other
778	   packet reordering within the network outside the control of the user.
779	   RTP implementations typically use buffers for de-jittering each of
780	   the sessions separately.  In a simple A/V transmission scenario, de-
781	   jittering the audio and the video input queue separately is not
782	   problematic, since the synchronization is achieved after the decoder
783	   during playout.  Using multi-session transmission, de-jittering and
784	   synchronization (Data Alignment) is required before decoding instead
785	   of synchronizing the data after decoding at playout time.  And the
786	   Data Alignment via NTP timestamp must be 100% exact on a microsecond
787	   basis, otherwise the synchronization fails.  This is definitely
788	   different from doing synchronization for lip synchronized playout of
789	   audio and video.

791	11.2.  Inter-session Interleaving

793	   Using multi-session transmission allows for data interleaving, while
794	   the data transmitted within one session can still be sent in decoding
795	   order.  Inter-session interleaving may be also realizable using Data
796	   Alignment via timestamps.

798	12.  Security Considerations

800	   [Ed.  TBD]

802	13.  IANA Considerations

804	   No action by IANA is required.

806	14.  References

808	14.1.  Normative References

810	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
811	              Jacobson, "RTP: A Transport Protocol for Real-Time
812	              Applications", STD 64, RFC 3550, July 2003.

814	14.2.  Informative References

816	   [I-D.begen-mmusic-fec-grouping-issues]
817	              Begen, A., "FEC Grouping Issues in Session Description
818	              Protocol", draft-begen-mmusic-fec-grouping-issues-00 (work
819	              in progress), February 2008.

821	   [I-D.hannuksela-avt-rtp-svc]
822	              Hannuksela, M. and Y. Wang, "Session Multiplexing for SVC
823	              Video", draft-hannuksela-avt-rtp-svc-01 (work in
824	              progress), July 2008.

826	   [I-D.ietf-avt-rtp-mps]
827	              Bont, F., Doehla, S., Schmidt, M., and R. Sperschneider,
828	              "RTP Payload Format for Elementary Streams with MPEG
829	              Surround multi-channel audio", draft-ietf-avt-rtp-mps-01
830	              (work in progress), October 2008.

832	   [I-D.ietf-avt-rtp-svc]
833	              Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
834	              "RTP Payload Format for SVC Video",
835	              draft-ietf-avt-rtp-svc-14 (work in progress),
836	              September 2008.

838	   [I-D.ietf-mmusic-decoding-dependency]
839	              Schierl, T. and S. Wenger, "Signaling media decoding
840	              dependency in Session Description Protocol (SDP)",
841	              draft-ietf-mmusic-decoding-dependency-04 (work in
842	              progress), October 2008.

844	   [I-D.ietf-mmusic-sdp-source-attributes]
845	              Lennox, J., Ott, J., and T. Schierl, "Source-Specific
846	              Media Attributes in the Session Description Protocol
847	              (SDP)", draft-ietf-mmusic-sdp-source-attributes-01 (work
848	              in progress), February 2008.

850	   [I-D.lakaniemi-avt-rtp-evbr]
851	              Lakaniemi, A. and Y. Wang, "RTP payload format for G.718
852	              speech/audio", draft-lakaniemi-avt-rtp-evbr-04 (work in
853	              progress), October 2008.

855	   [I-D.lennox-avt-rtp-layered-encoding-timestamps]
856	              Lennox, J., Schierl, T., and S. Ganesan, "Real-Time
857	              Transport Protocol (RTP) Timestamps for Layered
858	              Encodings",
859	              draft-lennox-avt-rtp-layered-encoding-timestamps-00 (work
860	              in progress), June 2008.

862	   [I-D.wang-avt-rtp-mvc]
863	              Wang, Y. and T. Schierl, "RTP Payload Format for MVC
864	              Video", draft-wang-avt-rtp-mvc-02 (work in progress),
865	              August 2008.

867	   [McCa96]   McCanne, S., "Scalable Compression and Transmission of
868	              Internet Multicast Video", Report No. UCB/CSD-96-928,
869	              December 1996.

871	              Ph.D. Dissertation, University of California Berkeley.

873	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
874	              Requirement Levels", BCP 14, RFC 2119, March 1997.

876	   [RFC3388]  Camarillo, G., Eriksson, G., Holler, J., and H.
877	              Schulzrinne, "Grouping of Media Lines in the Session
878	              Description Protocol (SDP)", RFC 3388, December 2002.

880	   [RFC3984]  Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund,
881	              M., and D. Singer, "RTP Payload Format for H.264 Video",
882	              RFC 3984, February 2005.

884	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
885	              Description Protocol", RFC 4566, July 2006.

887	   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
888	              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
889	              July 2006.

891	   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
892	              Correction", RFC 5109, December 2007.

894	   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
895	              January 2008.

897	Appendix A.  Acknowledgements

899	   Funding for the RFC Editor function is provided by the IETF
900	   Administrative Support Activity (IASA).  Further, the author Thomas
901	   Schierl of Fraunhofer HHI is sponsored by the European Commission
902	   under the contract number FP7-ICT-214063, project SEA.  The authors
903	   want to thank Colin Perkins, Ye-Kui Wang, Randell Jesup, Ingemar
904	   Johansson, Gerard Babonneau, Alex Eleftheriadis, Stefan Doehla, and
905	   Roni Even for their valuable comments on the mailing list.

907	Authors' Addresses

909	   Thomas Schierl
910	   Fraunhofer HHI
911	   Einsteinufer 37
912	   D-10587 Berlin
913	   Germany

915	   Phone: +49-30-31002-227
916	   Email: mail@thomas-schierl.de

918	   Jonathan Lennox
919	   Vidyo, Inc.
920	   433 Hackensack Avenue
921	   Sixth Floor
922	   Hackensack, NJ  07601
923	   US

925	   Email: jonathan@vidyo.com

927	Full Copyright Statement

929	   Copyright (C) The IETF Trust (2008).

931	   This document is subject to the rights, licenses and restrictions
932	   contained in BCP 78, and except as set forth therein, the authors
933	   retain all their rights.

935	   This document and the information contained herein are provided on an
936	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
937	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
938	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
939	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
940	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
941	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

943	Intellectual Property

945	   The IETF takes no position regarding the validity or scope of any
946	   Intellectual Property Rights or other rights that might be claimed to
947	   pertain to the implementation or use of the technology described in
948	   this document or the extent to which any license under such rights
949	   might or might not be available; nor does it represent that it has
950	   made any independent effort to identify any such rights.  Information
951	   on the procedures with respect to rights in RFC documents can be
952	   found in BCP 78 and BCP 79.

954	   Copies of IPR disclosures made to the IETF Secretariat and any
955	   assurances of licenses to be made available, or the result of an
956	   attempt made to obtain a general license or permission for the use of
957	   such proprietary rights by implementers or users of this
958	   specification can be obtained from the IETF on-line IPR repository at
959	   http://www.ietf.org/ipr.

961	   The IETF invites any interested party to bring to its attention any
962	   copyrights, patents or patent applications, or other proprietary
963	   rights that may cover technology that may be required to implement
964	   this standard.  Please address the information to the IETF at
965	   ietf-ipr@ietf.org.