idnits 2.17.1 

draft-ietf-avt-rtp-interleave-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 194: '...ames following the interleaved MUST be...'
     RFC 2119 keyword, line 196: '...terleaved frames MUST only contain fra...'
     RFC 2119 keyword, line 197: '...d packets that do not comply SHOULD be...'
     RFC 2119 keyword, line 200: '...ved audio frames SHALL have a standard...'
     RFC 2119 keyword, line 276: '... implementations SHOULD constrain the ...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (6 May 2002) is 8023 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 1889 (ref. '1') (Obsoleted by RFC 3550)

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  ** Downref: Normative reference to an Experimental RFC: RFC 2762 (ref. '3')

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  -- Possible downref: Non-RFC (?) normative reference: ref. '5'

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)


     Summary: 7 errors (**), 0 flaws (~~), 1 warning (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                                   AVT WG
2	INTERNET-DRAFT                                          O. Hodson / ICSI
3	                                                              6 May 2002
4	                                                  Expires: November 2002

6	                   RTP Payload for Interleaved Audio
7	                  draft-ietf-avt-rtp-interleave-00.txt

9	Status of this Document

11	This document is an Internet-Draft and is in full conformance with all
12	provisions of Section 10 of RFC2026.

14	Internet-Drafts are working documents of the Internet Engineering Task
15	Force (IETF), its areas, and its working groups.  Note that other groups
16	may also distribute working documents as Internet-Drafts.

18	Internet-Drafts are draft documents valid for a maximum of six months
19	and may be updated, replaced, or obsoleted by other documents at any
20	time.  It is inappropriate to use Internet-Drafts as reference material
21	or to cite them other than as "work in progress."

23	The list of current Internet-Drafts can be accessed at
24	http://www.ietf.org/ietf/1id-abstracts.txt

26	The list of Internet-Draft Shadow Directories can be accessed at
27	http://www.ietf.org/shadow.html.

29	This document is a product of the IETF AVT WG.  Comments should be
30	addressed to the author, or the WG's mailing list at avt@ietf.org.

32	                                Abstract

34	     This document describes a payload format for use with the
35	     Real-time Transport Protocol (RTP) version 2 for interleaving
36	     encoded audio data.  It is intended for use in delay-tolerant
37	     audio streaming applications operating over best-effort packet
38	     networks.  The goal of interleaving is to disperse burst
39	     losses into a series of shorter losses.  The total amount of
40	     audio lost is not changed by interleaving, but the individual
41	     loss events are shorter and easier to conceal at the receiver.

43	                           Table of Contents

45	     1. Introduction
46	     2. Requirements
47	     3. Interleaver Implementation
48	     4. Payload Format Description
49	     5. Relation to SDP
50	     6. Security Considerations
51	     7. Example Packet
52	     8. Acknowledgements
53	     9. Author's Address
54	     10. References

56	1.  Introduction

58	     The Real-time Transport Protocol (RTP) [1] is the standardized
59	method for transporting real-time data between end-systems attached to
60	the Internet.  The standard RTP audio profiles [2] allow a number of
61	consecutive audio frames to be encapsulated within a single packet.
62	Encapsulating multiple audio frames within a single packet increases the
63	latency of communication, but results in fewer packets being transmitted
64	and a smaller amount of network bandwidth dedicated to IP/UDP/RTP headers.

66	     When a packet containing multiple audio frames is lost, or a burst
67	of packet losses occurs, the receiving system experiences a burst of
68	audio frame losses.  The receiver can apply loss concealment algorithms
69	to mitigate the frame losses.  However, the performance of receiver-
70	based audio loss concealment schemes varies inversely with the length of
71	loss [4].  The greater the number of consecutive audio frames lost, the
72	lower the probability of successful concealment.

74	     Interleaving is a technique for re-arranging the frames from an
75	audio source.  The technique introduces temporal separation between
76	adjacent frames for the purposes of transmission.  When burst frame
77	losses occur in an interleaved stream, they are dispersed into a series
78	of shorter, easier-to-conceal losses for the receiver to handle.

80	     Interleaving is employed in several proprietary audio protocols
81	used on the Internet and several payloads undergoing standardization
82	support interleaving in their RTP framing.  The format presented here is
83	intended to provide interleaving support for audio codecs with fixed
84	frames and those whose frame size is determinable by inspection of the
85	payload.  Its anticipated use is in broadcast-style applications where
86	quality is more important than latency.

88	2.  Requirements

90	o To provide support for interleavers that re-arrange the ordering of
91	  audio frames within an RTP audio stream.

93	o To work with audio codecs that have fixed frame sizes or have self-
94	  describing frames that allow the frame size to be inferred.

96	o To support audio streams employing silence suppression as well as
97	  those that do not.

99	o To support codec changes mid-stream.

101	3.  Interleaver Implementation

103	     For the purpose of clarifying the Payload Format Description we
104	describe the implementation of a model interleaver.  The description is
105	intended to be as straightforward as possible.  There are alternative
106	styles of interleaver implementation, some of which are provably optimal
107	[5] with regard to latency; however, these place constraints on the
108	configuration parameters.

110	     Suppose the interleaver module at the sender has two equally sized
111	buffers: an input buffer and output buffer.  The input buffer holds
112	audio frames passed from the media encoder.  The output buffer passes
113	audio frames to the RTP encapsulator.  When a frame is passed to the
114	input buffer, a frame is removed from the output buffer.  When the input
115	buffer is full the output buffer is empty and they swap roles.

117	     We assume throughout this document that frames enter the input
118	buffer in order and are read from the output buffer out of order.  The
119	interleaver cycle length is the number of frames that can be stored in
120	the input buffer.  The interleaver stride length is the separation, in
121	the original order, between frames adjacent in the output.  Consider an
122	interleaver with a cycle length of 12 and a stride length of 4.  For an
123	input buffer containing the audio frames:

125	                        A B C D E F G H I J K L

127	the frames leave the output buffer in the order:

129	                        A E I B F J C G K D H L

131	     If we denote the interleaver stride length as SL and the
132	interleaver cycle length as CL, and assume the frames in the output
133	buffer are labelled 0...CL-1, the buffer index of the n-th frame out of
134	the interleaver will be:

136	            II[n] = (n * SL) mod CL + floor((n * SL) / CL)
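
The index formula can be checked with a short Python sketch.  This is
illustrative only and not part of the payload format.  The function name
interleave_order is hypothetical, the division is taken to be integer
division, and the sketch assumes SL divides CL evenly, which holds for
every example in this document.

```python
def interleave_order(cycle_length, stride_length):
    """Return the buffer index of each frame in the order it leaves the interleaver."""
    # Assumption: the formula is guaranteed to give a permutation of
    # 0..CL-1 when the stride length divides the cycle length.
    if cycle_length % stride_length != 0:
        raise ValueError("stride length must divide cycle length")
    return [
        (n * stride_length) % cycle_length + (n * stride_length) // cycle_length
        for n in range(cycle_length)
    ]


frames = "ABCDEFGHIJKL"
order = interleave_order(12, 4)
interleaved = "".join(frames[i] for i in order)
print(interleaved)  # AEIBFJCGKDHL
```

With a cycle length of 12 and a stride length of 4 this reproduces the
frame order shown above.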

138	     The payload format in the next section describes how an RTP
139	interleaver places re-ordered frames within an RTP packet.  The RTP
140	interleaver may encapsulate any number of frames within a single packet.

142	4.  Payload Format Description

144	     Since only a limited set of interleaver stride lengths and cycle
145	lengths are likely to be of interest for a session, we rely on an
146	external mechanism, such as the Session Description Protocol [6], to
147	communicate payload mappings describing these values.  An SDP format is
148	proposed in section 5.

150	     The proposed payload format for interleaved audio is:

152	                    0                   1
153	                    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
154	                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
155	                   |IC |     II      |     PT      |
156	                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

158	IC: Interleaver Cycle (2 bits)
159	     This is a counter that is incremented each time a cycle is
160	     completed at the sender.  A receiver may have multiple decode
161	     buffers active and this facilitates placing the incoming frames
162	     into the correct buffer.  The interleaver cycle has a range from 0
163	     to 3 and is incremented by 1 with the complete transmission of a
164	     cycle.

166	II: Interleaver Index (7 bits)
167	     This is the index of the first audio frame from the output buffer
168	     that is encapsulated in the current packet.  The interleaver index
169	     has a range from 0 to the interleaver cycle length - 1.

171	PT: Audio Payload Type (7 bits)
172	     This identifies the type of audio encoding of all the interleaved
173	     audio frames encapsulated.

175	     This format allows a sender to interleave the audio frames of a
176	stream and encapsulate one or multiple frames in each packet.  When
177	multiple frames follow the interleaving header, the offset between each
178	successive frame is the cycle length CL.  When multiple frames follow
179	the interleaving header, they should be packed according to their
180	default packing rules.  If frames are normally octet aligned, then they
181	MUST be octet aligned when interleaved.

183	     The interleaver payload is only intended for codecs with fixed
184	compressed frame sizes and codecs where the frame boundaries can be
185	determined by examining the codec data.  For sample based codecs the
186	number of samples per frame should be the default for the codec
187	concerned.  In most cases, the number of samples is 160 per frame.  This
188	differs from the RTP A/V profile [2] which suggests sample based codecs
189	should have 160 samples per frame, but frames of any length should be
190	accepted.  This restriction removes the need to specify the length of
191	each audio frame in an interleaved packet.

193	     The interleaved audio payload format only supports a single payload
194	type field.  All audio frames following the interleaving header MUST be
195	of the same type.  For ease of implementation packets containing
196	multiple interleaved frames MUST only contain frames from one
197	interleaving cycle.  Received packets that do not comply SHOULD be
198	discarded.

200	     An RTP packet carrying interleaved audio frames SHALL have a standard
201	RTP header with a payload type indicating interleaved audio.  All fields,
202	with the exception of the timestamp, should be implemented according to
203	the methods laid out in RTP.  The timestamp field merits special
204	consideration because RTP uses the timestamp field to derive jitter
205	estimates for reporting and applications may use this value in their
206	playout calculation.  In the example given in section 3, frames leave
207	the interleaver in the order:

209	                        A E I B F J C G K D H L

211	     If the encapsulation function only places one or two frames in each
212	packet there is a potential issue with the timestamp associated with
213	each packet.  If the timestamp is derived from the sampling time of each
214	frame then the timestamps will not increase monotonically, e.g. for one
215	frame per packet the timestamp of the fourth packet is less than the
216	timestamp of the third packet, i.e. t(B) < t(I).

218	     For applications to be able to use interleaving without
219	modification to their playout calculation, we propose that the timestamp
220	of each outgoing packet be the timestamp of the first frame that would
221	have been in the packet if interleaving had not been applied.  For an
222	interleaver with cycle length 12, stride length 4, and a packetizer
223	encapsulating 2 frames per packet, the packets are:

225	                         AE, IB, FJ, CG, KD, HL

227	and the timestamps of the outgoing packets are:

229	                   t(A), t(C), t(E), t(G), t(I), t(K)

231	which correspond to the timestamps of the packets had interleaving not
232	been applied:

234	                         AB, CD, EF, GH, IJ, KL

236	     This preserves the integrity of existing RTP playout and jitter
237	calculations and allows interleaving to be implemented without modifying
238	the RTP processing in existing applications.
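
The timestamp rule can be sketched in Python as follows.  This is an
illustration under stated assumptions, not a definitive packetizer: the
function name packetize is hypothetical, frames are represented by their
labels, and the number of frames is assumed to be a multiple of the
number of frames per packet.

```python
def packetize(frames, order, frames_per_packet):
    """Group interleaved frames into packets.

    Returns a list of (payload, timestamp_frame) pairs, where
    timestamp_frame is the frame whose timestamp the packet carries.
    """
    interleaved = [frames[i] for i in order]
    packets = []
    for k in range(0, len(frames), frames_per_packet):
        payload = interleaved[k:k + frames_per_packet]
        # Timestamp source: the first frame this packet would have
        # carried had interleaving not been applied.
        packets.append((payload, frames[k]))
    return packets


frames = list("ABCDEFGHIJKL")
order = [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]  # cycle length 12, stride 4
packets = packetize(frames, order, 2)
for payload, ts_frame in packets:
    print("".join(payload), "t(%s)" % ts_frame)
```

The loop prints the packet contents alongside the frame supplying each
timestamp, so the pairing of AE with t(A), IB with t(C), and so on can
be compared with the sequences above.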

240	     A final point is the interaction with audio codecs using silence
241	suppression.  At the start of a new talkspurt, the interleaver should
242	reset its cycle counter (IC) and interleaver index (II) to zero.  If
243	the codec normally sets the marker bit in the RTP header for new
244	talkspurts, then it should do so when used in conjunction with
245	interleaving.

247	5.  Relation to SDP

249	     When the interleaved payload is used, an external mapping mechanism
250	may be required for end-systems to identify a particular RTP payload as
251	interleaved audio.  A common mechanism for performing this is through
252	the Session Description Protocol (SDP) [6]. The proposed SDP mapping for
253	an interleaved audio payload identifier is:

255	                      m=audio 10000 RTP/AVP 96 14
256	                      a=rtpmap:96 intl/64/8

258	This specifies an interleaved audio stream encapsulated in RTP.  The
259	specified port is 10000 and the payload identifier is 96 (selected from
260	the dynamic payloads).  The interleaved audio is MPEG-I/II audio (static
261	payload 14).  The term 'intl' indicates interleaving.  The slash
262	separated parameters are the interleaving cycle length and the stride
263	length respectively.  In the example, the interleaver has an
264	interleaving cycle length of 64 and an interleaving stride length of 8.
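
A receiver might extract the interleaver parameters from the rtpmap
attribute along these lines.  This is a minimal sketch that handles only
the attribute form shown above.  The function name parse_intl_rtpmap is
hypothetical and no general SDP parsing is attempted.

```python
def parse_intl_rtpmap(line):
    """Parse 'a=rtpmap:<pt> intl/<cycle>/<stride>'.

    Returns (payload_type, cycle_length, stride_length).
    """
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    pt_text, encoding = line[len(prefix):].split(None, 1)
    name, cycle, stride = encoding.strip().split("/")
    if name != "intl":
        raise ValueError("not an interleaved audio mapping")
    return int(pt_text), int(cycle), int(stride)


mapping = parse_intl_rtpmap("a=rtpmap:96 intl/64/8")
print(mapping)  # (96, 64, 8)
```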

266	6.  Security Considerations

268	     The security considerations and issues presented in the RTP
269	protocol definition [1] and the RTP sampling document [3] apply to RTP
270	streams carrying the interleaved audio payload.

272	     An additional risk with interleaved streams comes from hostile
273	senders transmitting an interleaved audio stream with randomly changing
274	interleaver cycle number and interleaver index fields.  This may cause a
275	receiver to allocate buffer resources and store a large number of audio
276	frames.  As a result, implementations SHOULD constrain the number of de-
277	interleaving buffers at the receiver.
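
One way a receiver could bound its de-interleaving state is sketched
below.  The class name, the default limit of two buffers, and the policy
of evicting the oldest buffer are all assumptions made for illustration.
This document does not prescribe them.

```python
class Deinterleaver:
    """Receiver-side frame store with a cap on active buffers."""

    def __init__(self, cycle_length, max_buffers=2):
        self.cycle_length = cycle_length
        self.max_buffers = max_buffers  # assumed limit, not from the draft
        self.buffers = {}  # IC value -> {buffer index: frame}

    def add(self, ic, index, frame):
        """Store a frame; return False if the frame is rejected."""
        if not 0 <= index < self.cycle_length:
            return False  # index outside the negotiated cycle length
        if ic not in self.buffers:
            if len(self.buffers) >= self.max_buffers:
                # Evict the oldest buffer rather than grow without bound
                # (dicts preserve insertion order).
                del self.buffers[next(iter(self.buffers))]
            self.buffers[ic] = {}
        self.buffers[ic][index] = frame
        return True


d = Deinterleaver(cycle_length=8)
d.add(0, 0, b"frame-a")
d.add(1, 0, b"frame-b")
d.add(2, 0, b"frame-c")   # evicts the buffer for cycle 0
print(sorted(d.buffers))  # [1, 2]
```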

279	7.  Example Packet

281	     For an interleaver with a cycle length of 8, stride length 4, and 2
282	audio frames per packet, the packetized frame sequence is:

284	                             AE, BF, CG, DH

286	As an example consider a stream encoded with G.723.1 audio (RTP A/V
287	payload 4, frame duration 30ms, sample rate 8kHz, channels 1) that uses
288	this interleaver.  If the timestamp of the first frame in an interleaver
289	sequence is 100 and this is the interleaver's first cycle, the second
290	packet will be:

292	    0                   1                   2                   3
293	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
294	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
295	   |V=2|P|X| CC=0  |M|      PT     |        sequence number        |
296	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
297	   |                          timestamp = 130                      |
298	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
299	   |           synchronization source (SSRC) identifier            |
300	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
301	   | 0 |    II = 1   |   PT = 4    |                               |
302	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
303	   |                                                               |
304	   |                           G.723.1 Frame B                     |
305	   |                                                               |
306	   |                                                               |
307	   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
308	   |                               |                               |
309	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
310	   |                                                               |
311	   |                           G.723.1 Frame F                     |
312	   |                                                               |
313	   |                                                               |
314	   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
315	   |                               |
316	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

318	8.  Acknowledgements

320	This document derives from an unsubmitted draft that was markedly
321	improved by feedback from Colin Perkins and Ross Finlayson.

323	9.  Author's Address

325	     Orion Hodson
326	     International Computer Science Institute
327	     1947 Center Street (Suite 600)
328	     Berkeley, CA 94703, USA
329	     hodson@icir.org

331	10.  References

333	[1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A
334	     Transport Protocol for Real-Time Applications", RFC 1889.

336	[2] H. Schulzrinne, and S. Casner, "RTP Profile for Audio and Video
337	     Conferences with Minimal Control", Work In Progress, <draft-ietf-
338	     avt-profile-new-12.txt>, 2001.

340	[3] J. Rosenberg, and H. Schulzrinne, "Sampling of the Group Membership
341	     in RTP", RFC 2762.

343	[4] D.J. Goodman, G.B. Lockhart, O.J. Wasem, and W.-C. Wong, "Waveform
344	     Substitution Techniques for Recovering Missing Speech Segments in
345	     Packet Voice Communications", IEEE Transactions on Acoustics,
346	     Speech, and Signal Processing, pp. 1440-1448, vol. ASSP-34, no. 6,
347	     December 1986.

349	[5] J.L. Ramsey, "Realization of Optimum Interleavers", IEEE
350	     Transactions on Information Theory, pp. 338-345, vol. IT-16, May
351	     1970.

353	[6] M. Handley, and V. Jacobson, "SDP: Session Description Protocol",
354	     RFC 2327.