idnits 2.17.1

draft-ietf-avt-rtp-interleave-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 194: '...ames following the interleaved MUST be...'

     RFC 2119 keyword, line 196: '...terleaved frames MUST only contain fra...'

     RFC 2119 keyword, line 197: '...d packets that do not comply SHOULD be...'

     RFC 2119 keyword, line 200: '...ved audio frames SHALL have a standard...'

     RFC 2119 keyword, line 276: '... implementations SHOULD constrain the ...'

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.
     If you have contacted all the original authors and they are all willing
     to grant the BCP78 rights to the IETF Trust, then this is fine, and you
     can ignore this comment.  If not, you may need to add the pre-RFC5378
     disclaimer.  (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (6 May 2002) is 8023 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 1889 (ref. '1') (Obsoleted by RFC 3550)

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  ** Downref: Normative reference to an Experimental RFC: RFC 2762 (ref. '3')

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  -- Possible downref: Non-RFC (?) normative reference: ref. '5'

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)

     Summary: 7 errors (**), 0 flaws (~~), 1 warning (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Internet Engineering Task Force                                   AVT WG
2    INTERNET-DRAFT                                          O. Hodson / ICSI
3                                                                  6 May 2002
4                                                      Expires: November 2002

6                       RTP Payload for Interleaved Audio
7                     draft-ietf-avt-rtp-interleave-00.txt

9    Status of this Document

11   This document is an Internet-Draft and is in full conformance with all
12   provisions of Section 10 of RFC2026.

14   Internet-Drafts are working documents of the Internet Engineering Task
15   Force (IETF), its areas, and its working groups. Note that other groups
16   may also distribute working documents as Internet-Drafts.
18   Internet-Drafts are draft documents valid for a maximum of six months
19   and may be updated, replaced, or obsoleted by other documents at any
20   time. It is inappropriate to use Internet-Drafts as reference material
21   or to cite them other than as "work in progress."

23   The list of current Internet-Drafts can be accessed at
24   http://www.ietf.org/ietf/1id-abstracts.txt

26   The list of Internet-Draft Shadow Directories can be accessed at
27   http://www.ietf.org/shadow.html.

29   This document is a product of the IETF AVT WG. Comments should be
30   addressed to the author, or the WG's mailing list at avt@ietf.org.

32   Abstract

34      This document describes a payload format for use with the
35      Real-time Transport Protocol (RTP) version 2 for interleaving
36      encoded audio data. It is intended for use in delay-tolerant
37      audio streaming applications operating over best-effort packet
38      networks. The goal of interleaving is to disperse burst
39      losses into a series of shorter losses. The total amount of
40      audio lost is not changed by interleaving, but the individual
41      loss events are shorter and easier to conceal at the receiver.

43   Table of Contents

45      1. Introduction
46      2. Requirements
47      3. Interleaver Implementation
48      4. Payload Format Description
49      5. Relation to SDP
50      6. Security Considerations
51      7. Example Packet
52      8. Acknowledgements
53      9. Author's Address
54      10. References

56   1. Introduction

58   The Real-time Transport Protocol (RTP) [1] is the standardized method
59   for transporting real-time data between end-systems attached to the Internet.
60   The standard RTP audio profiles [2] allow a number of consecutive audio
61   frames to be encapsulated within a single packet. Encapsulating
62   multiple audio frames within a single packet increases the latency of
63   communication, but results in fewer packets being transmitted and a
64   smaller amount of network bandwidth dedicated to IP/UDP/RTP headers.

66   When a packet containing multiple audio frames is lost, or a burst
67   of packet losses occurs, the receiving system experiences a burst of
68   audio frame losses. The receiver can apply loss concealment algorithms
69   to mitigate the frame losses. However, the performance of
70   receiver-based audio loss concealment schemes varies inversely with the
71   length of the loss [4]. The greater the number of consecutive audio frames
72   lost, the lower the probability of successful concealment.

74   Interleaving is a technique for re-arranging the frames from an
75   audio source. The technique introduces temporal separation between
76   adjacent frames for the purposes of transmission. When burst frame
77   losses occur in an interleaved stream, they are dispersed into a series
78   of shorter, easier-to-conceal losses for the receiver to handle.

80   Interleaving is employed in several proprietary audio protocols
81   used on the Internet, and several payloads undergoing standardization
82   support interleaving in their RTP framing. The format presented here is
83   intended to provide interleaving support for audio codecs with fixed
84   frame sizes and those whose frame size is determinable by inspection of
85   the payload. Its anticipated use is in broadcast-style applications where
86   quality is more important than latency.

88   2. Requirements

90   o To provide support for interleavers that re-arrange the ordering of
91     audio frames within an RTP audio stream.

93   o To work with audio codecs that have fixed frame sizes or have
94     self-describing frames that allow the frame size to be inferred.
96   o To support audio streams employing silence suppression as well as
97     those that do not.

99   o To support codec changes mid-stream.

101  3. Interleaver Implementation

103  For the purpose of clarifying the Payload Format Description we
104  describe the implementation of a model interleaver. The description is
105  intended to be as straightforward as possible. There are alternative
106  styles of interleaver implementation, some of which are provably optimal
107  [5] with regard to latency; however, these place constraints on the
108  configuration parameters.

110  Suppose the interleaver module at the sender has two equally sized
111  buffers: an input buffer and an output buffer. The input buffer holds
112  audio frames passed from the media encoder. The output buffer passes
113  audio frames to the RTP encapsulator. When a frame is passed to the
114  input buffer, a frame is removed from the output buffer. When the input
115  buffer is full the output buffer is empty and they swap roles.

117  We assume throughout this document that frames enter the input
118  buffer in order and are read from the output buffer out of order. The
119  interleaver cycle length is the number of frames that can be stored in
120  the input buffer. The interleaver stride length is the separation, in
121  buffer positions, between frames that leave the output buffer
122  consecutively. Consider a full output buffer with an interleaver cycle
123  length of 12 and a stride length of 4.
     For a buffer containing the audio frames:

125     A B C D E F G H I J K L

127  the frames leave the output buffer in the order:

129     A E I B F J C G K D H L

131  If we denote the interleaver stride length as SL and the
132  interleaver cycle length as CL, and assume the frames in the output
133  buffer are labelled 0...CL-1, the buffer index of the n-th frame out of
134  the interleaver (counting from n = 0, with integer division) will be:

136     II[n] = (n * SL) mod CL + (n * SL) / CL

138  The payload format in the next section describes how an RTP
139  interleaver places re-ordered frames within an RTP packet. The RTP
140  interleaver may encapsulate any number of frames within a single packet.

142  4. Payload Format Description

144  Since only a limited set of interleaver stride lengths and cycle
145  lengths are likely to be of interest for a session, we rely on an
146  external mechanism, such as the Session Description Protocol [6], to
147  communicate payload mappings describing these values. An SDP format is
148  proposed in section 5.

150  The proposed payload format for interleaved audio is:

152   0                   1
153   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
154  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
155  |IC |     II      |     PT      |
156  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

158  IC: Interleaver Cycle (2 bits)
159     This is a counter that is incremented each time a cycle is
160     completed at the sender. A receiver may have multiple decode
161     buffers active and this facilitates placing the incoming frames
162     into the correct buffer. The interleaver cycle has a range from 0
163     to 3 and is incremented by 1 with the complete transmission of a
164     cycle.

166  II: Interleaver Index (7 bits)
167     This is the output buffer index of the first audio frame
168     encapsulated in the current packet. The interleaver index
169     has a range from 0 to the interleaver cycle length - 1.

171  PT: Audio Payload Type (7 bits)
172     This identifies the type of audio encoding of all the interleaved
173     audio frames encapsulated.
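
As an informal illustration, and not as part of the payload definition,
the index calculation of section 3 and the 16-bit interleaving header
above can be sketched in Python. The function names are hypothetical.
The sketch assumes that the header is sent in network byte order with IC
in the most significant bits, that the IC counter wraps modulo 4, and
that the index formula is only used when SL divides CL.

```python
from typing import Tuple


def interleave_index(n: int, sl: int, cl: int) -> int:
    """Buffer index of the n-th frame (n counted from 0) out of the model interleaver."""
    # Assumption: the formula is only exercised when SL divides CL;
    # other combinations can produce indices outside 0...CL-1.
    if sl <= 0 or cl <= 0 or cl % sl != 0:
        raise ValueError("this sketch assumes SL divides CL")
    if not 0 <= n < cl:
        raise ValueError("n must lie in 0...CL-1")
    return (n * sl) % cl + (n * sl) // cl


def pack_header(ic: int, ii: int, pt: int) -> bytes:
    """Pack IC (2 bits), II (7 bits) and PT (7 bits) into two octets."""
    if not (0 <= ic < 4 and 0 <= ii < 128 and 0 <= pt < 128):
        raise ValueError("header field out of range")
    # Assumption: bit 0 of the diagram is the most significant bit.
    word = (ic << 14) | (ii << 7) | pt
    return word.to_bytes(2, "big")


def unpack_header(data: bytes) -> Tuple[int, int, int]:
    """Return (IC, II, PT) from the first two octets of the payload."""
    if len(data) < 2:
        raise ValueError("interleaving header needs two octets")
    word = int.from_bytes(data[:2], "big")
    return word >> 14, (word >> 7) & 0x7F, word & 0x7F


def next_cycle(ic: int) -> int:
    """Advance the 2-bit cycle counter (assumed to wrap from 3 to 0)."""
    return (ic + 1) % 4
```

For CL = 12 and SL = 4, evaluating interleave_index for n = 0...11 gives
the indices 0 4 8 1 5 9 2 6 10 3 7 11, i.e. the order A E I B F J C G K D
H L shown in section 3.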
175  This format allows a sender to interleave the audio frames of a
176  stream and encapsulate one or more frames in each packet. When
177  multiple frames follow the interleaving header, they appear in the order
178  given in section 3, so the buffer indices of successive frames differ by
179  the stride length SL, except where the sequence wraps around the end of
180  the buffer. The frames should be packed according to their default packing
181  rules. If frames are normally octet aligned, then they MUST be octet
182  aligned when interleaved.

183  The interleaver payload is only intended for codecs with fixed
184  compressed frame sizes and codecs where the frame boundaries can be
185  determined by examining the codec data. For sample-based codecs the
186  number of samples per frame should be the default for the codec
187  concerned. In most cases, the number of samples is 160 per frame. This
188  differs from the RTP A/V profile [2], which suggests that sample-based
189  codecs should have 160 samples per frame but that frames of any length
190  should be accepted. This restriction removes the need to specify the
191  length of each audio frame in an interleaved packet.

193  The interleaved audio payload format only supports a single payload
194  type field. All of the audio frames following the interleaving header MUST be
195  of the same type. For ease of implementation, packets containing
196  multiple interleaved frames MUST only contain frames from one
197  interleaving cycle. Received packets that do not comply SHOULD be
198  discarded.

200  An RTP packet carrying interleaved audio frames SHALL have a standard
201  RTP header with a payload type indicating interleaved audio. All fields,
202  with the exception of the timestamp, should be implemented according to
203  the methods laid out in RTP [1]. The timestamp field merits special
204  consideration because RTP uses the timestamp field to derive jitter
205  estimates for reporting and applications may use this value in their
206  playout calculation.
     In the example given in section 3, frames leave
207  the interleaver in the order:

209     A E I B F J C G K D H L

211  If the encapsulation function only places one or two frames in each
212  packet there is a potential issue with the timestamp associated with
213  each packet. If the timestamp is derived from the sampling time of each
214  frame then the timestamps will not increase monotonically, e.g. for one
215  frame per packet the timestamp of the fourth packet is less than the
216  timestamp of the third packet, i.e. t(B) < t(I).

218  For applications to be able to use interleaving without
219  modification to their playout calculation we propose that the timestamp
220  of each outgoing packet be the timestamp of the first frame that would
221  have been in the packet if interleaving had not been applied, e.g. for an
222  interleaver with cycle length 12, stride length 4, and a packetizer
223  encapsulating 2 frames per packet the packets are:

225     AE, IB, FJ, CG, KD, HL

227  and the timestamps of the outgoing packets are:

229     t(A), t(C), t(E), t(G), t(I), t(K)

231  which correspond to the timestamps the packets would have had if
232  interleaving had not been applied:

234     AB, CD, EF, GH, IJ, KL

236  This preserves the integrity of existing RTP playout and jitter
237  calculations and allows interleaving to be implemented without modifying
238  the RTP processing in existing applications.

240  A final point is the interaction with audio codecs using silence
241  suppression. At the start of a new talkspurt, the interleaver should
242  reset its cycle counter (IC) and interleaving index (II) to zero. If
243  the codec normally sets the marker bit in the RTP header for new
244  talkspurts, then it should do so when used in conjunction with
245  interleaving.

247  5. Relation to SDP

249  When the interleaved payload is used, an external mapping mechanism may
250  be required for end-systems to identify a particular RTP payload as
251  interleaved audio.
     A common mechanism for performing this is
252  the Session Description Protocol (SDP) [6]. The proposed SDP mapping for
253  an interleaved audio payload identifier is:

255     m=audio 10000 RTP/AVP 96 14
256     a=rtpmap:96 intl/64/8

258  This specifies an interleaved audio stream encapsulated in RTP. The
259  specified port is 10000 and the payload identifier is 96 (selected from
260  the dynamic payloads). The interleaved audio is MPEG-I/II audio (static
261  payload 14). The term 'intl' indicates interleaving. The
262  slash-separated parameters are the interleaving cycle length and the
263  stride length respectively. In the example, the interleaver has an
264  interleaving cycle length of 64 and an interleaving stride length of 8.

266  6. Security Considerations

268  The security considerations and issues presented in the RTP
269  protocol definition [1] and the RTP sampling document [3] apply to RTP
270  streams carrying the interleaved audio payload.

272  An additional risk with interleaved streams comes from hostile
273  senders transmitting an interleaved audio stream with randomly changing
274  interleaver cycle and interleaver index fields. This may cause a
275  receiver to allocate buffer resources and store a large number of audio
276  frames. As a result, implementations SHOULD constrain the number of
277  de-interleaving buffers at the receiver.

279  7. Example Packet

281  For an interleaver with a cycle length of 8, stride length 4, and 2
282  audio frames per packet, the packetized frame sequence is:

284     AE, BF, CG, DH

286  As an example, consider a stream encoded with G.723.1 audio (RTP A/V
287  payload 4, frame duration 30ms, sample rate 8kHz, channels 1) that uses
288  this interleaver.
     If the timestamp of the first frame in an interleaver sequence is 100
289  and this is the interleaver's first cycle, the second packet carries the
290  timestamp t(C) = 100 + 2 * 240 = 580 (240 samples per 30 ms frame) and is:

292   0                   1                   2                   3
293   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
294  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
295  |V=2|P|X| CC=0  |M|     PT      |       sequence number         |
296  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
297  |                        timestamp = 580                        |
298  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
299  |           synchronization source (SSRC) identifier            |
300  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
301  | 0 |   II = 1    |   PT = 4    |                               |
302  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
303  |                                                               |
304  |                        G.723.1 Frame B                        |
305  |                                                               |
306  |                                                               |
307  +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
308  |                               |                               |
309  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
310  |                                                               |
311  |                        G.723.1 Frame F                        |
312  |                                                               |
313  |                                                               |
314  +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
315  |                               |
316  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

318  8. Acknowledgements

320  This document derives from an unsubmitted draft that was markedly
321  improved by feedback from Colin Perkins and Ross Finlayson.

323  9. Author's Address

325     Orion Hodson
326     International Computer Science Institute
327     1947 Center Street (Suite 600)
328     Berkeley, CA 94703, USA
329     hodson@icir.org

331  10. References

333  [1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A
334      Transport Protocol for Real-Time Applications", RFC 1889.

336  [2] H. Schulzrinne and S. Casner, "RTP Profile for Audio and Video
337      Conferences with Minimal Control", Work in Progress, 2001.

340  [3] J. Rosenberg and H. Schulzrinne, "Sampling of the Group Membership
341      in RTP", RFC 2762.

343  [4] D.J. Goodman, G.B. Lockhart, O.J. Wasem, and W.-C. Wong, "Waveform
344      Substitution Techniques for Recovering Missing Speech Segments in
345      Packet Voice Communications", IEEE Transactions on Acoustics,
346      Speech, and Signal Processing, pp. 1440-1448, vol. ASSP-34, no. 6,
347      December 1986.

349  [5] J.L.
         Ramsey, "Realization of Optimum Interleavers", IEEE
350      Transactions on Information Theory, pp. 338-345, vol. IT-16, May
351      1970.

353  [6] M. Handley and V. Jacobson, "SDP: Session Description Protocol",
354      RFC 2327.
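
The timestamp rule proposed in section 4 can be illustrated with a short
Python sketch. It is not part of the payload definition. The function
name packetize is hypothetical, and the sketch assumes that the cycle
length is a multiple of both the stride length and the number of frames
per packet, so that no packet spans two interleaving cycles.

```python
from typing import List, Tuple


def packetize(frame_times: List[int], sl: int, cl: int,
              per_packet: int) -> List[Tuple[int, int, List[int]]]:
    """Return (RTP timestamp, II, buffer indices) for each packet of one cycle.

    frame_times[i] is the sampling timestamp of the frame at buffer index i.
    """
    # Assumptions of this sketch, not requirements stated in the draft.
    if len(frame_times) != cl:
        raise ValueError("need exactly CL frame timestamps")
    if cl % sl != 0 or cl % per_packet != 0:
        raise ValueError("CL must be a multiple of SL and of frames per packet")

    # Output order of the model interleaver from section 3.
    order = [(n * sl) % cl + (n * sl) // cl for n in range(cl)]

    packets = []
    for p in range(cl // per_packet):
        indices = order[p * per_packet:(p + 1) * per_packet]
        # Timestamp of the first frame this packet would have carried
        # had interleaving not been applied.
        timestamp = frame_times[p * per_packet]
        # II is the buffer index of the first frame actually carried.
        packets.append((timestamp, indices[0], indices))
    return packets
```

With cycle length 12, stride length 4, 2 frames per packet, and frames
spaced an assumed 160 timestamp units apart starting at 0, the sketch
yields the packets AE, IB, FJ, CG, KD, HL with timestamps 0, 320, 640,
960, 1280, 1600, i.e. t(A), t(C), t(E), t(G), t(I), t(K).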