Internet Engineering Task Force               Kretschmer-AT&T/Basso-AT&T
INTERNET DRAFT                            Civanlar-AT&T/Quackenbush-AT&T
File:draft-ietf-avt-rtp-mpeg2aac-00.txt                      Snyder-AT&T
June 25, 1999
Expires: December 25, 1999

               RTP Payload Format for MPEG-2 AAC Streams

STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering Task
   Force (IETF), its areas, and its working groups.  Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes a payload format for transporting MPEG-2 AAC
   encoded data using RTP.  MPEG-2 AAC is a recent ISO/IEC standard for
   the coding of multi-channel audio data.  Several services provided by
   RTP are beneficial for transporting MPEG-2 AAC encoded data over the
   Internet.  Additionally, the use of RTP makes it possible to
   synchronize MPEG-2 AAC data with other real-time data types.

1. Introduction

   The ISO/IEC MPEG-2 Advanced Audio Coding (AAC) [1] technology
   delivers very high audio quality at bit rates of 64 kb/s per channel
   or below.  It has a very flexible bitstream syntax that supports from
   1 to 48 audio channels, up to 16 subwoofer channels, and up to 16
   embedded data channels.  AAC supports a wide range of sampling
   frequencies (from 16 kHz to 96 kHz), which gives it an extremely wide
   range of bit rates.  This permits it to support applications ranging
   from professional or home theater sound systems to Internet music
   broadcast systems.

   The benefits of using RTP [2] for MPEG-2 AAC data stream transport
   include:

      i.   Ability to synchronize MPEG-2 AAC streams with other RTP
           payloads

      ii.  Monitoring MPEG-2 AAC delivery performance through RTCP

      iii. Combining MPEG-2 AAC and other real-time data streams
           received from multiple end-systems into a set of
           consolidated streams through RTP mixers

      iv.  Converting data types, etc., through the use of RTP
           translators.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].

1.1 Overview of MPEG-2 AAC

   AAC combines the coding efficiencies of a high-resolution filter
   bank, a powerful model of audio perception, backward-adaptive
   prediction, joint channel coding, and Huffman coding to deliver
   excellent signal compression.  In 1998 the MPEG Audio subgroup tested
   the family of MPEG audio coders (see
   http://www.tnt.uni-hannover.de/project/mpeg/audio/public/w2006.pdf).
   The test results indicate that, for a stereo signal, AAC at 96 kb/s
   has audio quality comparable to MPEG-1 Layer 3 ("mp3") at 128 kb/s.
   Therefore, at equivalent quality levels, AAC offers approximately 1/3
   greater compression than Layer 3 (128 kb/s / 96 kb/s = 4/3).

   AAC is a block-oriented, variable-rate coding algorithm: the AAC
   encoder reads 1024 samples of the input signal and writes a variable
   number of compressed output bits that represent that block of input
   data.  Each of the 1024 samples may comprise one or more channels.
   Rate control can be used in the encoder so that the output bit rate
   is averaged to a predetermined rate, as required for constant-rate
   communication channels.  Each block of AAC compressed bits is called
   a "raw data block", and it has the useful property that it can be
   decoded "stand-alone", that is, without knowledge of information in
   prior bitstream blocks.  This is ideal for packet communication
   channels: if the payload of a packet is a single raw data block,
   packet framing facilitates encoder and decoder synchronization and,
   most importantly, the loss of a single packet does not impair the
   decodability of adjacent packets.

1.2 Bitstream Syntax

   As already stated, a raw data block represents audio data for a time
   period of 1024 samples and may also contain related information and
   other data.
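
   As a rough illustration of what the 1024-sample block length means in
   practice, the Python sketch below computes the duration of one raw
   data block and its average size when rate control holds a given bit
   rate.  It is not part of this payload format: the function names are
   hypothetical, and the 44.1 kHz sampling rate in the example is an
   assumption, since this document does not fix one.

```python
# Sketch: arithmetic implied by AAC's 1024-sample raw_data_block.
# Function names are hypothetical; they are not defined by this draft.

SAMPLES_PER_BLOCK = 1024  # samples (per channel) covered by one raw_data_block


def block_duration_ms(sample_rate_hz: float) -> float:
    """Playback time represented by one raw_data_block, in milliseconds."""
    return 1000.0 * SAMPLES_PER_BLOCK / sample_rate_hz


def average_block_bytes(bitrate_bps: float, sample_rate_hz: float) -> float:
    """Average raw_data_block size when rate control holds bitrate_bps."""
    blocks_per_second = sample_rate_hz / SAMPLES_PER_BLOCK
    return bitrate_bps / blocks_per_second / 8.0


def repair_overhead_bps(repair_bitrate_bps: float, every_n_blocks: int) -> float:
    """Average extra bit rate if one block in every_n_blocks carries
    RepairData coded at repair_bitrate_bps (the sample rate cancels out)."""
    return repair_bitrate_bps / every_n_blocks


if __name__ == "__main__":
    fs = 44_100  # assumed sampling rate; the draft does not specify one
    print(f"block duration: {block_duration_ms(fs):.2f} ms")
    for rate in (96_000, 16_000):
        print(f"{rate // 1000} kb/s: {average_block_bytes(rate, fs):.1f} bytes/block")
    print(f"RepairData overhead: {repair_overhead_bps(16_000, 100):.0f} b/s")
```

   Under these assumptions a block lasts about 23.22 ms, 96 kb/s gives
   about 278.6 bytes per block and 16 kb/s about 46.4 bytes, which
   matches the 279-byte and 46-byte figures in Section 2.1; at 48 kHz
   the same 96 kb/s would give 256 bytes per block.  Sending 16 kb/s
   RepairData for one block in 100 adds only about 160 b/s on average.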

   The syntax of an AAC bitstream is as follows:

      <AAC_bitstream>  => [<raw_data_block>]
      <raw_data_block> => [<element>] <END> <PAD>

   where <AAC_bitstream> denotes the AAC bitstream, <raw_data_block> and
   <element> denote intermediate tokens, <END> and <PAD> denote terminal
   tokens, and [] denotes one or more occurrences.  <END> is a token that
   indicates the end of a raw_data_block, and <PAD> is a variable-length
   token that forces the total length of a raw_data_block to be an
   integral number of bytes.  In general, intermediate tokens are not an
   integral number of bytes in length.

   The <element> tokens are strings of bits of varying length, and can
   be any of the following:

      <SCE>  represents a single audio channel
      <CPE>  represents a stereo presentation (2 channels)
      <CCE>  a mechanism for multi-channel compression
      <LFE>  represents a low-frequency (special) effects channel
      <DSE>  represents "user data"
      <PCE>  a mechanism for describing the bitstream content
      <FIL>  a mechanism to fill unused bits (for constant-rate channels)

   The above <element> tokens can occur several times in a single
   raw_data_block.  For example, the raw_data_block for a 5.1 surround
   sound signal would be:

      <SCE> <CPE> <CPE> <LFE> <END> <PAD>

   corresponding to the center, left and right, left surround and right
   surround, and effects channels.  Multiple occurrences of the same
   element type (here, the two <CPE> tokens) are disambiguated by means
   of a unique 4-bit id inside each element.

2. Issues covered by this Payload Format

2.1 Repair Information to reconstruct lost AAC Frames

   Typically, a smart AAC decoder can mitigate the effects of lost
   packets using techniques such as interpolation in the spectral
   domain.  However, if the raw_data_block in a packet is perceptually
   very significant and also highly unpredictable (e.g., the onset of a
   cymbal crash), the encoder may choose to send RepairData associated
   with that raw_data_block.  The RepairData in a given packet is
   typically associated with a raw_data_block in the FUTURE, so that the
   decoder already has the RepairData when faced with the loss of the
   corresponding packet.
   The association is indicated by the RSEQ field, which is equal to the
   SEQ field of the corresponding raw_data_block.

   The syntax of the RepairData bits is exactly that of the AAC
   raw_data_block.  However, in practical use, the RepairData would be a
   highly compressed monophonic version of the signal being transmitted.
   For example, an AAC stereo signal coded at an average rate of 96 kb/s
   corresponds to an average raw_data_block size of about 279 bytes (at
   a 44.1 kHz sampling rate).  A RepairData version of that block,
   compressed to 16 kb/s, would be about 46 bytes.  Given that
   perceptually critical blocks might occur only once per 100 or more
   blocks, the average rate imposed by the RepairData is very low.

   RepairData MAY be provided for every frame, but in general its
   provision is OPTIONAL.

2.2 Fragmentation of AAC Frames

   For many reasons, a communication channel may impose a practical
   maximum packet size (e.g., Ethernet packet size limits).  Since it is
   advantageous to put one AAC raw_data_block per packet, it is
   desirable to limit the size of the AAC raw_data_block.  If this is
   not possible, the raw_data_block can be fragmented across several
   packets.  In this case, the raw_data_block can be fragmented at
   <element> boundaries, with the LEN field indicating the length of
   each fragment to within a byte and the UBITS field indicating its
   length to the bit.  The LEN and UBITS information permits re-assembly
   of the raw_data_block without knowledge of the syntax of the bits
   within each <element> of the raw_data_block.

2.3 Priority of AAC Frames

   Depending on the signal's characteristics, AAC uses different
   encoding strategies.  Stationary signals are processed using a
   1024-line MDCT (long blocks); for transient signals, 128-line MDCTs
   (short blocks) are used.
   Lost AAC frames containing stationary signals can be reconstructed
   relatively easily; hence they are less important to the decoder than
   frames containing transient signals, which can be reconstructed only
   roughly, if at all.

   This priority information is very important for AAC streaming over
   lossy channels, since it allows the streaming application to adapt
   its reconstruction or retransmission behavior, and the network to
   adapt its forwarding strategy (e.g., DiffServ).  In order to respond
   flexibly to packet loss and/or given bandwidth constraints, four
   priority levels are defined: 'low', 'lower', 'higher', and 'high'.
   'Low' priority denotes frames with low perceptual entropy, while
   'high' priority denotes frames with high perceptual entropy.  The
   'lower' and 'higher' priority levels MUST be assigned to frames whose
   perceptual entropy lies between 'low' and 'high', accordingly.

2.4 Interleaving of AAC Frames

   Instead of a static interleaving scheme (e.g., 7x7), frames are
   grouped by priority: a frame MUST be grouped only with frames of the
   same priority.  The sequence numbers SEQ of the AAC frames and RSEQ
   of the RepairData are used to restore the actual order on the
   receiver side.  Hence, the interleaving scheme does not have to be
   defined rigidly.

2.5 Example RTP Packet Sequence

   The example below shows how a sequence of AAC frames (a...p) with
   assigned priorities (0=low, 3=high) MAY be grouped.
   RepairData is not provided for low-priority (0) frames:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0 | 0 | 0 | 2 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 3 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Proposed interleaving/grouping of the AAC frames into RTP packets
   (upper row) and the RepairData R(x) carried in each packet (lower
   row); R(x) is sent in a packet that precedes the one carrying frame
   x:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |a g j|b h k|c i l|  d  |  e  |  f  | m q |  n  |  o  |  p  |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |     |R(d) |R(e) |R(f) |     |R(n) |R(o) |R(p) |     |     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3. Payload Format

   The RTP payload consists of a 32- or 64-bit header, a variable number
   of RepairData entries containing information needed to reconstruct
   lost AAC frames, and a variable number of AAC frames.  The header
   essentially contains a vector of Priority Quantizers (PQs) that gives
   the decoder the priority of the current and previous packets, to help
   it reconstruct the original signal.  The X bit specifies whether the
   header contains 12 or 28 PQs.  REPAIRLEN specifies the total number
   of 32-bit words containing RepairData; REPAIRLEN MUST be set to 0 if
   there is no RepairData.  Every REPAIRDATA entry is preceded by a
   sequence number RSEQ and a length RLEN, and every AAC FRAME by a
   sequence number SEQ and a length LEN.  For fragmented AAC frames,
   UBITS specifies the number of unused bits in the last byte, since
   frame fragments may not be byte aligned.  UBITS MUST be set to 0 if
   the corresponding frame is not fragmented.
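
   A receiver-side parser for this payload might look like the Python
   sketch below.  It is illustrative only, and the names Entry, Payload,
   and parse_payload are hypothetical.  Beyond what this section states,
   it assumes: MSB-first bit order as in the header diagram of this
   section; a 7-bit REPAIRLEN (the rest of the first header word after
   X and the 24-bit PRI VECTOR); 8-bit RSEQ and RLEN fields; repair
   entries packed back to back and padded with zero bytes up to the
   REPAIRLEN boundary; and AAC frame entries packed back to back until
   the end of the payload.

```python
# Sketch of a parser for the Section 3 payload layout (hypothetical names).
# Assumptions beyond the draft text are marked "assumed".
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    seq: int        # SEQ of an AAC frame, or RSEQ of a RepairData entry
    data: bytes     # frame (or fragment) bytes, or RepairData bytes
    ubits: int = 0  # unused bits in the last byte (AAC frames only)


@dataclass
class Payload:
    x: bool
    priorities: List[int]  # {PQ(t), PQ(t-1), PQ(t-2), ...}, each 0..3
    repair: List[Entry] = field(default_factory=list)
    frames: List[Entry] = field(default_factory=list)


def parse_payload(buf: bytes) -> Payload:
    if len(buf) < 4:
        raise ValueError("payload shorter than the 32-bit header")
    word0 = int.from_bytes(buf[0:4], "big")
    x = bool(word0 >> 31)                # X: 1 bit
    repairlen = (word0 >> 24) & 0x7F     # REPAIRLEN: 7 bits, in 32-bit words
    vec, nbits, pos = word0 & 0xFFFFFF, 24, 4
    if x:                                # X == 1: second word, 56-bit vector
        if len(buf) < 8:
            raise ValueError("X is set but the second header word is missing")
        vec = (vec << 32) | int.from_bytes(buf[4:8], "big")
        nbits, pos = 56, 8
    # Each PQ is 2 bits; PQ(t) occupies the most significant pair.
    priorities = [(vec >> (nbits - 2 * (i + 1))) & 0x3 for i in range(nbits // 2)]
    out = Payload(x, priorities)

    # RepairData: RSEQ (8 bits) | RLEN (8 bits, assumed) | RLEN bytes.
    end = pos + 4 * repairlen
    if end > len(buf):
        raise ValueError("REPAIRLEN exceeds the payload size")
    while end - pos >= 2:
        rseq, rlen = buf[pos], buf[pos + 1]
        if rlen == 0:                    # assumed: zero RLEN marks padding
            break
        if pos + 2 + rlen > end:
            raise ValueError("RepairData entry overruns REPAIRLEN")
        out.repair.append(Entry(rseq, buf[pos + 2:pos + 2 + rlen]))
        pos += 2 + rlen
    pos = end

    # AAC frames: SEQ (8 bits) | LEN (12 bits) | UBITS (4 bits) | LEN bytes.
    while len(buf) - pos >= 3:
        seq = buf[pos]
        length = (buf[pos + 1] << 4) | (buf[pos + 2] >> 4)
        ubits = buf[pos + 2] & 0x0F
        if pos + 3 + length > len(buf):
            raise ValueError("AAC frame overruns the payload")
        out.frames.append(Entry(seq, buf[pos + 3:pos + 3 + length], ubits))
        pos += 3 + length
    return out


if __name__ == "__main__":
    # X=0, REPAIRLEN=2, PQ(t)=3, PQ(t-1)=1; one repair entry; one frame.
    demo = (bytes.fromhex("02D00000")
            + bytes([7, 4, 0xAA, 0xBB, 0xCC, 0xDD, 0, 0])
            + bytes([5, 0x00, 0x30, 1, 2, 3]))
    p = parse_payload(demo)
    print(p.priorities[:3], p.repair, p.frames)
```

   Reassembling fragmented frames (flagged by the RTP M bit) and
   restoring frame order from SEQ are left to the caller in this sketch.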

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |X|  REPAIRLEN  |                  PRI VECTOR                   | Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                PRI VECTOR (continued), if X==1                |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |     RSEQ      |     RLEN      |         REPAIRDATA 1          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                               .                               |
   |                               .                               | Repair
   |                               .                               | Data
   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |     RSEQ      |     RLEN      | REPAIRDATA N  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                                                               |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |      SEQ      |          LEN          | UBITS |  AAC FRAME 1  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                               .                               |
   |                               .                               | AAC
   |                               .                               | Frames
   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |      SEQ      |          LEN          | UBITS |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          AAC FRAME N                          |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   PRI VECTOR: The priority vector.  It contains either 12 or 28
               Priority Quantizers (PQs) of 2 bits each.  Hence, four
               different priority levels can be assigned to an RTP
               packet: 0 means low and 3 means high priority.  Each PQ
               describes the priority of one packet: the first PQ
               refers to the current packet, and the following PQs
               refer to the most recent previous packets.  So the
               vector looks like this: {PQ(t), PQ(t-1), PQ(t-2), ...}

   X:          1 bit.  Vector extension.  If X==1, the priority vector
               uses 56 instead of 24 bits (28 instead of 12 PQs), so a
               second 32-bit header word is required.

   REPAIRLEN:  7 bit.  The total number of 32-bit words containing
               RepairData for previous/future frames.  If REPAIRLEN==0,
               there is no repair information.

   RSEQ:       8 bit.  The SEQ number of the AAC frame to which the
               REPAIRDATA belongs.

   RLEN:       The length in bytes of REPAIRDATA.

   REPAIRDATA: A byte-aligned data array containing RepairData.
               Receivers may ignore this information, and it is not
               mandatory.  The syntax of the RepairData bits is exactly
               that of the AAC raw_data_block.  However, it SHOULD be a
               highly compressed monophonic version of the signal being
               transmitted.

   SEQ:        8 bit.  The sequence number of the AAC frame.  The
               application has to make sure that the sequence numbers
               of interleaved frames do not overlap, i.e., that no two
               frames in transit at the same time share a SEQ value.

   LEN:        12 bit.  The length in bytes of the AAC frame, or of the
               frame fragment if the frame is fragmented.

   UBITS:      4 bit.  The number of unused bits in the last byte of the
               AAC frame if the frame is fragmented.  The RTP marker (M)
               bit is used as a 'fragmented' flag.  UBITS MUST be set to
               0 if the frame is not fragmented.

4. References

   [1] ISO/IEC 13818-7, "Advanced Audio Coding (AAC)".

   [2] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
       "RTP: A Transport Protocol for Real-Time Applications",
       RFC 1889, January 1996.

   [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", RFC 2119, March 1997.

5. Authors' Addresses

   Mathias Kretschmer
   AT&T Labs - Research
   180 Park Ave.
   Florham Park, NJ 07932
   USA
   e-mail: mathias@research.att.com

   Andrea Basso
   AT&T Labs - Research
   100 Schultz Drive
   Red Bank, NJ 07701
   USA
   e-mail: basso@research.att.com

   M. Reha Civanlar
   AT&T Labs - Research
   100 Schultz Drive
   Red Bank, NJ 07701
   USA
   e-mail: civanlar@research.att.com

   Schuyler R. Quackenbush
   AT&T Labs - Research
   180 Park Ave.
   Florham Park, NJ 07932
   USA
   e-mail: srq@research.att.com

   James H. Snyder
   AT&T Labs - Research
   180 Park Ave.
   Florham Park, NJ 07932
   USA
   e-mail: jhs@research.att.com