Network Working Group                                            B. Link
Internet-Draft                                        Dolby Laboratories
Expires: October 21, 2006                                 April 19, 2006


                  RTP Payload Format for E-AC-3 Audio
                       draft-ietf-avt-rtp-eac3-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 21, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document describes an RTP payload format for transporting
   Enhanced AC-3 (E-AC-3) encoded audio data.  E-AC-3 is a high-quality,
   multichannel audio coding format and is an extension of the AC-3
   audio coding format, which is used in US HDTV, DVD, cable and
   satellite television, and other media.  E-AC-3 is an optional audio
   format in US and worldwide digital television and in high-definition
   DVD formats.  The RTP payload format presented in this document
   includes support for data fragmentation.

Table of Contents

   1.  Introduction
   2.  Overview of Enhanced-AC-3
     2.1.  E-AC-3 Bit Stream
       2.1.1.  Sync Frames and Audio Blocks
       2.1.2.  Programs and Substreams
       2.1.3.  Frame Sets
   3.  RTP E-AC-3 Header Fields
   4.  RTP E-AC-3 Payload Format
     4.1.  Payload Specific Header
     4.2.  Fragmentation of E-AC-3 Frames
     4.3.  Concatenation of E-AC-3 Frames
     4.4.  Carriage of AC-3 Frames
   5.  Types and Names
     5.1.  Media Type Registration
     5.2.  SDP Usage
   6.  Security Considerations
   7.  Congestion Control
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   The Enhanced AC-3 (E-AC-3) [ETSI] audio coding system is built on a
   foundation of AC-3.  It is an enhancement and extension of AC-3, an
   existing audio coding standard commonly used for DVD, broadcast,
   cable, and satellite television content.  E-AC-3 is designed to
   enable operation at both higher and lower data rates than AC-3, to
   provide expanded channel configurations, and to provide greater
   flexibility for carriage of multiple audio program elements.  The
   relationship between E-AC-3 and AC-3 provides for low-loss, low-cost
   conversion between the two and makes E-AC-3 especially suitable for
   applications that require compatibility with the existing
   broadcast-reception and audio/video decoding infrastructure.  Dolby
   Digital Plus is a branded version of Enhanced AC-3.

   E-AC-3 has been standardized within both the European
   Telecommunications Standards Institute (ETSI) and the Advanced
   Television Systems Committee (ATSC).  It is an optional audio format
   for use in US (ATSC) and DVB digital television transmission.  It is
   also a required audio format for the HD-DVD optical-storage media
   format and has been proposed for use in the Blu-ray Disc format.

   There is a need to stream E-AC-3 content over IP networks.  E-AC-3 is
   primarily used in audio-for-video applications, so RTP, with its
   mechanism for synchronizing streams, serves well as a transport
   solution.
   Applications for streaming E-AC-3 include Internet Protocol
   television (IPTV), video on demand, interactive features of
   next-generation DVD formats, and transfer of movies across a home
   network.

   Section 2 gives a brief overview of the E-AC-3 algorithm.  Section 3
   specifies values for fields in the RTP header, while Section 4
   specifies the E-AC-3 payload format itself.  Section 5 discusses
   media types and SDP usage.  Security considerations are covered in
   Section 6, congestion control in Section 7, and IANA considerations
   in Section 8.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Overview of Enhanced-AC-3

   Enhanced AC-3 (E-AC-3) is a frequency-domain perceptual audio coding
   system.  Time blocks of an audio signal are converted from the time
   domain to the frequency domain by a transform (the Modified Discrete
   Cosine Transform (MDCT)) so that a model of the human auditory
   perceptual system can be applied.  In this domain, quantization noise
   can be constrained to specific frequency regions.  The perceptual
   model predicts in which frequency regions the auditory system will be
   least able to detect the quantization noise introduced by data rate
   reduction.  A more detailed technical description of E-AC-3 can be
   found in [2004AES].

   E-AC-3 is built upon a foundation of AC-3.  More background on AC-3
   can be found in the AC-3 specification [ETSI], a technical paper
   [1994AES], and the AC-3 RTP payload format [RFC4184].  The frame
   structure and metadata of AC-3 are maintained.  E-AC-3 content is
   not directly compatible with AC-3 decoders, but it can be converted
   to the AC-3 format to provide compatibility with existing decoders.
   Because AC-3 is the foundation of E-AC-3, conversion between the two
   formats can be done in a way that minimizes the degradations
   associated with tandem coding.  Additionally, the computational cost
   of the conversion is lower than that of a full decode and re-encode.

   E-AC-3 exploits psychoacoustic phenomena that cause a significant
   fraction of the information contained in a typical audio signal to be
   inaudible.  Substantial data reduction occurs via the removal of this
   inaudible information.  Source coding techniques are further used to
   reduce the data rate.

   Like most perceptual coders, E-AC-3 operates in the frequency domain.
   A 512-point MDCT is taken with 50% overlap, providing 256 new
   frequency samples.  Frequency samples are then converted to exponents
   and mantissas.  Exponents are differentially encoded.  Mantissas are
   allocated a varying number of bits depending on the audibility of the
   spectral components associated with them.  Audibility is determined
   via a masking curve.  Bits for mantissas are allocated from a global
   bit pool.

   E-AC-3 adds new coding tools, such as a longer filter bank, vector
   quantization, and spectral extension, to provide greater data
   efficiency and to operate at lower data rates than AC-3.  In the
   other direction, an expanded bit stream syntax and new frame
   constraints permit operation at higher data rates than AC-3.  The
   E-AC-3 syntax also allows a larger number of audio channels in one
   bit stream.  E-AC-3 operates at data rates from 32 kbps to 6.144 Mbps
   and at three sampling rates: 32 kHz, 44.1 kHz, and 48 kHz.

   E-AC-3 supports the carriage of multiple programs, and the carriage
   of programs with more than a baseline of 5.1 audio channels.  Both of
   these extensions beyond AC-3 are accomplished by time multiplexing
   additional data with baseline data.
   In the case of multiple programs, frames with data for the programs
   are interleaved.  In the case of more than 5.1 channels, frames from
   substreams carrying the extra channels are interleaved with the
   independent substream, which carries a 5.1-channel compatible mix.
   Both of these forms of multiplexing can occur in the same bit stream;
   in other words, multiple programs, some or all with more than 5.1
   channels, are permitted.

   Additional channel capacity is enabled by adding substreams to a
   program.  One primary substream, called the "independent substream",
   is required for each program.  This substream carries a
   self-contained mix of the audio, using a maximum of 5.1 channels,
   which makes its channel configuration compatible with AC-3.
   Additional, optional substreams (dependent substreams) are then used
   in the program to carry additional channels.  The data for each
   additional channel carries an indication of whether that channel
   provides data for an additional speaker location or replacement data
   for one of the speaker locations already defined by a previous
   substream.  For example, one common 7.1-channel format uses 3 front
   channels and 4 surround channels.  It is packaged with a primary
   substream, which contains a 5.1-channel downmix of the 7.1-channel
   content, using left, center, right, left surround, right surround,
   and low-frequency effects channels.  One dependent substream supplies
   four channels: replacements for left surround and right surround,
   along with two additional surround channels (left back and right
   back).

   The specification for E-AC-3 [ETSI] requires that all E-AC-3 decoders
   be capable of decoding at least a baseline portion of any E-AC-3 bit
   stream, which consists of the first independent substream of the
   first program, and of ignoring the other elements of the bit stream.
   This baseline is limited to 5.1 channels, and a system is also able
   to convert to configurations with fewer channels, if needed, for a
   presentation that matches its output capabilities.  More capable
   decoders can optionally choose among and mix multiple programs, and
   can also decode configurations with more channels than the baseline
   by decoding dependent substreams.

2.1.  E-AC-3 Bit Stream

2.1.1.  Sync Frames and Audio Blocks

   The basic organizational building block in an E-AC-3 bit stream is
   the sync frame (also called a frame in this document).  A sync frame
   contains the data necessary to decode time-domain audio samples for
   one or more channels over a time of one or more audio blocks, so a
   frame is an Application Data Unit (ADU).  Each E-AC-3 frame contains
   a Sync Information (SI) field, a Bit Stream Information (BSI) field,
   an Audio Frame (AF) field, and up to six audio blocks (AB).  Each AB
   represents 256 PCM samples for each channel.  The frame ends with an
   optional auxiliary data field (AUX) and an error correction field
   (CRC).  Figure 1 shows the structure of an E-AC-3 frame, where N is
   the number of blocks in the frame.

      +---+---+---+---------+- ... -+---------+---+---+
      |SI |BSI|AF | AB(0)   |  ...  | AB(N-1) |AUX|CRC|
      +---+---+---+---------+- ... -+---------+---+---+

        Figure 1. E-AC-3 Frame Format with more than one block

   The SI field contains information needed to acquire and maintain
   codec synchronization.  The BSI field contains parameters that
   describe the coded audio service.  It carries an indication of the
   size of the frame in 16-bit words ('frmsiz', Section E.1.3 of [ETSI])
   and an indication of the sampling rate ('fscod').  It also carries an
   indication of the number of blocks in the frame ('numblkscod');
   permitted values are one, two, three, or six blocks.  The AF field
   contains information about coding tools that apply to the entire
   frame.
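
   The following Python sketch illustrates how a receiver might read the
   sync word and the leading BSI fields of a single frame and derive the
   frame length in octets (from 'frmsiz', which Section 4.2 notes is one
   less than the number of 16-bit words), the sampling rate, the block
   count, and the frame duration (compare Table 1).  The bit widths,
   field order, and value tables for strmtyp, substreamid, frmsiz,
   fscod, numblkscod, and bsid are assumptions based on our reading of
   Annex E of [ETSI] and should be checked against that specification.
   The names FrameInfo and parse_eac3_header are hypothetical, and AC-3
   frames (Section 4.4), which use a different header layout, are
   rejected rather than parsed.

```python
from dataclasses import dataclass

# Value tables as we read them in Annex E of [ETSI] (assumption).
FSCOD_RATES = {0: 48000, 1: 44100, 2: 32000}
BLOCKS_PER_FRAME = {0: 1, 1: 2, 2: 3, 3: 6}
SAMPLES_PER_BLOCK = 256


@dataclass
class FrameInfo:
    independent: bool   # independent (True) or dependent substream frame
    substreamid: int    # identifies the program or substream
    frame_bytes: int    # total frame length in octets, from frmsiz
    sample_rate: int    # Hz, from fscod
    blocks: int         # audio blocks in the frame, from numblkscod

    @property
    def samples(self) -> int:
        return self.blocks * SAMPLES_PER_BLOCK

    @property
    def duration_ms(self) -> float:
        return 1000.0 * self.samples / self.sample_rate


def parse_eac3_header(frame: bytes) -> FrameInfo:
    """Decode SI and the leading BSI fields of one E-AC-3 sync frame."""
    if len(frame) < 6:
        raise ValueError("need at least 6 octets")
    if frame[0] != 0x0B or frame[1] != 0x77:
        raise ValueError("missing 0x0B77 sync word")
    # Assumed layout of the 32 bits after the sync word:
    # strmtyp(2) substreamid(3) frmsiz(11) fscod(2) numblkscod(2)
    # acmod(3) lfeon(1) bsid(5), then 3 bits of later fields.
    bits = int.from_bytes(frame[2:6], "big")
    bsid = (bits >> 3) & 0x1F
    if bsid <= 10:
        # bsid has the same offset in AC-3 frames; small values are
        # treated as AC-3 (Section 4.4), whose header is not parsed here.
        raise ValueError(f"bsid {bsid}: AC-3 frame, not parsed here")
    strmtyp = bits >> 30
    if strmtyp == 3:
        raise ValueError("reserved strmtyp value")
    fscod = (bits >> 14) & 0x3
    if fscod not in FSCOD_RATES:
        # fscod == 3 is not one of the three rates this document lists.
        raise ValueError("unsupported fscod value 3")
    return FrameInfo(
        independent=strmtyp != 1,   # 0 and 2 are taken as independent
        substreamid=(bits >> 27) & 0x7,
        frame_bytes=2 * (((bits >> 16) & 0x7FF) + 1),  # frmsiz + 1 words
        sample_rate=FSCOD_RATES[fscod],
        blocks=BLOCKS_PER_FRAME[(bits >> 12) & 0x3],
    )


if __name__ == "__main__":
    # Synthetic header: independent, frmsiz=255, 48 kHz, 6 blocks, bsid=16.
    hdr = bytes([0x0B, 0x77]) + (0x00FF3F80).to_bytes(4, "big")
    info = parse_eac3_header(hdr)
    print(info.frame_bytes, info.blocks, info.duration_ms)  # 512 6 32.0
```

   A receiver could use frame_bytes to step from one frame to the next
   inside a payload carrying several complete frames.  The 48 kHz,
   six-block example yields 32 ms, the value listed in Table 1.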

   Each block has a duration of 256 samples, so a frame's duration is
   the corresponding multiple of 256 samples.  The time duration of the
   frame is also dependent on the sampling rate, as shown in Table 1.

     Table 1. Time duration of E-AC-3 Frame (number of blocks vs.
     sampling rate)

     +------------------+--------+-----------------+-----------------+
     | blocks per frame | 32 kHz | 44.1 kHz        | 48 kHz          |
     +------------------+--------+-----------------+-----------------+
     | 1                | 8 ms   | approx. 5.8 ms  | approx. 5.3 ms  |
     | 2                | 16 ms  | approx. 11.6 ms | approx. 10.7 ms |
     | 3                | 24 ms  | approx. 17.4 ms | 16 ms           |
     | 6                | 48 ms  | approx. 34.8 ms | 32 ms           |
     +------------------+--------+-----------------+-----------------+

   Each audio block contains header fields that indicate the use of
   various coding tools: block switching, dither, coupling, spectral
   extension, and exponent strategy.  They also contain metadata,
   optionally used to enhance playback, such as dynamic range control.
   Finally, the exponents and bit allocation data needed to decode the
   mantissas into audio data, and the mantissas themselves, are
   included.  The format of audio blocks is described in detail in
   [ETSI].

2.1.2.  Programs and Substreams

   An E-AC-3 bit stream is logically arranged into programs.  A bit
   stream contains one or more programs, up to a maximum of eight.  When
   multiple programs are present in a bit stream, the frames that make
   them up are interleaved in time.

   +----------+-     -+----------+----------+-     -+----------+-
   |Program(1)|  ...  |Program(N)|Program(1)|  ...  |Program(N)| ...
   | Frame 0  |       | Frame 0  | Frame 1  |       | Frame 1  |
   +----------+-     -+----------+----------+-     -+----------+-

     Figure 2. Interleaving of multiple programs in an E-AC-3 bit stream

   Each program contains one independent substream and optionally
   contains up to eight dependent substreams.
   The independent substream carries a soundtrack of up to 5.1 channels,
   the multichannel format that matches the capabilities of AC-3, and
   can be meaningfully decoded and presented without any of the
   associated dependent substreams.  The dependent substreams are used
   to provide alternate channel data that enable different channel
   configurations, for example, to increase the number of channels
   beyond 5.1.  A frame of a dependent substream can be decoded by
   itself, but its content can only be meaningfully presented in
   conjunction with the corresponding independent substream.  The type
   and identity of the substream to which a frame belongs can be
   determined from parameters in the frame's BSI ('strmtyp' and
   'substreamid', in Section E.1.3.1 of [ETSI]).  When a program
   contains more than one substream, the frames belonging to those
   substreams are interleaved in time.  Taken together, the frames of a
   program that correspond to the same time period are called a
   'program set'.  Figure 3 shows the interleaving of substreams for a
   single program.

   / ----------- program set for frame 0 ----------- \
   :                                                 :
   +-------------+-------------+-     -+-------------+-------------+-
   | Program(1)  | Program(1)  |       | Program(1)  | Program(1)  |
   | Independent | Dependent   |  ...  | Dependent   | Independent | ...
   | Substream   | Substream(0)|       | Substream(n)| Substream   |
   | Frame 0     | Frame 0     |       | Frame 0     | Frame 1     |
   +-------------+-------------+-     -+-------------+-------------+-

     Figure 3. Interleaving of multiple substreams in an E-AC-3 program

2.1.3.  Frame Sets

   A further logical organization of the E-AC-3 bit stream is applied to
   facilitate conversion of E-AC-3 bit streams to AC-3 bit streams.  In
   this organization, the frames carrying six consecutive audio blocks
   are treated as a group, called a 'frame set', regardless of the
   number of frames needed to carry six audio blocks.
   This grouping extends across all programs and substreams that cover
   the time period of the six blocks.  Since E-AC-3 frames may carry
   one, two, three, or six blocks, a frame set will contain six, three,
   two, or one frames, respectively, of each substream.  AC-3 frames
   always carry six blocks, so the frame set provides framing
   synchronization between an E-AC-3 bit stream and an AC-3 bit stream.
   Metadata that indicates the alignment is carried in the first frame
   (which will be part of an independent substream) of each frame set in
   an E-AC-3 stream.  This first frame can be identified by a parameter
   in the BSI field of the bit stream: the Converter Synchronization
   flag ('convsync', in Section E.1.3.1.34 of [ETSI]) is set to true
   (1).

3.  RTP E-AC-3 Header Fields

   The RTP header is defined in the RTP Specification [RFC3550].  This
   section defines how a number of fields in the header are used.

   o  Payload Type (PT): The assignment of an RTP payload type for this
      packet format is outside the scope of this document; it is
      specified by the RTP profile under which this payload format is
      used, or signaled dynamically out-of-band (e.g., using SDP).

   o  Marker (M) bit: The M bit is set to one to indicate that the RTP
      packet payload contains at least one complete E-AC-3 frame or
      contains the final fragment of an E-AC-3 frame.

   o  Extension (X) bit: Defined by the RTP profile used.

   o  Timestamp: A 32-bit word that corresponds to the sampling instant
      for the first E-AC-3 frame in the RTP packet.  Packets containing
      fragments of the same frame MUST have the same timestamp.  The
      timestamp of the first RTP packet sent SHOULD be selected at
      random; thereafter it increases linearly according to the number
      of samples included in each frame.  Note that the number of
      samples in a frame depends on the number of blocks in the frame,
      with 256 samples in each block.
      Also note that more than one frame might correspond to the same
      time period when multiple channel configurations or programs are
      present.  If these frames occupy multiple packets, the resulting
      packets may have the same timestamp value.

4.  RTP E-AC-3 Payload Format

   This payload format is defined for E-AC-3, as specified in Annex E of
   [ETSI].  Note that E-AC-3 decoders are required to be capable of
   decoding AC-3 bit streams, so a receiver capable of receiving the
   E-AC-3 payload format defined in this document MUST also be capable
   of receiving the payload format for AC-3 defined in [RFC4184].

   According to [RFC2736], RTP payload formats should contain an
   integral number of application data units (ADUs).  The E-AC-3 frame
   corresponds to an ADU in the context of this payload format.  Each
   RTP payload MUST start with the two-byte payload specific header,
   followed by an integral number of complete E-AC-3 frames or a single
   fragment of an E-AC-3 frame.

   If an E-AC-3 frame exceeds the MTU for a network, it SHOULD be
   fragmented for transmission in multiple RTP packets.  Section 4.2
   provides guidelines for creating frame fragments.

4.1.  Payload Specific Header

   There is a two-octet payload header at the beginning of each payload.
   Each E-AC-3 RTP payload MUST begin with the following payload header.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    MBZ      |F|       NF      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 4. E-AC-3 RTP Payload Header

   o  Must Be Zero (MBZ): Bits marked MBZ SHALL be set to the value zero
      and SHALL be ignored by receivers.  The bits are reserved for
      future extensions.

   o  Frame Type (F): This one-bit field indicates the type of frame(s)
      present in the payload.  It takes the following values:

         0 - One or more complete frames.
         1 - Fragment of frame.
             (Note that the M bit in the RTP header is set for the
             final fragment.)

   o  Number of frames/fragments (NF): An 8-bit field whose meaning
      depends on the Frame Type (F) in this payload.  For complete
      frames (F of 0), it is used to indicate the number of E-AC-3
      frames in the RTP payload.  For frame fragments (F of 1), it is
      used to indicate the number of fragments (and therefore packets)
      that make up the current frame.  NF MUST be identical for packets
      containing fragments of the same frame.

   When receiving E-AC-3 payloads with F = 0 and more than a single
   frame (NF > 1), a receiver needs to use the "frmsiz" field in the BSI
   header of each E-AC-3 frame to determine the frame's length if the
   receiver needs to determine the boundary of the next frame.  Note
   that the frame length varies from frame to frame in some
   circumstances.

4.2.  Fragmentation of E-AC-3 Frames

   The size of an E-AC-3 frame is signaled in the Frame Size (frmsiz)
   field in a frame's BSI header.  The value of this field is one less
   than the number of 16-bit words in the frame.  If the size of an
   E-AC-3 frame exceeds the MTU size, the frame SHOULD be fragmented at
   the RTP level.  The fragmentation MAY be performed at any byte
   boundary in the frame.  RTP packets containing fragments of the same
   E-AC-3 frame SHALL be sent in consecutive order, from first to last
   fragment.  This enables a receiver to assemble the fragments in the
   correct order.

4.3.  Concatenation of E-AC-3 Frames

   There are cases where E-AC-3 frame sizes are smaller than the MTU
   size and it is advantageous to include multiple frames in a packet.
   It is useful to take into account the logical arrangement of the bit
   stream into program sets and frame sets to constrain the effects of
   the loss of a packet.  It is desirable for a complete program set or
   a complete frame set to be included in one packet.
   Also, it is undesirable for frames from more than one program set or
   frame set to be in the same packet, unless the sets are complete.  In
   this way, the loss of one packet does not render the contents of
   another packet unusable.

   Frames from more than one program set SHOULD NOT be included in the
   same packet unless all program sets in the packet are complete.
   Frames from more than one frame set SHOULD NOT be included in the
   same packet unless all frame sets in the packet are complete.

4.4.  Carriage of AC-3 Frames

   The E-AC-3 specification [ETSI] requires that E-AC-3 decoders be
   capable of decoding AC-3 frames.  That specification also supports
   carriage of AC-3 frames in an E-AC-3 bit stream.  Due to differences
   between E-AC-3 and AC-3 frames, restrictions are placed on the use of
   AC-3 frames: they are only used for the independent substream of the
   first (or only) program in an E-AC-3 bit stream.  Carrying only
   E-AC-3 frames, only AC-3 frames, or a mixture of E-AC-3 and AC-3
   frames are all legal configurations, and a bit stream may change
   among these configurations.  The AC-3 frame format is described in
   [RFC4184] and specified in [ETSI].

5.  Types and Names

5.1.  Media Type Registration

   This registration uses the template defined in [RFC4288] and follows
   [RFC3555].

   To: ietf-types@iana.org
   Subject: Registration of media type audio/eac3

   Type name: audio

   Subtype name: eac3

   Required parameter:

   o  rate: The RTP timestamp clock rate, which is equal to the audio
      sampling rate.  Permitted rates are 32000, 44100, and 48000.

   Optional parameter:

   o  bitStreamConfig: The configuration of programs and substreams in
      the bit stream, expressed as a sequence of ASCII characters.  This
      parameter can serve two purposes.
      First, during the creation of a session, the bitStreamConfig
      parameter might be used to negotiate a match between the
      requirements of a bit stream and the capabilities of a receiver,
      to avoid using network bandwidth for data that cannot be used.
      Second, it makes the configuration of the bit stream explicit to
      the receiver so that whenever a packet is lost, the receiver can
      identify which kind of frame(s) has been lost to aid error
      mitigation.

      The value of this parameter represents each substream of the bit
      stream by a single character indicating its type, immediately
      followed by the number of audio channels that result if a frame of
      that substream (plus any other required substreams) is decoded.
      Note that even though LFE channels are often described as
      "fractional" channels (e.g., the ".1" in 5.1), for this parameter
      an LFE channel is counted as one (e.g., a 5.1-channel
      configuration is indicated as 6).  The configuration of the bit
      stream MUST match the value of this parameter for the duration of
      the session.

      Allowed values for the substream type are:

         i - Independent substream.
         d - Dependent substream.

      The E-AC-3 specification [ETSI] defines which configurations of
      bit streams are legal, which constrains the values the
      bitStreamConfig parameter will take.  Each program starts with,
      and contains exactly one, independent substream ('i').  Each
      independent substream is followed by between 0 and 8 dependent
      substreams ('d'), which belong to the same program.  See
      Section 2.1.2 for more discussion of programs and substreams.
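
      As an illustration of the value format, the following minimal
      Python sketch splits a bitStreamConfig value into programs of
      (type, channels) pairs and applies the structural rules stated
      above: each program opens with one 'i', followed by at most eight
      'd' entries, and (per Section 2.1.2) there are at most eight
      programs.  The function names are hypothetical, and the sketch
      does not check channel counts against the full set of legal
      configurations in [ETSI].  The zeroed() helper lists substreams
      whose channel count is zero, which is how this registration marks
      unwanted substreams in an offer/answer exchange.

```python
import re

# One substream entry: a type letter followed by a decimal channel count.
_SUBSTREAM = re.compile(r"([id])([0-9]+)")

MAX_PROGRAMS = 8               # Section 2.1.2: at most eight programs
MAX_DEPENDENT_PER_PROGRAM = 8  # each 'i' is followed by 0 to 8 'd'


def parse_bitstreamconfig(value: str) -> list[list[tuple[str, int]]]:
    """Split a bitStreamConfig value into programs of (type, channels)."""
    programs: list[list[tuple[str, int]]] = []
    pos = 0
    while pos < len(value):
        m = _SUBSTREAM.match(value, pos)
        if m is None:
            raise ValueError(f"malformed substream at offset {pos}")
        kind, channels = m.group(1), int(m.group(2))
        if kind == "i":
            programs.append([])  # every 'i' opens a new program
        elif not programs:
            raise ValueError("value must start with an 'i' substream")
        elif len(programs[-1]) > MAX_DEPENDENT_PER_PROGRAM:
            # programs[-1] already holds one 'i' plus eight 'd' entries
            raise ValueError("more than eight dependent substreams")
        programs[-1].append((kind, channels))
        pos = m.end()
    if not programs:
        raise ValueError("empty bitStreamConfig value")
    if len(programs) > MAX_PROGRAMS:
        raise ValueError("more than eight programs")
    return programs


def zeroed(programs: list[list[tuple[str, int]]]) -> list[tuple[int, int]]:
    """Return (program, substream) indices whose channel count is 0."""
    return [(p, s) for p, subs in enumerate(programs)
            for s, (_, ch) in enumerate(subs) if ch == 0]
```

      For instance, parse_bitstreamconfig("i6d8d14i6d8") yields two
      programs: the first with an independent and two dependent
      substreams, the second with an independent and one dependent
      substream.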

      For example, consider a bit stream containing two programs:

      *  the first program with
         +  a six-channel independent substream
         +  a dependent substream containing the additional channels
            needed for eight channels
         +  a second dependent substream containing the further
            channels needed for 14 channels
      *  along with a second program with
         +  another six-channel independent substream
         +  a dependent substream containing the additional channels
            needed for eight channels

      Then the configuration of the bit stream is indicated as:

         bitStreamConfig = i6d8d14i6d8

      When the bitStreamConfig parameter is used in an offer/answer
      exchange, zero (0) as the number of channels for a substream in an
      answer indicates a substream that the answerer does not wish to
      receive.

   Encoding considerations:

      This media type is framed and contains binary data.

   Security considerations:

      See Section 6 of RFC XXXX.

   Interoperability considerations:

      To maintain interoperability with AC-3-capable end-points, in
      cases where negotiation is possible, an E-AC-3 end-point SHOULD
      declare itself also as AC-3 capable (i.e., supporting also
      "audio/ac3" as specified in RFC 4184 [RFC4184]).  Note that all
      E-AC-3 end-points are required to be AC-3 capable.

   Published specification:

      RFC XXXX and ETSI TS 102 366 [ETSI].

   Applications that use this media type:

      Multichannel audio compression, and audio for video.

   Additional Information:

      Magic number(s): The first two octets of an E-AC-3 frame are
      always the synchronization word, which has the hex value 0x0B77.

   Person & email address to contact for further information:

      Brian Link
      IETF AVT working group.

   Intended Usage:

      COMMON

   Restrictions on usage:

      This media type depends on RTP framing, and hence is only defined
      for transfer via RTP [RFC3550].
      Transport within other framing protocols is not defined at this
      time.

   Author/Change controller:

      IETF Audio/Video Transport Working Group delegated from the IESG.

5.2.  SDP Usage

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [RFC2327], which is commonly used to describe RTP sessions.  When SDP
   is used to specify sessions employing E-AC-3, the mapping is as
   follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype ("eac3") goes in SDP "a=rtpmap" as the encoding
      name.

   o  The required parameter "rate" also goes in "a=rtpmap" as the clock
      rate.  (The optional "channels" rtpmap encoding parameter is not
      used.  Instead, the information is included in the optional
      parameter bitStreamConfig.)

   o  The optional parameter "bitStreamConfig" goes in the SDP "a=fmtp"
      attribute.

   An example of the SDP data for E-AC-3:

      m=audio 49111 RTP/AVP 100
      a=rtpmap:100 eac3/48000
      a=fmtp:100 bitStreamConfig=i6d8d14i6d8

   Certain considerations are needed when SDP is used to perform
   offer/answer exchanges [RFC3264].

   o  The "rate" is a symmetric parameter, and the answer MUST use the
      same value, or the answerer removes the payload type.

   o  The "bitStreamConfig" parameter is declarative and indicates, for
      sendonly, the intended arrangement of substreams in the bit
      stream, along with the channel configuration, to transmit, and for
      recvonly or sendrecv, the desired bit stream arrangement and
      channel configuration to receive.  The format of the
      bitStreamConfig value in an answer MAY differ from the offer value
      by replacing the number of channels for any undesired substreams
      with '0'.  It is valid to zero out dependent substreams containing
      undesired channel configurations and to zero out all the
      substreams of an undesired program.
      The sender MAY then re-offer the stream in the receiver's
      preferred configuration if it is capable of providing that
      configuration.  Note that all receivers are capable of receiving,
      and all decoders are capable of decoding, any legal bit stream
      configuration, so the parameter exchange is not needed for
      interoperability.  It might, however, be used to tailor the
      transmission to the number of programs or channels the receiver
      requests.

   o  Since an AC-3 bit stream is a special case of an E-AC-3 bit
      stream, it is permissible for an AC-3 bit stream to be carried in
      the E-AC-3 payload format.  To ensure interoperability with
      receivers that support the AC-3 payload format but not the E-AC-3
      payload format, a sender that wishes to send an AC-3 bit stream in
      the E-AC-3 payload format SHOULD also offer the session in the
      AC-3 payload format, by including payload types for both media
      subtypes: 'ac3' and 'eac3'.

6.  Security Considerations

   The payload format described in this document is subject to the
   security considerations defined in RTP [RFC3550] and in any
   applicable RTP profile (e.g., [RFC3551]).  To protect the user's
   privacy and any copyrighted material, confidentiality protection
   would have to be applied.  To also protect against modification by
   intermediate entities and to ensure the authenticity of the stream,
   integrity protection and authentication would be required.
   Confidentiality, integrity protection, and authentication have to be
   provided by a mechanism external to this payload format, e.g., SRTP
   [RFC3711].

   The E-AC-3 format is designed so that decoders can determine the
   validity of data frames.  The required decoder response to a
   malformed frame is to discard the malformed data and conceal the
   errors in the audio output until a valid frame is detected and
   decoded.
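
   As a rough illustration of how a receiver might recognize a
   malformed frame, the Python sketch below performs only a first-pass
   check: it verifies the 0x0B77 synchronization word and that the
   buffer holds the full frame length declared in the header.  It
   assumes the E-AC-3 header layout of [ETSI], in which the syncword
   is followed by strmtyp (2 bits), substreamid (3 bits) and frmsiz
   (11 bits, the frame size in 16-bit words minus one).  The function
   name check_eac3_frame is hypothetical, and a real decoder performs
   further checks defined in [ETSI], such as CRC verification.

```python
"""First-pass E-AC-3 frame sanity check (illustrative sketch only).

Assumed header layout (per ETSI TS 102 366): 16-bit syncword 0x0B77,
then strmtyp (2 bits), substreamid (3 bits), frmsiz (11 bits).
The frame length in octets is (frmsiz + 1) * 2.
"""

SYNC_WORD = 0x0B77


def check_eac3_frame(buf: bytes) -> int:
    """Return the declared frame length in octets if the header looks
    valid and the whole frame is present; return 0 otherwise."""
    if len(buf) < 4:
        return 0  # too short to hold the syncword and frmsiz field
    if int.from_bytes(buf[0:2], "big") != SYNC_WORD:
        return 0  # not aligned on an E-AC-3 syncframe
    # Next 16 bits: strmtyp(2) | substreamid(3) | frmsiz(11).
    word = int.from_bytes(buf[2:4], "big")
    frmsiz = word & 0x07FF  # keep only the low 11 bits
    frame_len = (frmsiz + 1) * 2  # frmsiz counts 16-bit words minus one
    if frame_len > len(buf):
        return 0  # truncated frame, e.g. after packet loss
    return frame_len


if __name__ == "__main__":
    # syncword, strmtyp=0, substreamid=0, frmsiz=3 -> 8-octet frame
    frame = bytes([0x0B, 0x77, 0x00, 0x03, 0, 0, 0, 0])
    print(check_eac3_frame(frame))      # full frame present
    print(check_eac3_frame(frame[:6]))  # truncated frame
```

   In the sketch, a return value of 0 marks the frame as malformed;
   the caller would then discard it and conceal the resulting gap in
   the audio output.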
   Handling malformed frames in this way is expected to prevent crashes
   and other abnormal decoder behavior in response to errors or attacks.

7.  Congestion Control

   The general congestion control considerations for transporting RTP
   data apply to E-AC-3 audio over RTP as well; see RTP [RFC3550] and
   any applicable RTP profile (e.g., [RFC3551]).

   E-AC-3 encoders may use a range of bit rates to encode audio data, so
   it is possible to adapt to the available network bandwidth by
   adjusting the encoder bit rate in real time or by having multiple
   copies of content encoded at different bit rates.  Additionally,
   packing more frames into each RTP payload can reduce the number of
   packets sent, and hence the overhead from IP/UDP/RTP headers, at the
   expense of increased delay and reduced robustness against packet
   loss.

8.  IANA Considerations

   Registration of a new media subtype for E-AC-3 is requested (see
   Section 5).

9.  References

9.1.  Normative References

   [ETSI]     ETSI, "Digital Audio Compression (AC-3, Enhanced AC-3)
              Standard", TS 102 366, February 2005.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4184]  Link, B., Hager, T., and J. Flaks, "RTP Payload Format for
              AC-3 Audio", RFC 4184, October 2005.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", BCP 13, RFC 4288, December 2005.

   [RFC3555]  Casner, S. and P. Hoschka, "MIME Type Registration of RTP
              Payload Formats", RFC 3555, July 2003.

   [RFC2327]  Handley, M. and V. Jacobson, "SDP: Session Description
              Protocol", RFC 2327, April 1998.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

9.2.  Informative References

   [2004AES]  Fielder, L., Andersen, R., Crockett, B., Davidson, G.,
              Davis, M., Turner, S., Vinton, M., and P. Williams,
              "Introduction to Dolby Digital Plus, an Enhancement to the
              Dolby Digital Coding System", Preprint 6196, Presented at
              the 117th Convention of the Audio Engineering Society,
              October 2004.

   [1994AES]  Todd, C., Davidson, G., Davis, M., Fielder, L., Link, B.,
              and S. Vernon, "AC-3: Flexible Perceptual Coding for Audio
              Transmission and Storage", Preprint 3796, Presented at the
              96th Convention of the Audio Engineering Society,
              May 1994.

   [RFC2736]  Handley, M. and C. Perkins, "Guidelines for Writers of RTP
              Payload Format Specifications", BCP 36, RFC 2736,
              December 1999.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

Author's Address

   Brian Link
   Dolby Laboratories
   100 Potrero Ave.
   San Francisco, CA  94103
   US

   Phone: +1 415 558 0200
   Email: bdl@dolby.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.