idnits 2.17.1 

draft-ietf-avt-rtp-mpeg2aac-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 8 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 3 instances of too long lines in the document, the longest one
     being 1 character in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 25, 1999) is 9072 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '2' is defined on line 315, but no explicit reference
     was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 1889 (ref. '2') (Obsoleted by RFC 3550)


     Summary: 8 errors (**), 0 flaws (~~), 3 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force            Kretschmer-AT&T/Basso-AT&T
2	INTERNET DRAFT                             Civanlar-AT&T/Quackenbush-AT&T
3	File:draft-ietf-avt-rtp-mpeg2aac-00.txt    Snyder-AT&T
4	                                           June 25, 1999
5	                                           Expires: December 25, 1999

7	                RTP Payload Format for MPEG-2 AAC Streams

9	                         STATUS OF THIS MEMO

11	This document is an Internet-Draft and is in full conformance with all
12	provisions of Section 10 of RFC2026.

14	Internet-Drafts are working documents of the Internet Engineering Task
15	Force (IETF), its areas, and its working groups.  Note that other
16	groups may also distribute working documents as Internet-Drafts.

18	Internet-Drafts are draft documents valid for a maximum of six months
19	and may be updated, replaced, or obsoleted by other documents at any
20	time.  It is inappropriate to use Internet-Drafts as reference
21	material or to cite them other than as "work in progress."

23	The list of current Internet-Drafts can be accessed at
24	http://www.ietf.org/ietf/1id-abstracts.txt

26	The list of Internet-Draft Shadow Directories can be accessed at
27	http://www.ietf.org/shadow.html.

29	                                 Abstract

31	This document describes a payload format for transporting MPEG-2 AAC
32	encoded data using RTP. MPEG-2 AAC is a recent standard from ISO/IEC
33	for the coding of multi-channel audio data. Several services provided
34	by RTP are beneficial for MPEG-2 AAC encoded data transport over the
35	Internet. Additionally, the use of RTP makes it possible to
36	synchronize MPEG-2 AAC data with other real-time data types.

38	1. Introduction

40	The ISO/IEC MPEG-2 Advanced Audio Coding (AAC) [1] technology delivers
41	unsurpassed audio quality at rates at or below 64 kbps/channel.  It
42	has a very flexible bitstream syntax that supports from 1 to 48 audio
43	channels, up to 16 subwoofer channels and up to 16 embedded data
44	channels.  AAC supports a wide range of sampling frequencies (from 16
45	kHz to 96 kHz) which enables it to have an extremely wide range of
46	bitrates.  This permits it to support applications ranging from
47	professional or home theater sound systems to Internet music broadcast
48	systems.

50	The benefits of using RTP for MPEG-2 AAC data stream transport include:

52	    i. Ability to synchronize MPEG-2 AAC streams with other RTP payloads

54	    ii. Monitoring MPEG-2 AAC delivery performance through RTCP

56	    iii. Combining MPEG-2 AAC and other real-time data streams received
57	    from multiple end-systems into a set of consolidated streams
58	    through RTP mixers

60	    iv. Converting data types, etc. through the use of RTP translators.

62	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
63	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
64	document are to be interpreted as described in RFC 2119 [3].

66	1.1 Overview of MPEG-2 AAC

68	AAC combines the coding efficiencies of a high resolution filter bank,
69	a powerful model of audio perception, backward-adaptive prediction,
70	joint channel coding, and Huffman coding to deliver excellent signal
71	compression. In 1998 the MPEG Audio subgroup tested the family of MPEG
72	audio coders (see http://www.tnt.uni-hannover.de/project/mpeg/audio/
73	public/w2006.pdf). The test results indicate that for a stereo signal,
74	AAC at 96 kb/s has audio quality comparable to MPEG-1 Layer 3 ("mp3")
75	at 128 kb/s.  Therefore at equivalent quality levels, AAC offers
76	approximately 1/3 greater compression than Layer 3.

78	AAC is a block oriented, variable rate coding algorithm, which means
79	that the AAC encoder reads 1024 samples of the input signal file and
80	writes a variable number of compressed output bits that represent that
81	block of input data. A sample can span one or more channels. Rate
82	control can be used in the encoder such that the output bit rate is
83	averaged to a predetermined rate, as would be required for
84	constant-rate communication channels. Each block of AAC compressed
85	bits is called a "raw data block", and it has the nice property that
86	it can be decoded "stand-alone", that is, without knowledge of
87	information in prior bitstream blocks. This is ideal for packet
88	communication channels, in that if the payload of a packet is a single
89	raw data block, packet framing facilitates encoder and decoder
90	synchronization and, most importantly, loss of a single packet does
91	not impair the decodability of adjacent packets.

93	1.2 Bitstream Syntax

95	As already stated, a raw data block represents audio data for a time
96	period of 1024 samples and may also contain related information and
97	other data. The syntax of an AAC bitstream is as follows:

99	<bitstream>        => <raw_data_block><bitstream>
100	<raw_data_block>   => [<element>]<END><PAD>

102	where <bitstream> indicates the AAC bitstream, <lowercase> indicates
103	intermediate tokens, <UPPERCASE> indicates terminal tokens and []
104	indicates one or more occurrences. <END> is a token that indicates the
105	end of a raw_data_block and <PAD> is a variable length token that
106	forces the total length of a raw_data_block to be an integral number
107	of bytes. In general, intermediate tokens are not an integral number of
108	bytes in length.

110	The <element> tokens are a string of bits of varying length, and can
111	be any of the following:

113	<single_channel_element>     represents a single audio channel
114	<channel_pair_element>       represents a stereo pair (2 channels)
115	<coupling_channel_element>   a mechanism for multi-channel compression
116	<lfe_channel_element>        represents a special effects channel
117	<data_stream_element>        represents "user data"
118	<program_config_element>     a mechanism for describing the bitstream
119	                             content
120	<fill_element>               a mechanism to use bits (for constant rate
121	                             channels)

123	The <element> tokens above can occur several times in a single
124	raw_data_block. For example, the raw_data_block for a 5.1 surround
125	sound signal would be:

127	<single_channel_element><channel_pair_element>...
128	                     .
129	                     .
130	                     .
131	...<channel_pair_element><lfe_channel_element><END>

133	corresponding to the center, left and right, left surround and right
134	surround and effects channels. Multiple occurrences of the
135	<channel_pair_element> are disambiguated by means of a unique 4-bit
136	id inside the <channel_pair_element>.

138	2. Issues covered by this Payload Format

140	2.1 Repair Information to reconstruct lost AAC Frames

142	Typically, a smart AAC decoder can mitigate the effects of lost
143	packets using techniques such as interpolation in the spectral domain.
144	However if the raw_data_block in a packet is perceptually very
145	significant and also highly unpredictable (e.g. the onset of a cymbal
146	crash) then the encoder may choose to send RepairData associated with
147	that raw_data_block. The RepairData in a given packet is typically
148	associated with a raw_data_block in the FUTURE, such that the decoder
149	has the RepairData when faced with the loss of the corresponding
150	packet. The association is indicated by the RSEQ field, which is equal
151	to the SEQ field of the corresponding raw_data_block.

153	The syntax of the RepairData bits is exactly that of the AAC
154	raw_data_block. However, in practical use, the RepairData would be a
155	highly compressed monophonic version of the signal being transmitted.
156	For example, an AAC stereo signal coded to an average rate of 96 kb/s
157	corresponds to a raw_data_block size of 279 bytes. A RepairData
158	version of that block, compressed to 16 kb/s would be 46 bytes. Given
159	that perceptually critical blocks might occur only once per 100 or
160	more blocks, the average rate imposed by the RepairData is very low.
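
As a rough check of the figures above, the following sketch computes
the average raw_data_block size from the bit rate. The 44.1 kHz
sampling rate is an assumption (the text does not state one), chosen
because it reproduces the quoted sizes; the function name is
illustrative only.

```python
def raw_data_block_bytes(bitrate_bps: float,
                         sample_rate_hz: float = 44100.0,  # assumed, not from the draft
                         samples_per_block: int = 1024) -> float:
    """Average size in bytes of one raw_data_block at the given bit rate."""
    block_duration_s = samples_per_block / sample_rate_hz
    return bitrate_bps * block_duration_s / 8.0


main_bytes = raw_data_block_bytes(96_000)    # about 279 bytes
repair_bytes = raw_data_block_bytes(16_000)  # about 46 bytes

# If RepairData accompanies one block in every 100, the average extra
# rate is 1/100 of the RepairData rate.
repair_overhead_bps = 16_000 / 100

print(round(main_bytes), round(repair_bytes), repair_overhead_bps)
```

Under these assumptions the RepairData adds roughly 160 b/s on top of
the 96 kb/s main stream, which is what "very low" amounts to here.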

162	RepairData MAY be provided for every frame but, in general, its
163	provision is OPTIONAL.

165	2.2 Fragmentation of AAC Frames

167	For many reasons the packet size on a communications channel may have
168	a practical maximum size (e.g. Ethernet packet size limits). Since it
169	is advantageous to put one AAC raw_data_block per packet, it is
170	desirable to try to limit the size of the AAC raw_data_block. If this
171	is not possible, the raw_data_block can be fragmented across several
172	packets. In this case, the raw_data_block can be fragmented at
173	<element> boundaries, with the LEN field indicating the length of
174	the <element> to within a byte and the UBITS field refining that
175	length to the bit. The LEN and UBITS information
176	permits re-assembly of the raw_data_block without knowledge of the
177	syntax of the bits within each <element> in the raw_data_block.
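
A minimal sketch of how a sender might derive LEN and UBITS for a
fragment whose exact length in bits is known, and how a receiver might
recover that bit length. The helper names are hypothetical; the 12-bit
limit on LEN is taken from the field definition in Section 3.

```python
from typing import Tuple

MAX_LEN = 0xFFF  # LEN is a 12-bit field (Section 3)


def len_and_ubits(fragment_bits: int) -> Tuple[int, int]:
    """Return (LEN in bytes, UBITS) for a fragment of the given bit length."""
    if fragment_bits <= 0:
        raise ValueError("fragment must contain at least one bit")
    length_bytes = (fragment_bits + 7) // 8           # round up to whole bytes
    unused_bits = length_bytes * 8 - fragment_bits    # padding in the last byte
    if length_bytes > MAX_LEN:
        raise ValueError("fragment does not fit in a 12-bit LEN field")
    return length_bytes, unused_bits


def fragment_bit_length(length_bytes: int, unused_bits: int) -> int:
    """Inverse operation: exact number of meaningful bits in the fragment."""
    return length_bytes * 8 - unused_bits
```

Because UBITS is at most 7, it always fits in the 4-bit field, and the
receiver can concatenate fragments bit-exactly without parsing the
<element> syntax.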

179	2.3 Priority of AAC Frames

181	Depending on the signal's characteristics AAC uses different encoding
182	strategies. Stationary signals are processed using a 1024 sample
183	MDCT. For transient signals a 128 sample MDCT is used. Lost AAC frames
184	containing stationary signals can be reconstructed relatively easily,
185	hence they are less important to the decoder than frames containing
186	transient signals, which can be reconstructed only roughly, if at all.

188	This priority information is very important for AAC streaming over
189	lossy channels since it allows the streaming application to adapt its
190	reconstruction or retransmission behavior, and the network to adapt
191	its forwarding strategies (DiffServ). In order to flexibly respond
192	to packet loss and/or given bandwidth constraints, four priority levels
193	are defined: 'low', 'lower', 'higher', 'high'. 'Low' priority denotes
194	frames with low perceptual entropy while 'high' priority denotes
195	frames with high perceptual entropy. 'Lower' and 'higher' priority
196	levels MUST be assigned to frames whose perceptual entropy is between
197	'high' and 'low', accordingly.

199	2.4 Interleaving of AAC Frames

201	Instead of using a static interleaving scheme (e.g. 7x7), frames MUST
202	be grouped only with frames of the same priority.  The sequence
203	numbers SEQ of the AAC frames and RSEQ of REPAIRDATA are used to
204	restore the actual order on the receiver side. Hence, the interleaving
205	scheme does not have to be defined rigidly.
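
The receiver-side reordering could look roughly like the sketch below.
It assumes that the receiver knows the oldest sequence number still
outstanding and that fewer than 256 frames are in flight, so that the
8-bit SEQ wrap-around is unambiguous. Neither assumption is stated in
this document, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

SEQ_MODULUS = 256  # SEQ and RSEQ are 8-bit fields


def restore_order(frames: Iterable[Tuple[int, bytes]],
                  oldest_seq: int) -> List[Tuple[int, bytes]]:
    """Sort (SEQ, data) pairs by their distance from oldest_seq modulo 256."""
    return sorted(frames,
                  key=lambda frame: (frame[0] - oldest_seq) % SEQ_MODULUS)


# Frames arrive interleaved and SEQ wraps from 255 to 0.
received = [(1, b"c"), (255, b"a"), (0, b"b")]
ordered = restore_order(received, oldest_seq=255)
```

The same key can be applied to RSEQ values to match RepairData with the
AAC frame it belongs to.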

207	2.5 Example RTP Packet Sequence

209	The example below shows how a sequence of AAC packets (a...p) with
210	assigned priorities (0=low, 3=high) MAY be grouped. RepairData is not
211	provided for low priority packets:

213	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
214	| a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p |
215	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
216	| 0 | 0 | 0 | 2 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 3 |
217	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

219	Proposed interleaving/grouping of AAC frames and assigned RepairData
220	R(x) being sent within the following RTP packet:

222	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
223	|a g j|b h k|c i l|  d  |  e  |  f  | m q |  n  |  o  |  p  |
224	+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
225	|     |R(d) |R(e) |R(f) |     |R(n) |R(o) |R(p) |     |     |
226	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
227	3. Payload Format

229	The RTP payload consists of a 32 or 64 bit header, a variable number
230	of RepairData blocks containing information needed to reconstruct
231	lost AAC frames, and a variable number of AAC frames. The header
232	contains a vector of Priority Quantifiers (PQ) that tell the decoder
233	the priority of the current and previous packets for reconstructing
234	the original signal. The X bit specifies whether the header contains
235	12 or 28 PQs. REPAIRLEN specifies the total number of 32-bit words
236	containing RepairData. REPAIRLEN MUST be set to 0 if there is no
237	RepairData. Every REPAIRDATA or AAC FRAME is preceded by a sequence
238	number (R)SEQ and a length specifier (R)LEN. In case of fragmented AAC
239	frames, UBITS specifies the number of unused bits in the last byte,
240	since frame fragments may not be byte aligned. UBITS MUST be set to 0
241	if the corresponding frame is not fragmented.

243	0                   1                   2                   3
244	0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
245	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
246	|X|REPAIRLEN    |PRI VECTOR                                     | Header
247	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
248	|PRI VECTOR (continued), if X==1                                |
249	+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
250	|RSEQ           |RLEN           |REPAIRDATA 1                   |
251	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
252	|                               .                               | Repair
253	|                               .                               | Data
254	|                               .                               |
255	|               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
256	|               |RSEQ           |RLEN           |REPAIRDATA N   |
257	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
258	|                                                               |
259	|                                                               |
260	|                                                               |
261	+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
262	|SEQ            |LEN                    |UBITS  |AAC FRAME 1    |
263	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
264	|                               .                               |
265	|                               .                               |
266	|                               .                               |  AAC
267	|               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  Frames
268	|               |SEQ            |LEN                    |UBITS  |
269	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
270	|AAC FRAME N                                                    |
271	|                                                               |
272	|                                                               |
273	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
274	PRI VECTOR: The actual priority vector. It contains either 12 or 28
275	            Priority Quantifiers (PQ). Each PQ describes the priority
276	            of one RTP packet. The size of a PQ is 2 bits. Hence,
277	            four different priority levels can be assigned to
278	            an RTP packet. 0 means low and 3 means high priority.
279	            The first PQ refers to the current packet. The following
280	            PQs refer to the most recent previous packets.
281	            So, the vector looks like this: {PQ(t), PQ(t-1), PQ(t-2)...}

283	X:          Vector Extension. If set, the priority vector uses 56 instead
284	            of 24 bits. Hence, another 32-bit word is required.

286	REPAIRLEN:  The total number of 32-bit words containing RepairData
287	            for previous/future frames. If REPAIRLEN==0 then
288	            there is no repair information.

290	RSEQ:       The SEQ number of the AAC frame to which REPAIRDATA belongs.

292	RLEN:       The length in bytes of REPAIRDATA.

294	REPAIRDATA: An 8-bit aligned data array containing RepairData.
295	            This information is optional and can be ignored.
296	            The syntax of the RepairData bits is exactly that of the AAC
297	            raw_data_block. However, it SHOULD be a highly compressed
298	            monophonic version of the signal being transmitted.

300	SEQ:        8 bits. The sequence number of the AAC frame.
301	            The application has to make sure that the sequence numbers
302	            of interleaved frames do not overlap.

304	LEN:        12 bits. The length of the actual AAC frame in bytes.

306	UBITS:      4 bits. The number of unused bits in the last byte of the AAC
307	            frame if the frame is fragmented. The RTP M-Bit is used as
308	            a 'fragmented' tag. UBITS MUST be set to 0, if the frame is
309	            not fragmented.
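
The field layout above could be parsed along the following lines. This
is a sketch, not a reference implementation: it assumes network
(big-endian) bit order with PQ(t) in the most significant bits of PRI
VECTOR, which the diagram suggests but the text does not state, and all
class and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class PayloadHeader(NamedTuple):
    extended: bool          # X bit
    repair_len_words: int   # REPAIRLEN, in 32-bit words
    priorities: List[int]   # PQ(t), PQ(t-1), ... each 0 (low) to 3 (high)
    header_bytes: int       # 4 if X == 0, 8 if X == 1


def parse_payload_header(payload: bytes) -> PayloadHeader:
    """Split the 32 or 64 bit header into X, REPAIRLEN and the PQ vector."""
    if len(payload) < 4:
        raise ValueError("payload shorter than the 32-bit header")
    extended = bool(payload[0] & 0x80)      # X: top bit of the first byte
    repair_len_words = payload[0] & 0x7F    # REPAIRLEN: remaining 7 bits
    header_bytes = 8 if extended else 4
    if len(payload) < header_bytes:
        raise ValueError("payload shorter than the 64-bit header")
    vector_bits = (header_bytes - 1) * 8    # 24 or 56 bits of PRI VECTOR
    vector = int.from_bytes(payload[1:header_bytes], "big")
    priorities = [(vector >> shift) & 0x3
                  for shift in range(vector_bits - 2, -1, -2)]
    return PayloadHeader(extended, repair_len_words, priorities, header_bytes)


def parse_frame_descriptor(data: bytes) -> Tuple[int, int, int]:
    """Return (SEQ, LEN, UBITS) from the 24 bits preceding an AAC frame."""
    if len(data) < 3:
        raise ValueError("frame descriptor needs 3 bytes")
    seq = data[0]                            # 8 bits
    length = (data[1] << 4) | (data[2] >> 4)  # 12 bits
    ubits = data[2] & 0x0F                   # 4 bits
    return seq, length, ubits
```

With X == 0 the list holds 12 PQs and with X == 1 it holds 28, matching
the 24 and 56 bit vector sizes given for the X field.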

311	4. References

313	  [1] ISO/IEC 13818-7 Advanced Audio Coding (AAC)

315	  [2] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
316	  "RTP: A Transport Protocol for Real-Time Applications", RFC 1889,
317	  January 1996.

319	  [3] Bradner, S., "Key words for use in RFCs to Indicate
320	  Requirement Levels", RFC 2119, March 1997.

322	5. Authors' Addresses

324	Mathias Kretschmer
325	AT&T Labs - Research
326	180 Park Ave.
327	Florham Park, NJ 07932
328	USA
329	e-mail: mathias@research.att.com

331	Andrea Basso
332	AT&T Labs - Research
333	100 Schultz Drive
334	Red Bank, NJ 07701
335	USA
336	e-mail: basso@research.att.com

338	M. Reha Civanlar
339	AT&T Labs - Research
340	100 Schultz Drive
341	Red Bank, NJ 07701
342	USA
343	e-mail: civanlar@research.att.com

345	Schuyler R. Quackenbush
346	AT&T Labs - Research
347	180 Park Ave.
348	Florham Park, NJ 07932
349	USA
350	e-mail: srq@research.att.com

352	James H. Snyder
353	AT&T Labs - Research
354	180 Park Ave.
355	Florham Park, NJ 07932
356	USA
357	e-mail: jhs@research.att.com