idnits 2.17.1

draft-ietf-avt-rtp-amrwbplus-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 18.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 1735.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1708.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1715.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1721.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of
     the newer disclaimer which includes the IETF Trust according to RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 15 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 459 has weird spacing: '... loss sever...'

  == Line 828 has weird spacing: '...payload is th...'

  == Line 1230 has weird spacing: '... frames needs...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 22, 2005) is 6790 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  -- Possible downref: Non-RFC (?) normative reference: ref. '5'

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)

  ** Obsolete normative reference: RFC 3267 (ref. '7') (Obsoleted by RFC 4867)

  -- Obsolete informational reference (is this intentional?): RFC 2733 (ref.
     '11') (Obsoleted by RFC 5109)

  -- Obsolete informational reference (is this intentional?): RFC 2326 (ref.
     '16') (Obsoleted by RFC 7826)


     Summary: 5 errors (**), 0 flaws (~~), 6 warnings (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	 Network Working Group                                    Johan Sjoberg
3	 INTERNET-DRAFT                                       Magnus Westerlund
4	 Expires: March 2006                                           Ericsson
5	                                                          Ari Lakaniemi
6	                                                         Stephan Wenger
7	                                                                  Nokia
8	                                                     September 22, 2005

10	     RTP Payload Format for Extended AMR Wideband (AMR-WB+) Audio Codec
11	                    

13	 Status of this memo

15	    By submitting this Internet-Draft, each author represents that any
16	    applicable patent or other IPR claims of which he or she is aware
17	    have been or will be disclosed, and any of which he or she becomes
18	    aware will be disclosed, in accordance with Section 6 of BCP 79.

20	    Internet-Drafts are working documents of the Internet Engineering
21	    Task Force (IETF), its areas, and its working groups.  Note that
22	    other groups may also distribute working documents as Internet-
23	    Drafts.

25	    Internet-Drafts are draft documents valid for a maximum of six
26	    months and may be updated, replaced, or obsoleted by other documents
27	    at any time.  It is inappropriate to use Internet-Drafts as
28	    reference material or to cite them other than as "work in progress."

30	    The list of current Internet-Drafts can be accessed at
31	    http://www.ietf.org/1id-abstracts.txt

33	    The list of Internet-Draft Shadow Directories can be accessed at
34	    http://www.ietf.org/shadow.html

36	    This document is a submission of the IETF AVT WG.  Comments should
37	    be directed to the AVT WG mailing list, avt@ietf.org.

39	 Abstract

41	    This document specifies a real-time transport protocol (RTP) payload
42	    format for Extended Adaptive Multi-Rate Wideband (AMR-WB+) encoded
43	    audio signals.  The AMR-WB+ codec is an audio extension of the AMR-
44	    WB speech codec.  It encompasses the AMR-WB frame types and a number
45	    of new frame types designed to support high quality music and
46	    speech.  A media type registration for AMR-WB+ is included in this
47	    specification.

49	 TABLE OF CONTENTS

51	 1. Definitions
52	    1.1. Glossary
53	    1.2. Terminology
54	 2. Introduction
55	 3. Background of AMR-WB+ and Design Principles
56	    3.1. The AMR-WB+ Audio Codec
57	    3.2. Multi-rate Encoding and Rate Adaptation
58	    3.3. Voice Activity Detection and Discontinuous Transmission
59	    3.4. Support for Multi-Channel Session
60	    3.5. Unequal Bit-error Detection and Protection
61	    3.6. Robustness against Packet Loss
62	       3.6.1. Use of Forward Error Correction (FEC)
63	       3.6.2. Use of Frame Interleaving
64	    3.7. AMR-WB+ Audio over IP scenarios
65	    3.8. Out-of-Band Signaling
66	 4. RTP Payload Format for AMR-WB+
67	    4.1. RTP Header Usage
68	    4.2. Payload Structure
69	    4.3. Payload Definitions
70	       4.3.1. Payload Header
71	       4.3.2. The Payload Table of Contents
72	       4.3.3. Audio Data
73	       4.3.4. Methods for Forming the Payload
74	       4.3.5. Payload Examples
75	    4.4. Interleaving Considerations
76	    4.5. Implementation Considerations
77	       4.5.1. ISF recovery in case of packet loss
78	       4.5.2. Decoding Validation
79	 5. Congestion Control
80	 6. Security Considerations
81	    6.1. Confidentiality
82	    6.2. Authentication and Integrity
83	 7. Payload Format Parameters
84	    7.1. Media Type Registration
85	    7.2. Mapping Media Type Parameters into SDP
86	       7.2.1. Offer-Answer Model Considerations
87	       7.2.2. Examples
88	 8. IANA Considerations
89	 9. Contributors
90	 10. Acknowledgements
91	 11. References
92	    11.1. Normative references
93	    11.2. Informative references
94	 12. Authors' Addresses
95	 13. IPR Notice
96	 14. Copyright Notice
97	 1. Definitions

99	 1.1. Glossary

101	    3GPP    - Third Generation Partnership Project
102	    AMR     - Adaptive Multi-Rate (Codec)
103	    AMR-WB  - Adaptive Multi-Rate Wideband (Codec)
104	    AMR-WB+ - Extended Adaptive Multi-Rate Wideband (Codec)
105	    CMR     - Codec Mode Request
106	    CN      - Comfort Noise
107	    DTX     - Discontinuous Transmission
108	    FEC     - Forward Error Correction
109	    FT      - Frame Type
110	    ISF     - Internal Sampling Frequency
111	    SCR     - Source Controlled Rate Operation
112	    SID     - Silence Indicator (the frames containing only CN
113	              parameters)
114	    TFI     - Transport Frame Index
115	    TS      - Timestamp
116	    VAD     - Voice Activity Detection
117	    UED     - Unequal Error Detection
118	    UEP     - Unequal Error Protection

120	 1.2. Terminology

122	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
123	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
124	    this document are to be interpreted as described in RFC 2119 [2].

126	 2. Introduction

128	    This document specifies the payload format for packetization of
129	    Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] encoded audio
130	    signals into the Real-time Transport Protocol (RTP) [3].  The
131	    payload format supports the transmission of mono or stereo audio,
132	    aggregating multiple frames per payload, and mechanisms enhancing
133	    the robustness of the packet stream against packet loss.

135	    The AMR-WB+ codec is an extension of the Adaptive Multi-Rate
136	    Wideband (AMR-WB) speech codec.  New features include extended audio
137	    bandwidth to enable high quality for non-speech signals (e.g.
138	    music), native support for stereophonic audio, and the option to
139	    operate on, and switch between, several internal sampling
140	    frequencies (ISFs).  The primary usage scenario for AMR-WB+ is the
141	    transport over IP.  Therefore, interworking with other transport
142	    networks, as discussed for AMR-WB in [7], is not a major concern and
143	    hence not addressed in this memo.

145	    The expected key application for AMR-WB+ is streaming.  To make the
146	    packetization process on a streaming server as efficient as
147	    possible, an octet-aligned payload format is desirable.  Therefore,
148	    a bandwidth efficient mode as defined for AMR-WB in [7] is not
149	    specified herein; the bandwidth-savings of the bandwidth efficient
150	    mode would be very small anyway, since all extension frame types are
151	    octet aligned.

153	    The stereo encoding capability of AMR-WB+ renders the support for
154	    multi-channel transport at RTP payload format level, as specified
155	    for AMR-WB [7], obsolete.  Therefore this feature is not included in
156	    this memo.

158	    This specification does not include a definition of a file format
159	    for AMR-WB+.  Instead, the reader is referred to the ISO-based 3GP
160	    file format [14], which supports AMR-WB+ and provides all the
161	    functionality required.  The 3GP format also supports storage of
162	    AMR, AMR-WB, and many other multimedia formats, thereby allowing
163	    synchronized playback.

165	    The rest of the document is organized as follows: Background
166	    information on the AMR-WB+ codec, and design principles, can be
167	    found in Section 3.  The payload format itself is specified in
168	    Section 4.  Sections 5 and 6 discuss congestion control and security
169	    considerations, respectively.  In Section 7, a media type
170	    registration is provided.

172	 3. Background of AMR-WB+ and Design Principles

174	    The Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] audio codec
175	    is designed to compress speech and audio signals at low bit-rate and
176	    good quality.  The codec is specified by the Third Generation
177	    Partnership Project (3GPP).  The primary target applications are
178	    (1) the packet-switched streaming service (PSS) [13], (2) the
179	    multimedia messaging service (MMS), and (3) the multimedia broadcast
180	    and multicast service (MBMS).  However, due to its flexibility and
181	    robustness, AMR-WB+ is also well suited for streaming services in
182	    other highly varying transport environments, such as the Internet.

184	 3.1. The AMR-WB+ Audio Codec

186	    3GPP originally developed the AMR-WB+ audio codec for streaming and
187	    messaging services in Global System for Mobile communications (GSM)
188	    and third generation (3G) cellular systems.  The codec is designed
189	    as an audio extension of the AMR-WB speech codec.  The extension
190	    adds new functionality to the codec in order to provide high audio
191	    quality for a large range of signals including music.  Stereophonic
192	    operation has also been added.  A new, high-efficiency hybrid stereo
193	    coding algorithm enables stereo operation at bit-rates as low as 6.2
194	    kbit/s.

196	    The AMR-WB+ codec includes the nine frame types specified for AMR-
197	    WB, extended by new bit-rates ranging from 5.2 to 48 kbit/s.  The
198	    AMR-WB frame types can employ only a 16000 Hz sampling frequency and
199	    operate only on monophonic signals.  The newly introduced extension
200	    frame types, however, can operate at a number of internal sampling
201	    frequencies (ISFs), both in mono and stereo.  Please see Table 24 in
202	    [1] for details.  The output sampling frequency of the decoder is
203	    limited to 8, 16, 24, 32 or 48 kHz.

205	    An overview of the AMR-WB+ encoding operations is provided as
206	    follows.  The encoder receives the audio sampled at, for example, 48
207	    kHz.  The encoding process starts with pre-processing and resampling
208	    to the user-selected ISF.  The encoding is performed on equally
209	    sized super-frames.  Each super-frame corresponds to 2048 samples
210	    per channel, at the ISF.  The codec carries out a number of encoding
211	    decisions for each super-frame, thereby choosing between different
212	    encoding algorithms and block lengths, so as to achieve a fidelity-
213	    optimized encoding adapted to the signal characteristics of the
214	    source.  The stereo encoding (if used) executes separately from the
215	    monophonic core encoding, thus enabling the selection of different
216	    combinations of core and stereo encoding rates.  The resulting
217	    encoded audio is produced in four transport frames of equal length.
218	    Each transport frame corresponds to 512 samples at the ISF, and is
219	    individually usable by the decoder, provided that its position in
220	    the super-frame structure is known.

222	    The codec supports 13 different ISFs, ranging from 12.8 up to 38.4
223	    kHz, as described by Table 24 of [1].  The high number of ISFs
224	    allows a trade-off between the audio bandwidth and the target bit-
225	    rate.  As encoding is performed on 2048 samples at the ISF, the
226	    duration of a super-frame and the effective bit-rate of the frame
227	    type in use vary.

229	    The ISF of 25600 Hz has a super-frame duration of 80 ms.  It is the
230	    'nominal' value used to describe the encoding bit-rates henceforth.
231	    Assuming this normalization, the ISF selection results in bit-rate
232	    variations from 1/2 up to 3/2 of the nominal bit-rate.

234	    The encoding for the extension modes is performed as one monophonic
235	    core encoding and one stereo encoding.  The core encoding is
236	    executed by splitting the monophonic signal into a lower and a
237	    higher frequency band.  The lower band is encoded employing either
238	    algebraic code excited linear prediction (ACELP), or transform coded
239	    excitation (TCX).  This selection can be made once per transport
240	    frame, but must obey certain limitations of legal combinations
241	    within the super-frame.  The higher band is encoded using a low-rate
242	    parametric bandwidth extension approach.

244	    The stereo signal is encoded employing a similar frequency band
245	    decomposition; however, here the signal is divided into three bands
246	    that are individually parameterized.

248	    The total bit-rate produced by the extension is the result of the
249	    combination of the encoder's core rate, stereo rate and ISF.  The
250	    extension supports 8 different core encoding rates producing bit-
251	    rates between 10.4 and 24.0 kbit/s; see Table 22 in [1].  There are
252	    16 stereo encoding rates generating bit-rates between 2.0 and 8.0
253	    kbit/s; see Table 23 in [1].  The frame type encodes the AMR-WB
254	    modes, 4 fixed extension rates (see below), 24 combinations of core
255	    and stereo rates for stereo signals, and the 8 core rates for mono
256	    signals, as listed in Table 25 in [1].  This results in the AMR-WB+
257	    supporting encoding rates between 10.4 and 32 kbit/s, assuming an
258	    ISF of 25600 Hz.

260	    Different ISFs allow for additional freedom in the produced bit-
261	    rates and audio quality.  The selection of an ISF changes the
262	    available audio bandwidth of the reconstructed signal, and also the
263	    total bit-rate.  The bit-rate for a given combination of frame type
264	    and ISF is determined by multiplying the frame type's bit-rate with
265	    the used ISF's bit-rate factor; see Table 24 in [1].

267	    The extension also has four frame types which have fixed ISFs.
268	    Please see frame types 10-13 in Table 21 in [1].  These four pre-
269	    defined frame types have a fixed input sampling frequency at the
270	    encoder, which can be set either at 16 or 24 kHz.  Like the AMR-WB
271	    frame types, transport frames encoded utilizing these frame types
272	    represent exactly 20 ms of the audio signal.  However, they are also
273	    part of 80 ms super-frames.  Frame types 0-13 (AMR-WB and fixed
274	    extension rates), as listed in Table 21 in [1], do not require an
275	    explicit ISF indication.  The other frame types, 14-47, require the
276	    ISF employed to be indicated.

278	    The 32 different frame types of the extension, in combination with
279	    13 ISFs, allow for great flexibility in bit-rate and selection of
280	    desired audio quality.  A number of combinations exist that produce
281	    the same codec bit-rate.  For example, a 32 kbit/s audio stream can
282	    be produced by utilizing frame type 41, i.e. 25.6 kbit/s, and the
283	    ISF of 32 kHz (5/4 * (19.2+6.4) = 32 kbit/s), or frame type 47 and
284	    the ISF of 25.6 kHz (1 * (24 + 8) = 32 kbit/s).  Which combination
285	    is more beneficial for the perceived audio quality depends on the
286	    content.  In the above example the first case provides a higher
287	    audio bandwidth, while the second one spends the same number of bits
288	    on somewhat narrower audio bandwidth but provides higher fidelity.
289	    Encoders are free to select the combination they deem most
290	    beneficial.
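
The arithmetic in this example can be checked with a short Python
sketch.  It assumes that the bit-rate factor of an ISF equals the ISF
divided by the nominal 25600 Hz.  This matches the factors 5/4 and 1
used above, but it is an assumption made for illustration; the
normative factors are those of Table 24 in [1].  The function name is
hypothetical.

```python
from fractions import Fraction

NOMINAL_ISF_HZ = 25600  # ISF at which the nominal bit-rates are quoted


def total_bit_rate(core_kbps, stereo_kbps, isf_hz):
    """Total codec bit-rate in kbit/s for a core/stereo rate pair at an ISF.

    Assumption: the ISF bit-rate factor is isf_hz / 25600 (see the text
    above); this is not taken from Table 24 of [1].
    """
    factor = Fraction(isf_hz, NOMINAL_ISF_HZ)
    nominal = Fraction(str(core_kbps)) + Fraction(str(stereo_kbps))
    return nominal * factor


# Frame type 41 (19.2 + 6.4 kbit/s) at an ISF of 32 kHz.
rate_ft41 = total_bit_rate(19.2, 6.4, 32000)
# Frame type 47 (24 + 8 kbit/s) at the nominal ISF of 25.6 kHz.
rate_ft47 = total_bit_rate(24, 8, 25600)

print(float(rate_ft41), float(rate_ft47))
```

Under the same assumption, the lowest mono core rate (10.4 kbit/s) at
the lowest ISF gives 5.2 kbit/s, and the highest combination (24 + 8
kbit/s) at the highest ISF gives 48 kbit/s, which agrees with the
range of extension bit-rates quoted earlier in this section.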

292	    Since a transport frame always corresponds to 512 samples at the
293	    used ISF, its duration is limited to the range 13.33 to 40 ms; see
294	    Table 1.  An RTP Timestamp clock rate of 72000 Hz, as mandated by
295	    this specification, results in AMR-WB+ transport frame lengths of
296	    960 to 2880 timestamp ticks, depending solely on the selected ISF.

298	         Index   ISF   Duration(ms) Duration(TS Ticks @ 72 kHz)
299	         ------------------------------------------------------
300	           0     N/A      20             1440
301	           1    12800     40             2880
302	           2    14400     35.55          2560
303	           3    16000     32             2304
304	           4    17067     30             2160
305	           5    19200     26.67          1920
306	           6    21333     24             1728
307	           7    24000     21.33          1536
308	           8    25600     20             1440
309	           9    28800     17.78          1280
310	          10    32000     16             1152
311	          11    34133     15             1080
312	          12    36000     14.22          1024
313	          13    38400     13.33           960

315	         Table 1: Normative number of RTP Timestamp Ticks for each
316	                  Transport Frame depending on ISF (ISF and Duration in
317	                  ms are rounded)
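
Table 1 can be cross-checked with a short Python sketch.  It starts
from the tick counts, which are exact integers, and derives the
transport frame duration and the implied ISF from the 72000 Hz clock
rate and the 512 samples per transport frame.  Index 0 is left out
because it has no ISF.  The constant and function names are
illustrative and not part of this specification.

```python
from fractions import Fraction

CLOCK_RATE_HZ = 72000      # RTP timestamp clock rate mandated by this memo
SAMPLES_PER_FRAME = 512    # samples per transport frame at the ISF
FRAMES_PER_SUPERFRAME = 4  # transport frames per super-frame

# ISF index -> timestamp ticks per transport frame (copied from Table 1).
TICKS = {1: 2880, 2: 2560, 3: 2304, 4: 2160, 5: 1920, 6: 1728, 7: 1536,
         8: 1440, 9: 1280, 10: 1152, 11: 1080, 12: 1024, 13: 960}


def frame_duration_ms(ticks):
    """Exact transport frame duration in milliseconds."""
    return Fraction(ticks * 1000, CLOCK_RATE_HZ)


def implied_isf_hz(ticks):
    """Exact ISF in Hz implied by 512 samples spanning `ticks` ticks."""
    return Fraction(SAMPLES_PER_FRAME * CLOCK_RATE_HZ, ticks)


for index, ticks in TICKS.items():
    isf = implied_isf_hz(ticks)
    dur = frame_duration_ms(ticks)
    print(f"{index:2d}  {float(isf):9.2f} Hz  {float(dur):6.2f} ms  "
          f"super-frame {float(dur * FRAMES_PER_SUPERFRAME):6.2f} ms")
```

Working from the tick counts avoids the rounding in the ISF column:
for example, 2160 ticks imply an ISF of 51200/3 Hz, which the table
rounds to 17067.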

319	    The encoder is free to change both the ISF and the encoding frame
320	    type (both mono and stereo) during a session.  For the extension
321	    frame types with index 10-13 and 16-47, the ISF and frame type
322	    changes are constrained to occur at super-frame boundaries.  This
323	    implies that, for the frame types mentioned, the ISF is constant
324	    throughout a super-frame.  This limitation does not apply to frame
325	    types with index 0-9, 14 and 15, i.e. the original AMR-WB frame
326	    types.
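
The frame type ranges given in this section can be collected into two
small predicates.  This is only a sketch of the rules as stated in the
text above; the function names are hypothetical, and the frame type
semantics themselves are defined in Table 21 of [1].

```python
def _check(frame_type):
    """Reject values outside the frame type range 0-47 used in the text."""
    if not 0 <= frame_type <= 47:
        raise ValueError(f"frame type out of range 0-47: {frame_type}")


def requires_isf_indication(frame_type):
    """True for frame types 14-47, which need the ISF to be indicated."""
    _check(frame_type)
    return frame_type >= 14


def changes_only_at_superframe_boundary(frame_type):
    """True for frame types 10-13 and 16-47.

    For these, ISF and frame type changes are constrained to occur at
    super-frame boundaries.  Frame types 0-9, 14 and 15 are exempt.
    """
    _check(frame_type)
    return 10 <= frame_type <= 13 or 16 <= frame_type <= 47


print([ft for ft in range(48) if not changes_only_at_superframe_boundary(ft)])
```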

328	    A number of features of the AMR-WB+ codec require special
329	    consideration from a transport point of view, and solutions that
330	    could perhaps be viewed as unorthodox.  First, there are constraints
331	    on the RTP timestamping, due to the relationship of the frame
332	    duration and the ISFs.  Second, each frame of encoded audio must
333	    maintain information about its frame type, ISF and position in the
334	    super-frame.

336	 3.2. Multi-rate Encoding and Rate Adaptation

338	    The multi-rate encoding capability of AMR-WB+ is designed to
339	    preserve high audio quality under a wide range of bandwidth
340	    requirements and transmission conditions.

342	    AMR-WB+ enables seamless switching between frame types that use the
343	    same number of audio channels and the same ISF.  Every AMR-WB+ codec
344	    implementation is required to support all frame types defined by the
345	    codec, and must be able to handle switching between any two frame
346	    types.  Switching between frame types employing a different number
347	    of audio channels or a different ISF must also be supported, but it
348	    may not be completely seamless.  Therefore it is recommended to
349	    perform such switching infrequently and, if possible, during periods
350	    of silence.

352	 3.3. Voice Activity Detection and Discontinuous Transmission

354	    AMR-WB+ supports the same algorithms as AMR-WB for voice activity
355	    detection (VAD) and generation of comfort noise (CN) parameters
356	    during silence periods.  However, these functionalities can only be
357	    used in conjunction with the AMR-WB frame types (FT=0-8).  This
358	    option allows reducing the number of transmitted bits and packets
359	    during silence periods to a minimum.  The operation of sending CN
360	    parameters at regular intervals during silence periods is usually
361	    called discontinuous transmission (DTX) or source controlled rate
362	    (SCR) operation.  The AMR-WB+ frames containing CN parameters are
363	    called Silence Indicator (SID) frames.  More details about the VAD
364	    and DTX functionality are provided in [4] and [5].

366	 3.4. Support for Multi-Channel Session

368	    Some of the AMR-WB+ frame types support the encoding of stereophonic
369	    audio.  Because of this native support for a two-channel
370	    stereophonic signal, it does not seem necessary to support multi-
371	    channel transport with separate codec instances, as specified in the
372	    AMR-WB RTP payload [7].  The codec has the capability of stereo to
373	    mono downmixing as part of the decoding process.  Thus, a receiver
374	    that is only capable of playout of monophonic audio must still be
375	    able to decode and play signals originally encoded and transmitted
376	    as stereo.  However, to avoid spending bits on a stereo encoding
377	    that is not going to be utilized, a mechanism is defined in this
378	    specification to signal mono-only audio.

380	 3.5. Unequal Bit-error Detection and Protection

382	    The audio bits encoded in each AMR-WB frame are sorted according to
383	    their different perceptual sensitivity to bit errors.  In cellular
384	    systems, for example, this property can be exploited to achieve
385	    better voice quality, by using unequal error protection and
386	    detection (UEP and UED) mechanisms.  However, the bits of the
387	    extension frame types of the AMR-WB+ codec do not have a consistent
388	    perceptual significance property and are not sorted in this order.
389	    Thus, UEP or UED is meaningless with the extension frame types.  If
390	    there is a need to use UEP or UED for AMR-WB frame types, it is
391	    recommended to use RFC 3267 [7].

393	 3.6. Robustness against Packet Loss

395	    The payload format supports two mechanisms to improve robustness
396	    against packet loss: simple forward error correction (FEC) and frame
397	    interleaving.

399	 3.6.1. Use of Forward Error Correction (FEC)

401	    Generic forward error correction within RTP is defined, for example,
402	    in RFC 2733 [11].  Audio redundancy coding is defined in RFC 2198
403	    [12].  Either scheme can be used to add redundant information to the
404	    RTP packet stream and make it more resilient to packet losses, at
405	    the expense of a higher bit rate.  Please see either RFC for a
406	    discussion of the implications of the higher bit rate to network
407	    congestion.

409	    In addition to these media-unaware mechanisms, this memo specifies
410	    an AMR-WB+ specific form of audio redundancy coding, which may be
411	    beneficial in terms of packetization overhead.

413	    Conceptually, previously transmitted transport frame(s) are
414	    aggregated together with new one(s).  A sliding window is used to
415	    group the frames to be sent in each payload.  Figure 1 below shows
416	    an example.

418	    --+--------+--------+--------+--------+--------+--------+--------+--
419	      | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
420	    --+--------+--------+--------+--------+--------+--------+--------+--

422	      <---- p(n-1) ---->
423	               <----- p(n) ----->
424	                        <---- p(n+1) ---->
425	                                 <---- p(n+2) ---->
426	                                          <---- p(n+3) ---->
427	                                                   <---- p(n+4) ---->

429	    Figure 1: An example of redundant transmission.

431	    Here, each frame is retransmitted once in the following RTP payload
432	    packet.  f(n-2)...f(n+4) denote a sequence of audio frames, and
433	    p(n-1)...p(n+4) a sequence of payload packets.
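
The sliding window of Figure 1 can be sketched in Python as follows.
This is a simplified model, not the payload format itself: frames are
represented by string labels, and the receiver detects duplicates by
label, whereas a real receiver would identify copies by their
timestamps.  The function names and the `copies` parameter are
hypothetical.

```python
def redundant_payloads(frames, copies=2):
    """Group frames into payloads with a sliding window of `copies` frames.

    With copies=2, payload i carries frames[i-1] and frames[i], so every
    frame is sent once as new data and once more in the next payload.
    """
    payloads = []
    for i in range(len(frames)):
        start = max(0, i - copies + 1)
        payloads.append(frames[start:i + 1])
    return payloads


def recover(payloads, lost):
    """Return the frames a receiver obtains when payloads in `lost` drop."""
    received = []
    for i, payload in enumerate(payloads):
        if i in lost:
            continue
        for frame in payload:
            if frame not in received:  # discard duplicate copies
                received.append(frame)
    return received


frames = [f"f{k}" for k in range(7)]
payloads = redundant_payloads(frames)
print(payloads[3])                  # the two frames carried by payload 3
print(recover(payloads, lost={3}))  # a single loss is fully repaired
```

In this model a single lost payload costs no frames, while the loss of
two consecutive payloads costs exactly one frame, the one carried only
by those two payloads.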

435	    The mechanism described does not require signaling at the session
436	    setup.  In other words, the audio sender can choose to use this
437	    scheme without consulting the receiver.  For a certain timestamp,
438	    the receiver may receive multiple copies of a frame containing
439	    encoded audio data or frames indicated as NO_DATA.  The cost of this
440	    scheme is bandwidth and the receiver delay necessary to allow the
441	    redundant copy to arrive.

443	    This redundancy scheme provides functionality similar to that
444	    described in RFC 2198, but works only if both original frames and
445	    redundant representations are AMR-WB+ frames.  When the use of other
446	    media coding schemes is desirable, one has to resort to RFC 2198.

448	    The sender is responsible for selecting an appropriate amount of
449	    redundancy based on feedback about the channel conditions, e.g. in
450	    the RTP Control Protocol (RTCP) [3] receiver reports.  The sender is
451	    also responsible for avoiding congestion, which may be exacerbated
452	    by redundancy (see Section 5 for more details).

454	 3.6.2. Use of Frame Interleaving

456	    To decrease protocol overhead, the payload design allows several
457	    audio transport frames to be encapsulated into a single RTP packet.
458	    One of the drawbacks of such an approach is that in case of packet
459	    loss  several consecutive frames are lost.  Consecutive frame loss
460	    normally renders error concealment less efficient and usually causes
461	    clearly audible and annoying distortions in the reconstructed audio.
462	    Interleaving of transport frames can improve the audio quality in
463	    such cases by distributing the consecutive losses into a number of
464	    isolated frame losses, which are easier to conceal.  However,
465	    interleaving and bundling several frames per payload also increases
466	    end-to-end delay and sets higher buffering requirements.  Therefore,
467	    interleaving is not appropriate for all use cases or devices.
468	    Streaming applications should most likely be able to exploit
469	    interleaving to improve audio quality in lossy transmission
470	    conditions.

472	    Note that this payload design supports the use of frame interleaving
473	    as an option.  The usage of this feature needs to be negotiated in
474	    the session set-up.

476	    The interleaving supported by this format is rather flexible.  For
477	    example, a continuous pattern can be defined, as depicted in Figure
478	    2.

480	    --+--------+--------+--------+--------+--------+--------+--------+--
481	      | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
482	    --+--------+--------+--------+--------+--------+--------+--------+--

484	               [ P(n)   ]
485	      [ P(n+1) ]                 [ P(n+1) ]
486	                        [ P(n+2) ]                 [ P(n+2) ]
487	                                          [ P(n+3) ]                 [P(
488	                                                            [ P(n+4) ]

490	    Figure 2: An example of interleaving pattern that has constant
491	    delay.

493	    In Figure 2 the consecutive frames, denoted f(n-2) to f(n+4), are
494	    aggregated into packets P(n) to P(n+4), each packet carrying two
495	    frames.  This approach provides an interleaving pattern that allows
496	    for constant delay in both the interleaving and deinterleaving
497	    processes.  The deinterleaving buffer needs to have room for at
498	    least three frames, including the one that is ready to be consumed.
499	    The storage space for three frames is needed, for example, when f(n)
500	    is the next frame to be decoded: since frame f(n) was received in
501	    packet P(n+2) carrying also frame f(n+3), both these frames are
502	    stored in the buffer. Furthermore, frame f(n+1) received in the
503	    previous packet P(n+1) is also in the deinterleaving buffer.  Note
504	    also that in this example the buffer occupancy varies: when frame
505	    f(n+1) is the next one to be decoded, there are only two frames,
506	    f(n+1) and f(n+3), in the buffer.
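
    The buffer behaviour described above can be reproduced with a small
    simulation.  This is only an illustrative sketch: it assumes, as one
    reading of Figure 2, that packet j carries frames 2j and 2j+3, and
    the function names are hypothetical rather than part of this
    specification.

```python
def figure2_packets(count):
    """Yield packets for the assumed pattern: packet j carries frames 2j and 2j+3."""
    # j starts at -1 so that frame 1 is delivered (its partner, frame -2, is dropped).
    for j in range(-1, count):
        yield [f for f in (2 * j, 2 * j + 3) if f >= 0]


def deinterleave(packets):
    """Return (decoding order, buffer occupancy observed at each decode)."""
    buffer = set()
    next_frame = 0
    decoded, occupancy = [], []
    for packet in packets:
        buffer.update(packet)
        # Consume every frame that is now available in decoding order.
        while next_frame in buffer:
            occupancy.append(len(buffer))  # counts the frame about to be consumed
            buffer.remove(next_frame)
            decoded.append(next_frame)
            next_frame += 1
    return decoded, occupancy


decoded, occupancy = deinterleave(figure2_packets(4))
print(decoded)    # [0, 1, 2, 3, 4, 5, 6, 7]
print(occupancy)  # [3, 2, 3, 2, 3, 2, 3, 2]
```

    Under these assumptions the occupancy alternates between three and
    two frames, matching the description of the deinterleaving buffer.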

508	 3.7. AMR-WB+ Audio over IP scenarios

510	    Since the primary target application for the AMR-WB+ codec is
511	    streaming over packet networks, the most relevant usage scenario for
512	    this payload format is IP end-to-end between a server and a
513	    terminal, as shown in Figure 3.

515	              +----------+                          +----------+
516	              |          |    IP/UDP/RTP/AMR-WB+    |          |
517	              |  SERVER  |<------------------------>| TERMINAL |
518	              |          |                          |          |
519	              +----------+                          +----------+

521	               Figure 3: Server to terminal IP scenario

523	 3.8. Out-of-Band Signaling

525	    Some of the options of this payload format remain constant
526	    throughout a session.  Therefore, they can be controlled/negotiated
527	    at the session set-up.  Throughout this specification, these options
528	    and variables are denoted as "parameters to be established through
529	    out-of-band means".  In Section 7, all of the parameters are
530	    formally specified in the form of media type registration for the
531	    AMR-WB+ encoding.  The method used to signal these parameters at
532	    session setup or to arrange prior agreement of the participants is
533	    beyond the scope of this document; however, Section 7.2 provides a
534	    mapping of the parameters into the Session Description Protocol
535	    (SDP) [6] for those applications that use SDP.

537	 4. RTP Payload Format for AMR-WB+

539	    The main emphasis in the payload design for AMR-WB+ has been to
540	    minimize the overhead in typical use cases, while providing full
541	    flexibility with a slightly higher overhead.  In order to keep the
542	    specification reasonably simple, we refrained from defining frame-
543	    specific parameters for each frame type.  Instead, a few common
544	    parameters were specified that cover all types of frames.

546	    The payload format has two modes, basic mode and interleaved mode.
547	    The main structural difference between the two modes is the
548	    extension of the table of content entries with frame displacement
549	    fields (when operating in the interleaved mode).  The basic mode
550	    supports aggregation of multiple consecutive frames in a payload.
551	    The interleaved mode supports aggregation of multiple frames that
552	    are non-consecutive in time.  In both modes it is possible to have
553	    frames encoded with different frame types in the same payload.  The
554	    ISF must remain constant throughout the payload of a single packet.

556	    The payload format is designed around the property of AMR-WB+ frames
557	    that the frames are consecutive in time and share the same frame
558	    duration (in the absence of an ISF change).  This enables the
559	    receiver to derive the timestamp for an individual frame within a
560	    payload.  In basic mode, the deriving process is based on the order
561	    of frames.  In interleaved mode, it is based on the compact
562	    displacement fields.  The frame timestamps are used to regenerate
563	    the correct order of frames after reception, identify duplicates,
564	    and detect lost frames that require concealment.

566	    The interleaving scheme of this payload format is significantly more
567	    flexible than the one specified in RFC 3267.  The AMR and AMR-WB
568	    payload format is only capable of using periodic patterns with
569	    frames taken from an interleaving group at fixed intervals.  The
570	    interleaving scheme of this specification, in contrast, allows for
571	    any interleaving pattern, as long as the distance in decoding order
572	    between any two adjacent frames is not more than 256 frames.  Note
573	    that even at the highest ISF this allows an interleaving depth up to
574	    3.41 seconds.
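
    The 3.41 second figure can be checked with simple arithmetic.  The
    sketch below assumes 512 samples per transport frame and takes 38400
    Hz as the highest ISF; both values are assumptions for illustration
    and should be checked against [1].

```python
MAX_DISTANCE_FRAMES = 256  # maximum distance in decoding order, per the text
SAMPLES_PER_FRAME = 512    # assumed samples per AMR-WB+ transport frame
HIGHEST_ISF_HZ = 38400     # assumed highest internal sampling frequency


def interleaving_depth_seconds(isf_hz, frames=MAX_DISTANCE_FRAMES):
    """Time span covered by `frames` transport frames at the given ISF."""
    return frames * SAMPLES_PER_FRAME / isf_hz


print(round(interleaving_depth_seconds(HIGHEST_ISF_HZ), 2))  # 3.41
```

    Lower ISF values give longer frames and therefore a larger maximum
    interleaving depth in time.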

576	    To allow for error resiliency through redundant transmission, the
577	    periods covered by multiple packets MAY overlap in time.  A receiver
578	    MUST be prepared to receive any audio frame multiple times.  All
579	    redundantly sent frames MUST use the same frame type and ISF, and
580	    MUST have the same RTP timestamp, or MUST be a NO_DATA frame
581	    (FT=15).
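
    One possible way for a receiver to cope with redundant frames is to
    key its frame store on the derived RTP timestamp.  The sketch below
    is an assumption about receiver behaviour, not a requirement of this
    format: it keeps the first copy of a frame and lets a real frame
    replace a NO_DATA placeholder.  The name store_frame is hypothetical.

```python
NO_DATA = 15  # frame type used for frames not carried in a payload


def store_frame(buffer, timestamp, frame_type, data):
    """Store a frame in `buffer` (dict: timestamp -> (frame_type, data)).

    Returns True if the buffer changed, False if the frame was a duplicate.
    """
    existing = buffer.get(timestamp)
    if existing is None:
        buffer[timestamp] = (frame_type, data)
        return True
    # A real frame may replace a NO_DATA placeholder, never the reverse.
    if existing[0] == NO_DATA and frame_type != NO_DATA:
        buffer[timestamp] = (frame_type, data)
        return True
    return False


frames = {}
store_frame(frames, 12345, NO_DATA, b"")
store_frame(frames, 12345, 26, b"\x00" * 35)
print(frames[12345][0])  # 26
```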

583	    The payload consists of octet aligned elements (header, ToC and
584	    audio frames).  Only the audio frames for AMR-WB frame types (0-9)
585	    require padding for octet alignment.  If additional padding is
586	    desired, then the P bit in the RTP header MAY be set and padding MAY
587	    be appended as specified in [3].

589	 4.1. RTP Header Usage

591	    The format of the RTP header is specified in [3].  This payload
592	    format uses the fields of the header in a manner consistent with
593	    that specification.

595	    The RTP timestamp corresponds to the sampling instant of the first
596	    sample encoded for the first frame in the packet.  The timestamp
597	    clock frequency SHALL be 72000 Hz.  This frequency allows the frame
598	    duration to be integer RTP timestamp ticks for the ISFs specified in
599	    Table 1.  It also provides reasonable conversion factors to the
600	    input/output audio sampling frequencies supported by the codec.  See
601	    section 4.3.2.3 for guidance on how to derive the RTP timestamp for
602	    any audio frame beyond the first one.
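
    The integer-tick property can be verified for the ISF values that
    appear in the examples of this section.  The following sketch assumes
    512 samples per transport frame; the helper name is hypothetical.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 72000     # timestamp clock frequency of this payload format
SAMPLES_PER_FRAME = 512  # assumed samples per transport frame


def frame_duration_ticks(isf_hz):
    """Return the frame duration in RTP ticks, or raise if it is not an integer."""
    ticks = Fraction(SAMPLES_PER_FRAME * RTP_CLOCK_HZ, isf_hz)
    if ticks.denominator != 1:
        raise ValueError(f"{isf_hz} Hz does not give an integer tick count")
    return ticks.numerator


for isf_hz in (25600, 32000, 38400):
    print(isf_hz, frame_duration_ticks(isf_hz))
# 25600 1440
# 32000 1152
# 38400 960
```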

604	    The RTP header marker bit (M) SHALL be set to 1 whenever the first
605	    frame carried in the packet is the first frame in a talkspurt (see
606	    definition of the talkspurt in section 4.1 of [9]).  For all other
607	    packets the marker bit SHALL be set to zero (M=0).

609	    The assignment of an RTP payload type for the format defined in this
610	    memo is outside the scope of this document.  The RTP profile in use
611	    either assigns a static payload type or mandates binding the payload
612	    type dynamically.

614	    The media type parameter "channels" is used to indicate the maximum
615	    number of channels allowed for a given payload type.  A payload type
616	    where channels=1 (mono), SHALL only carry mono content.  A payload
617	    type for which channels=2 has been declared MAY carry both mono and
618	    stereo content.  Note that this definition is different from the one
619	    in RFC 3551 [9].  As mentioned before, the AMR-WB+ codec handles the
620	    support of stereo content and the (eventual) downmixing of stereo to
621	    mono internally.  This makes it unnecessary to negotiate for the
622	    number of channels for reasons other than bit-rate efficiency.

624	 4.2. Payload Structure

626	    The payload consists of a payload header, a table of contents, and
627	    the audio data representing one or more audio frames.  The following
628	    diagram shows the general payload format layout:

630	    +----------------+-------------------+----------------
631	    | payload header | table of contents | audio data ...
632	    +----------------+-------------------+----------------

634	    Payloads containing more than one audio frame are called compound
635	    payloads.

637	    The following sections describe the variations taken by the payload
638	    format depending on the mode in use, basic mode or interleaved mode.

640	 4.3. Payload Definitions

642	 4.3.1. Payload Header

644	    The payload header carries data that is common for all frames in the
645	    payload.  The structure of the payload header is described below.

647	     0 1 2 3 4 5 6 7
648	    +-+-+-+-+-+-+-+-+
649	    |   ISF   |TFI|L|
650	    +-+-+-+-+-+-+-+-+

652	    ISF (5 bits): Indicates the Internal Sampling Frequency employed for
653	       all frames in this payload.  The index value corresponds to
654	       internal sampling frequency as specified in Table 24 in [1].
655	       This field SHALL be set to 0 for payloads containing frames with
656	       Frame Type values 0-13.

658	    TFI (2 bits): Transport Frame Index, from 0 (first) to 3 (last),
659	       indicating the position of the first transport frame of this
660	       payload in the AMR-WB+ super-frame structure.  For payloads with
661	       frames of only Frame Type values 0-9 this field SHALL be set to
662	       0.  The TFI value for a frame of type 0-9 SHALL be ignored.  Note
663	       that the frame type is coded in the table of contents (as
664	       discussed later) -- hence the mentioned dependencies of the frame
665	       type can be applied easily by interpreting only values carried in
666	       the payload header.  It is not necessary to interpret the audio
667	       bit stream itself.

669	    L (1 bit): Long displacement field flag for payloads in interleaved
670	       mode.  If set to 0, four-bit displacement fields are used to
671	       indicate interleaving offset; if set to 1, displacement fields of
672	       eight bits are used (see section 4.3.2.2).  For payloads in the
673	       basic mode this bit SHALL be set to 0 and SHALL be ignored by the
674	       receiver.
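
    As an illustration, the payload header octet can be unpacked with a
    few shifts and masks.  The sketch assumes the usual convention that
    bit 0 in the diagram is the most significant bit of the octet; the
    function names are hypothetical.

```python
def parse_payload_header(octet):
    """Split the payload header octet into (ISF, TFI, L)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("payload header must be a single octet")
    isf = octet >> 3           # upper five bits
    tfi = (octet >> 1) & 0x03  # next two bits
    long_dis = octet & 0x01    # least significant bit
    return isf, tfi, long_dis


def build_payload_header(isf, tfi, long_dis=0):
    """Pack ISF, TFI and L into one octet."""
    if not (0 <= isf <= 31 and 0 <= tfi <= 3 and long_dis in (0, 1)):
        raise ValueError("field out of range")
    return (isf << 3) | (tfi << 1) | long_dis


print(parse_payload_header(0x44))          # (8, 2, 0)
print(hex(build_payload_header(13, 0, 1)))  # 0x69
```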

676	    Note that frames employing different ISF values require
677	    encapsulation in separate packets.   Thus, special considerations
678	    apply when generating interleaved packets and an ISF change is
679	    executed.  In particular, frames that, according to the previously
680	    used interleaving pattern, would be aggregated into a single packet
681	    have to be separated into different packets, so that the
682	    aforementioned condition (all frames in a packet share the ISF)
683	    remains true.  A naive implementation that splits the frames with
684	    different ISF into different packets can result in up to twice the
685	    number of RTP packets, when compared to an optimal interleaved
686	    solution.  Alteration of the interleaving before and after the ISF
687	    change may reduce the need for extra RTP packets.

689	 4.3.2. The Payload Table of Contents

691	    The table of contents (ToC) consists of a list of entries, each
692	    corresponding to a group of audio frames carried in the payload,
693	    as depicted below.

695	    +----------------+----------------+- ... -+----------------+
696	    |  ToC entry #1  |  ToC entry #2  |          ToC entry #N  |
697	    +----------------+----------------+- ... -+----------------+

699	    When multiple groups of frames are present in a payload, the ToC
700	    entries SHALL be placed in the packet in order of increasing RTP
701	    timestamp value (modulo 2^32) of the first transport frame the ToC
702	    entry represents.

704	 4.3.2.1. ToC Entry in the Basic Mode

706	    A ToC entry of a payload in the basic mode has the following format:

708	     0                   1
709	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
710	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
711	    |F| Frame Type  |    #frames    |
712	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

714	    F (1 bit): If set to 1, indicates that this ToC entry is followed by
715	       another ToC entry; if set to 0, indicates that this ToC entry is
716	       the last one in the ToC.

718	    Frame Type (FT) (7 bits): Indicates the audio codec frame type used
719	       for the group of frames referenced by this ToC entry.  FT
720	       designates the combination of AMR-WB+ core and stereo rate, one
721	       of the special AMR-WB+ frame types, the AMR-WB rate, or comfort
722	       noise, as specified by Table 25 in [1].

724	    #frames (8 bits): Indicates the number of frames in the group
725	       referenced by this ToC entry.  ToC entries with this field equal
726	       to 0 (that would indicate zero frames) SHALL NOT be used and
727	       received packets with such a TOC entry SHALL be discarded.
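
    A minimal sketch of how a receiver might walk the basic mode ToC is
    given below.  It assumes the ToC starts directly after the one-octet
    payload header and that bit 0 in the diagram is the most significant
    bit; the function name is hypothetical.

```python
def parse_basic_toc(payload, offset=1):
    """Parse a basic mode ToC.

    Returns (entries, audio_offset), where entries is a list of
    (frame_type, n_frames) and audio_offset is the index of the first
    audio data octet.
    """
    entries = []
    while True:
        if offset + 2 > len(payload):
            raise ValueError("truncated ToC")
        follow = payload[offset] >> 7        # F bit
        frame_type = payload[offset] & 0x7F  # FT, 7 bits
        n_frames = payload[offset + 1]       # #frames, 8 bits
        if n_frames == 0:
            raise ValueError("#frames = 0: packet is to be discarded")
        entries.append((frame_type, n_frames))
        offset += 2
        if not follow:
            return entries, offset


# Header 0x56 (ISF=10, TFI=3, L=0), then entries (FT=33, 1) and (FT=35, 2).
print(parse_basic_toc(bytes([0x56, 0xA1, 0x01, 0x23, 0x02])))
# ([(33, 1), (35, 2)], 5)
```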

729	 4.3.2.2. ToC Entry in the Interleaved Mode

731	    Two different ToC entry formats are defined in interleaved mode.
732	    They differ in the length of the displacement field, 4 bits or 8
733	    bits.  The L bit in the payload header selects which of the two
734	    formats is used.

736	    If L=0, a ToC entry has the following format:

738	     0                   1                   2                   3
739	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
740	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
741	    |F| Frame Type  |    #frames    |  DIS1 |  ...  |  DISi |  ...  |
742	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
743	    |  ...  |  ...  |  DISn |  Padd |
744	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

746	    F (1 bit): See definition in 4.3.2.1.

748	    Frame Type (FT) (7 bits): See definition in 4.3.2.1.

750	    #frames (8 bits): See definition in 4.3.2.1.

752	    DIS1...DISn (4 bits): A list of n (n=#frames) displacement fields
753	       indicating the displacement of the i:th (i=1..n) audio frame
754	       relative to the preceding audio frame in the payload, in units of
755	       frames.  The four-bit unsigned integer displacement values may be
756	       between 0 and 15 indicating the number of audio frames in
757	       decoding order between the (i-1):th and the i:th frame in the
758	       payload.  Note that for the first ToC entry of the payload the
759	       value of DIS1 is meaningless.  It SHALL be set to zero by a
760	       sender, and SHALL be ignored by a receiver. This frame's location
761	       in the decoding order is uniquely defined by the RTP timestamp
762	       and TFI in the payload header.  Note also that for subsequent ToC
763	       entries DIS1 indicates the number of frames between the last
764	       frame of the previous group and the first frame of this group.

766	    Padd (4 bits): To ensure octet alignment, four padding bits SHALL be
767	       included at the end of the ToC entry if there is an odd number
768	       of frames in the group referenced by this entry.  These bits
769	       SHALL be set to zero and SHALL be ignored by the receiver.  If a
770	       group containing an even number of frames is referenced by this
771	       ToC entry, these padding bits SHALL NOT be included in the
772	       payload.

774	    If L=1, a ToC entry has the following format:

776	     0                   1                   2                   3
777	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
778	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
779	    |F| Frame Type  |    #frames    |      DIS1     |      ...      |
780	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
781	    |      ...      |     DISn      |
782	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

784	    F (1 bit): See definition in 4.3.2.1.

786	    Frame Type (FT) (7 bits): See definition in 4.3.2.1.

788	    #frames (8 bits): See definition in 4.3.2.1.

790	    DIS1...DISn (8 bits): A list of n (n=#frames) displacement fields
791	       indicating the displacement of the i:th (i=1..n) audio frame
792	       relative to the preceding audio frame in the payload, in units of
793	       frames.  The eight-bit unsigned integer displacement values may
794	       be between 0 and 255 indicating the number of audio frames in
795	       decoding order between the (i-1):th and the i:th frame in the
796	       payload.  Note that for the first ToC entry of the payload the
797	       value of DIS1 is meaningless.  It SHALL be set to zero by a
798	       sender, and SHALL be ignored by a receiver. This frame's location
799	       in the decoding order is uniquely defined by the RTP timestamp
800	       and TFI in the payload header.  Note also that for subsequent ToC
801	       entries DIS1 indicates the displacement between the last frame of
802	       the previous group and the first frame of this group.
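
    The two interleaved mode ToC entry formats can be handled by a single
    routine, sketched below under the assumption that bit 0 in the
    diagrams is the most significant bit and that the first four-bit
    displacement occupies the upper half of its octet.  The function name
    is hypothetical.

```python
def parse_interleaved_toc(payload, long_dis, offset=1):
    """Parse an interleaved mode ToC.

    Returns (entries, audio_offset), where entries is a list of
    (frame_type, [DIS1, ..., DISn]).
    """
    entries = []
    while True:
        if offset + 2 > len(payload):
            raise ValueError("truncated ToC")
        follow = payload[offset] >> 7
        frame_type = payload[offset] & 0x7F
        n_frames = payload[offset + 1]
        if n_frames == 0:
            raise ValueError("#frames = 0: packet is to be discarded")
        offset += 2
        # L=1: one octet per field.  L=0: two fields per octet, padded if n is odd.
        size = n_frames if long_dis else (n_frames + 1) // 2
        field = payload[offset:offset + size]
        if len(field) < size:
            raise ValueError("truncated displacement fields")
        if long_dis:
            displacements = list(field)
        else:
            displacements = []
            for octet in field:
                displacements.extend((octet >> 4, octet & 0x0F))
            displacements = displacements[:n_frames]  # drop the padding nibble
        entries.append((frame_type, displacements))
        offset += size
        if not follow:
            return entries, offset


print(parse_interleaved_toc(bytes([0x69, 47, 4, 0, 18, 15, 10]), long_dis=1))
# ([(47, [0, 18, 15, 10])], 7)
```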

804	 4.3.2.3. RTP Timestamp Derivation

806	    The RTP Timestamp value for a frame SHALL be the timestamp value of
807	    the first audio sample encoded in the frame.  The timestamp value
808	    for a frame is derived differently depending on the payload mode,
809	    basic or interleaved.  In both cases the first frame in a compound
810	    packet has an RTP timestamp equal to the one received in the RTP
811	    header.  In the basic mode, the RTP time for any subsequent frame is
812	    derived in two steps.  First, the sum of the frame durations (see
813	    Table 1) of all the preceding frames in the payload is calculated.
814	    Then, this sum is added to the RTP header timestamp value.  For
815	    example, if the RTP Header timestamp value is 12345, the payload
816	    carries four frames, and the frame duration is 16 ms (ISF = 32 kHz)
817	    corresponding to 1152 timestamp ticks, the RTP timestamp of the
818	    fourth frame in the payload is 12345 + 3 * 1152 = 15801.
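
    The basic mode derivation above amounts to the following sketch.  It
    assumes that all frames in the payload share one frame duration, and
    it wraps the result modulo 2^32 as RTP timestamps do; the function
    name is hypothetical.

```python
def basic_mode_timestamps(rtp_timestamp, frame_count, frame_ticks):
    """Return the RTP timestamp of every frame in a basic mode payload."""
    return [(rtp_timestamp + i * frame_ticks) % 2**32
            for i in range(frame_count)]


# Four frames at ISF = 32 kHz (1152 ticks per frame).
print(basic_mode_timestamps(12345, 4, 1152))
# [12345, 13497, 14649, 15801]
```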

820	    In interleaved mode, the RTP timestamp for each frame in the payload
821	    is derived from the RTP header timestamp and the sum of the time
822	    offsets of all preceding frames in this payload.  The frame
823	    timestamps are computed based on displacement fields and the frame
824	    duration derived from the ISF value.  Note that the displacement in
825	    time between frame i-1 and frame i is (DISi + 1) * frame duration,
826	    because the duration of the (i-1):th frame must also be taken into
827	    account.  The timestamp of the first frame of the first group of
828	    frames (TS(1)), i.e. the first frame of the payload, is the RTP
829	    header timestamp.  For subsequent frames in the group the timestamp
830	    is computed by

832	      TS(i) = TS(i-1) + (DISi + 1) * frame duration,    2 <= i <= n

834	    For subsequent groups of frames the timestamp of the first frame is
835	    computed by

837	      TS(1) = TSprev + (DIS1 + 1) * frame duration,

839	    where TSprev denotes the timestamp of the last frame in the previous
840	    group.  The timestamps of the subsequent frames in the group are
841	    computed in the same way as for the first group.

843	    The following example derives the RTP timestamps for the frames in
844	    an interleaved mode payload having the following header and ToC
845	    information:

847	    RTP header timestamp: 12345
848	    ISF = 32 kHz
849	    Frame 1 displacement field: DIS1 = 0
850	    Frame 2 displacement field: DIS2 = 6
851	    Frame 3 displacement field: DIS3 = 4
852	    Frame 4 displacement field: DIS4 = 7

854	    Assuming an ISF of 32 kHz, which implies frame duration of 16 ms,
855	    one frame lasts 1152 ticks.  The timestamp of the first frame in the
856	    payload is the RTP timestamp, i.e. TS(1) = RTP TS.  Note that the
857	    displacement field value for this frame must be ignored.  For the
858	    second frame in the payload the timestamp can be calculated as TS(2)
859	    = TS(1) + (DIS2 + 1) * 1152 = 20409.  For the third frame the
860	    timestamp is TS(3) = TS(2) + (DIS3 + 1) * 1152 = 26169.  Finally,
861	    for the fourth frame of the payload we have TS(4) = TS(3) + (DIS4 +
862	    1) * 1152 = 35385.
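
    The same calculation can be written as a short routine.  This sketch
    takes the displacement fields of all ToC entries as one flat list in
    payload order and ignores the first one, as required for the first
    ToC entry; the function name is hypothetical.

```python
def interleaved_timestamps(rtp_timestamp, displacements, frame_ticks):
    """Return the RTP timestamp of every frame in an interleaved mode payload."""
    if not displacements:
        return []
    timestamps = [rtp_timestamp % 2**32]
    for dis in displacements[1:]:  # DIS1 of the first ToC entry is ignored
        timestamps.append((timestamps[-1] + (dis + 1) * frame_ticks) % 2**32)
    return timestamps


print(interleaved_timestamps(12345, [0, 6, 4, 7], 1152))
# [12345, 20409, 26169, 35385]
```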

864	 4.3.2.4. Frame Type Considerations

866	    The value of Frame Type (FT) is defined in Table 25 in [1].  FT=14
867	    (AUDIO_LOST) is used to denote frames that are lost.  A NO_DATA
868	    (FT=15) frame can arise from either of two conditions: no data has
869	    been produced by the audio encoder for the frame, or the data for
870	    the frame is not transmitted in the current payload.  An example
871	    of the latter is a frame that has been or will be sent in an
872	    earlier or later packet.  The duration for
873	    these non-included frames is dependent on the internal sampling
874	    frequency indicated by the ISF field.

876	    For frame types with index 0-13 the ISF field SHALL be set to 0.  The
877	    frame duration for these frame types is fixed to 20 ms in time, i.e.
878	    1440 ticks in 72 kHz.  For payloads containing only frames of type
879	    0-9, the TFI field SHALL be set to 0, and SHALL be ignored by the
880	    receiver.  In a payload combining frames of type 0-9 and 10-13 the
881	    TFI value needs to be set to match the transport frames of type
882	    10-13.  Thus, frames of type 0-9 will also have a derived TFI, which
883	    is ignored.

885	 4.3.2.5. Other TOC Considerations

887	    If a ToC entry with an undefined FT value is received, the whole
888	    packet SHALL be discarded.  This is to avoid the loss of data
889	    synchronization in the depacketization process, which can result in
890	    a severe degradation in audio quality.

892	    Packets containing only NO_DATA frames SHOULD NOT be transmitted.
893	    Also, NO_DATA frames at the end of a frame sequence to be carried in
894	    a payload SHOULD NOT be included in the transmitted packet.  The
895	    AMR-WB+ SCR/DTX is identical with AMR-WB SCR/DTX described in [5]
896	    and can only be used in combination with the AMR-WB frame types (0-
897	    8).

899	    When multiple groups of frames are present, their ToC entries SHALL
900	    be placed in the ToC in the order of increasing RTP timestamp value
901	    (modulo 2^32) of the first transport frame the ToC entry represents,
902	    independent of the payload mode.  In basic mode the frames SHALL be
903	    consecutive in time, while in interleaved mode the frames MAY be
904	    non-consecutive in time and MAY even have varying inter-frame
905	    distances.

907	 4.3.2.6. ToC Examples

909	    The following example illustrates a ToC for three audio frames in
910	    basic mode.  Note that in this case all audio frames are encoded
911	    using the same frame type, i.e. there is only one ToC entry.

913	     0                   1
914	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
915	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
916	    |0| Frame Type1 |  #frames = 3  |
917	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

919	    The next example depicts a ToC of three entries in basic mode.  Note
920	    that in this case the payload also carries three frames, but three
921	    ToC entries are needed because the frames of the payload are encoded
922	    using different frame types.

924	     0                   1                   2                   3
925	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
926	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
927	    |1| Frame Type1 |  #frames = 1  |1| Frame Type2 |  #frames = 1  |
928	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
929	    |0| Frame Type3 |  #frames = 1  |
930	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

932	    The following example illustrates a ToC with two entries in
933	    interleaved mode using four bit displacement fields.  The payload
934	    includes two groups of frames, the first one including a single
935	    frame, and the other one consisting of two frames.

937	     0                   1                   2                   3
938	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
939	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
940	    |1| Frame Type1 |  #frames = 1  |  DIS1 |  padd |0| Frame Type2 |
941	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
942	    |  #frames = 2  |  DIS1 |  DIS2 |
943	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

945	 4.3.3. Audio Data

947	    Audio data of a payload consists of zero or more audio frames, as
948	    described in the ToC of the payload.

950	    ToC entries with FT=14 or 15 represent frame types with a length of
951	    0.  Hence, no data SHALL be placed in the audio data section to
952	    represent frames of this type.

954	    As discussed earlier, each audio frame of an extension frame
955	    type represents an AMR-WB+ transport frame corresponding to the
956	    encoding of 512 samples of audio, sampled with the internal sampling
957	    frequency specified by the ISF indicator.  As an exception, frame
958	    types with index 10-13 are only capable of using a single internal
959	    sampling frequency (25600 Hz).  The encoding rates (combination of
960	    core bit-rate and stereo bit-rate) are indicated in the frame type
961	    field of the corresponding ToC entry.  The octet length of the audio
962	    frame is implicitly defined by the frame type field and is given in
963	    tables 21 and 25 of [1].  The order and numbering notation of the
964	    bits are as specified in [1].  For the AMR-WB+ extension frame types
965	    and comfort noise frames, the bits are in the order produced by the
966	    encoder.  The last octet of each audio frame MUST be padded with
967	    zeroes at the end if not all bits in the octet are used.  In other
968	    words, each audio frame MUST be octet-aligned.

970	 4.3.4. Methods for Forming the Payload

972	    The payload begins with the payload header, followed by the table of
973	    contents that consists of a list of ToC entries.

975	    The audio data follows the table of contents.  All of the octets
976	    comprising an audio frame SHALL be appended to the payload as a
977	    unit.  The audio frames are packetized in timestamp order within
978	    each group of frames (per ToC entry).  The groups of frames are
979	    packetized in the same order as their corresponding ToC entries.
980	    Note that there are no data octets in a group having a ToC entry
981	    with FT=14 or FT=15.
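
    The assembly order described above can be sketched as follows for the
    basic mode.  The routine takes the frame octets as given and does not
    check them against the frame lengths defined in [1]; the function
    name and argument layout are hypothetical.

```python
def build_basic_payload(isf, tfi, groups):
    """Assemble a basic mode payload.

    groups is a list of (frame_type, [frame_octets, ...]) in timestamp order.
    """
    if not groups:
        raise ValueError("at least one group of frames is required")
    if not (0 <= isf <= 31 and 0 <= tfi <= 3):
        raise ValueError("ISF or TFI out of range")
    payload = bytearray([(isf << 3) | (tfi << 1)])  # L bit is 0 in basic mode
    # Table of contents: one two-octet entry per group.
    for index, (frame_type, frames) in enumerate(groups):
        if not (0 <= frame_type <= 127 and 1 <= len(frames) <= 255):
            raise ValueError("frame type or #frames out of range")
        follow = 0x80 if index < len(groups) - 1 else 0x00
        payload += bytes([follow | frame_type, len(frames)])
    # Audio data: groups in ToC order, frames in timestamp order.
    for frame_type, frames in groups:
        if frame_type in (14, 15):
            continue  # AUDIO_LOST and NO_DATA carry no data octets
        for frame in frames:
            payload += frame
    return bytes(payload)


payload = build_basic_payload(8, 2, [(26, [bytes(35)] * 3)])
print(len(payload), payload[:3].hex())  # 108 441a03
```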

983	 4.3.5. Payload Examples

985	 4.3.5.1. Example 1, Basic Mode Payload Carrying Multiple Frames Encoded
986	    Using the Same Frame Type

988	    Figure 4 depicts a payload that carries three AMR-WB+ frames encoded
989	    using 14 kbit/s frame type (FT=26) with a frame length of 280 bits
990	    (35 bytes).  The internal sampling frequency in this example is 25.6
991	    kHz (ISF = 8).  The TFI for the first frame is 2, indicating that
992	    the first transport frame in this payload is the third in a super-
993	    frame.  Since this payload is in the basic mode the subsequent
994	    frames of the payload are consecutive frames in decoding order, i.e.
995	    the fourth transport frame of the current super-frame and the first
996	    transport frame of the next super-frame.  Note that because the
997	    frames are all encoded using the same frame type, only one ToC entry
998	    is required.

1000	     0                   1                   2                   3
1001	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1002	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1003	    | ISF = 8 | 2 |0|0|  FT = 26    |  #frames = 3  |   f1(0...7)   |
1004	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1005	    : ...                                                           :
1006	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1007	    | ...           | f1(272...279) |   f2(0...7)   |               |
1008	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1009	    : ...                                                           :
1010	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1011	    | f2(272...279) |   f3(0...7)   | ...                           |
1012	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1013	    : ...                                                           :
1014	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1015	    | ...                                           | f3(272...279) |
1016	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1018	    Figure 4: An example of a basic mode payload carrying three frames
1019	    of the same frame type.

1021	 4.3.5.2. Example 2, Basic Mode Payload Carrying Multiple Frames Encoded
1022	    Using Different Frame Types

1024	    Figure 5 depicts a payload that carries three AMR-WB+ frames; the
1025	    first frame is encoded using 18.4 kbit/s frame type (FT=33) with a
1026	    frame length of 368 bits (46 bytes), and the two subsequent frames
1027	    are encoded using 20 kbit/s frame type (FT=35) having frame length
1028	    of 400 bits (50 bytes).  The internal sampling frequency in this
1029	    example is 32 kHz (ISF = 10), implying the overall bit-rates of 23
1030	    kbit/s for the first frame of the payload, and 25 kbit/s for the
1031	    subsequent frames.  The TFI for the first frame is 3, indicating
1032	    that the first transport frame in this payload is the fourth in a
1033	    super-frame.  Since this is a payload in the basic mode, the
1034	    subsequent frames of the payload are consecutive frames in decoding
1035	    order, i.e. the first and second transport frames of the next
1036	    super-frame.  Note that since the payload carries two different
1037	    frame types, there are two ToC entries.

1039	     0                   1                   2                   3
1040	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1041	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1042	    |  ISF=10 | 3 |0|1|  FT = 33    |  #frames = 1  |0|  FT = 35    |
1043	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1044	    |  #frames = 2  |   f1(0...7)   | ...                           |
1045	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1046	    : ...                                                           :
1047	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1048	    | ...                           | f1(360...367) |   f2(0...7)   |
1049	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1050	    : ...                                                           :
1051	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1052	    | f2(392...399) |   f3(0...7)   | ...                           |
1053	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1054	    : ...                                                           :
1055	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1056	    | ...                           | f3(392...399) |
1057	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1059	    Figure 5: An example of a basic mode payload carrying three frames
1060	    employing two different frame types.

1062	 4.3.5.3. Example 3, Payload in Interleaved Mode

1064	    The example in Figure 6 depicts a payload in interleaved mode,
1065	    carrying four frames encoded using 32 kbit/s frame type (FT=47) with
1066	    frame length of 640 bits (80 bytes).  The internal sampling
1067	    frequency is 38.4 kHz (ISF = 13), implying a bit-rate of 48 kbit/s
1068	    for all frames in the payload.  The TFI for the first frame is 0,
1069	    hence it is the first transport frame of a super-frame.  The
1070	    displacement fields for the subsequent frames are DIS2=18, DIS3=15,
1071	    and DIS4=10, which indicates that the subsequent frames have the
1072	    TFIs of 3, 3, and 2, respectively.  The long displacement field flag
1073	    L in the payload header is set to 1, which results in the use of
1074	    eight bits for the displacement fields in the ToC entry.  Note that
1075	    since all frames of this payload are encoded using the same frame
1076	    type, only a single ToC entry is needed.  Furthermore, the
1077	    displacement field for the first frame (corresponding to the first
1078	    ToC entry with DIS1=0) must be ignored, since its timestamp and TFI
1079	    are defined by the RTP timestamp and the TFI found in the payload
1080	    header.

1082	    The RTP timestamp values of the frames in this example are:
1083	    Frame1: TS1 = RTP Timestamp
1084	    Frame2: TS2 = TS1 + 19 * 960
1085	    Frame3: TS3 = TS2 + 16 * 960
1086	    Frame4: TS4 = TS3 + 11 * 960
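
    The timestamp and TFI arithmetic of this example can be reproduced
    with a short sketch.  It assumes, as the values above imply, that
    each displacement field counts the transport frames skipped (so the
    step is DIS + 1), that the frame duration is 960 timestamp ticks,
    and that TFIs count modulo 4.  The function and constant names are
    illustrative and not part of the payload format.

```python
FRAME_TICKS = 960  # frame duration in timestamp ticks used in this example


def expand_displacements(rtp_timestamp, first_tfi, dis_fields,
                         frame_ticks=FRAME_TICKS):
    """Return one (timestamp, tfi) tuple per frame in the payload.

    dis_fields[0] belongs to the first frame and is ignored, because
    that frame takes its timestamp and TFI from the RTP header and the
    payload header.
    """
    ts, tfi = rtp_timestamp, first_tfi
    frames = [(ts, tfi)]
    for dis in dis_fields[1:]:
        step = dis + 1                            # frames advanced
        ts = (ts + step * frame_ticks) % 2**32    # RTP timestamps are 32 bits
        tfi = (tfi + step) % 4                    # 4 transport frames per super-frame
        frames.append((ts, tfi))
    return frames


# Values of the example: DIS1..DIS4 = 0, 18, 15, 10, first TFI = 0.
frames = expand_displacements(0, 0, [0, 18, 15, 10])
```

    With an RTP timestamp of 0, the resulting TFIs are 0, 3, 3, and 2,
    matching the values stated in the text.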

1088	     0                   1                   2                   3
1089	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1090	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1091	    |  ISF=13 | 0 |1|0|  FT = 47    |  #frames = 4  |   DIS1 = 0    |
1092	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1093	    |   DIS2 = 18   |   DIS3 = 15   |   DIS4 = 10   |   f1(0...7)   |
1094	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1095	    : ...                                                           :
1096	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1097	    | ...                           | f1(632...639) |   f2(0...7)   |
1098	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1099	    : ...                                                           :
1100	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1101	    | ...                           | f2(632...639) |   f3(0...7)   |
1102	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1103	    : ...                                                           :
1104	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1105	    | ...                           | f3(632...639) |   f4(0...7)   |
1106	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1107	    : ...                                                           :
1108	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1109	    | ...                           | f4(632...639) |
1110	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1112	    Figure 6: An example of an interleaved mode payload carrying four
1113	    frames of the same frame type.
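
    As an illustration only, the header fields of Figure 6 could be
    unpacked as in the following sketch.  It assumes the bit layout
    drawn in the figure (5-bit ISF, 2-bit TFI, 1-bit L, then a ToC
    entry consisting of one flag bit, a 7-bit FT, an 8-bit frame count,
    and one 8-bit DIS field per frame).  It handles only a single ToC
    entry with L=1; the function name and the returned dictionary are
    hypothetical.

```python
def parse_interleaved_header(payload: bytes) -> dict:
    """Parse the payload header and one ToC entry laid out as in Figure 6."""
    if len(payload) < 3:
        raise ValueError("payload too short for header and ToC entry")

    isf = payload[0] >> 3             # upper 5 bits of the first octet
    tfi = (payload[0] >> 1) & 0x03    # next 2 bits
    long_dis = payload[0] & 0x01      # L flag, last bit of the first octet
    if long_dis != 1:
        raise ValueError("this sketch only handles L=1 (8-bit DIS fields)")

    flag = payload[1] >> 7            # leading bit of the ToC entry (0 in Figure 6)
    frame_type = payload[1] & 0x7F    # 7-bit FT
    n_frames = payload[2]             # 8-bit #frames

    data_offset = 3 + n_frames        # one DIS octet per frame
    if len(payload) < data_offset:
        raise ValueError("payload truncated inside the DIS fields")

    return {
        "isf": isf,
        "tfi": tfi,
        "flag": flag,
        "ft": frame_type,
        "n_frames": n_frames,
        "dis": list(payload[3:data_offset]),
        "data_offset": data_offset,
    }


# Header octets of Figure 6 followed by four 80-byte frames of zeros.
example = bytes([(13 << 3) | (0 << 1) | 1, 47, 4, 0, 18, 15, 10]) + bytes(4 * 80)
fields = parse_interleaved_header(example)
```

    For the example payload, the frame data begins at octet 7 and the
    remaining 320 octets hold the four 80-byte frames.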

1115	 4.4. Interleaving Considerations

1117	    The use of interleaving requires further considerations.  As
1118	    presented in the example in Section 3.6.2, a given interleaving
1119	    pattern requires a certain amount of deinterleaving buffer space.
1120	    This buffer space, expressed in a number of transport frame slots,
1121	    is indicated by the "interleaving" media parameter.  The number of
1122	    frame slots needed can be converted into actual memory requirements
1123	    by considering the 80 bytes per frame used by the largest
1124	    combination of AMR-WB+'s core and stereo rates.
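
    The conversion from frame slots to memory is a simple
    multiplication, sketched here.  The function name is illustrative,
    and the result covers frame payload only; any per-slot bookkeeping
    an implementation keeps is not included.

```python
MAX_FRAME_BYTES = 80  # largest AMR-WB+ frame, per the text above


def deinterleaving_buffer_bytes(interleaving_slots: int) -> int:
    """Worst-case payload memory for the given number of frame slots."""
    if interleaving_slots < 0:
        raise ValueError("number of frame slots cannot be negative")
    return interleaving_slots * MAX_FRAME_BYTES


# Hypothetical "interleaving" parameter value of 30 frame slots.
needed = deinterleaving_buffer_bytes(30)
```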

1126	    The information about the frame buffer size is not always sufficient
1127	    to determine when it is appropriate to start consuming frames from
1128	    the interleaving buffer.  There are two cases in which additional
1129	    information is needed: first, when switching of the ISF occurs, and
1130	    second, when the interleaving pattern changes.  The "int-delay" media
1131	    type parameter is defined to convey this information.  It allows a
1132	    sender to indicate the minimal media time that needs to be present
1133	    in the buffer before the decoder can start consuming frames from the
1134	    buffer.  Because the sender has full control over ISF changes and
1135	    the interleaving pattern, it can calculate this value.

1137	    In certain cases, for example when joining a multicast session with
1138	    interleaving mid-session, a receiver may initially receive only part
1139	    of the packets in the interleaving pattern. This initial partial
1140	    reception (in frame sequence order) of frames can yield too few
1141	    frames for acceptable quality from the audio decoding.  This problem
1142	    also arises when using encryption for access control, and the
1143	    receiver does not have the previous key.

1145	    Although the AMR-WB+ codec is robust and thus tolerant of a high
1146	    random frame erasure rate, it has difficulty handling consecutive
1147	    frame losses at startup.  Thus, some special implementation
1148	    considerations are described here.  To handle this type of startup
1149	    efficiently, note first that decoding can start only at the
1150	    beginning of a super-frame, which holds true even if the first
1151	    transport frame is indicated as lost.  Second, it is RECOMMENDED
1152	    that decoding start only if at least 2 of the 4 transport frames
1153	    belonging to that super-frame are available.
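
    One possible way to express these two startup rules is sketched
    here.  The function name and return convention are assumptions made
    for illustration: the caller passes the TFIs of the transport
    frames it holds for one candidate super-frame and gets back either
    None (do not start yet) or a per-TFI list of loss flags to hand to
    the decoder, always beginning at TFI 0.

```python
def plan_superframe_start(received_tfis):
    """Decide whether decoding may start with this super-frame.

    Returns None when fewer than 2 of the 4 transport frames are
    available.  Otherwise returns a list of four booleans in TFI order,
    where True marks a transport frame to be signalled as lost.
    """
    available = set(received_tfis)
    if not available <= {0, 1, 2, 3}:
        raise ValueError("TFI values must be in the range 0..3")
    if len(available) < 2:
        return None
    # Decoding starts at TFI 0 even when that frame itself is missing.
    return [tfi not in available for tfi in range(4)]


plan = plan_superframe_start([1, 3])
```

    In this usage example, frames with TFI 1 and 3 are present, so
    decoding may start and TFI 0 and 2 are marked as lost.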

1155	    After receiving a number of packets, in the worst case as many
1156	    packets as the interleaving pattern covers, the previously described
1157	    effects disappear and normal decoding is resumed.

1159	    Similar issues arise when a receiver leaves a session or has lost
1160	    access to the stream. In the case of the receiver leaving the
1161	    session, this would be a minor issue since playout is normally
1162	    stopped. It is also a minor issue for the case of lost access, since
1163	    the AMR-WB+ error concealment will fade out the audio if massive
1164	    consecutive losses are encountered.

1166	    The sender can avoid this type of problem in many sessions by
1167	    starting and ending interleaving patterns correctly when risks of
1168	    losses occur. One such example is a key-change done for access
1169	    control to encrypted streams.  If only some keys are provided to
1170	    clients and there is a risk of them receiving content for which they
1171	    do not have the key, it is recommended that interleaving patterns
1172	    not overlap key changes.

1174	 4.5. Implementation Considerations

1176	    An application implementing this payload format MUST understand all
1177	    the payload parameters.  Any mapping of the parameters to a
1178	    signaling protocol MUST support all parameters.  Thus, an
1179	    implementation of this payload format in an application using SDP is
1180	    required to understand all the payload parameters in their SDP-
1181	    mapped form.  This requirement ensures that an implementation can
1182	    always decide whether it is capable of communicating.

1184	    Both the basic and interleaved modes SHALL be implemented.  The
1185	    implementation burden of both is rather small, and requiring both
1186	    ensures interoperability.  As the AMR-WB+ codec contains the full
1187	    functionality of the AMR-WB codec, it is RECOMMENDED to also
1188	    implement the payload format in RFC 3267 [7] for the AMR-WB frame
1189	    types when implementing this specification.  Doing so makes
1190	    interoperability with devices that only support AMR-WB more likely.

1192	    The switching of ISF combined with packet loss could result in
1193	    concealment using the wrong audio frame length.  This can occur if
1194	    packet loss(es) result in lost frames directly after the point of
1195	    ISF change.  The packet loss would prevent the receiver from
1196	    noticing the changed ISF, so that it conceals the lost transport
1197	    frame with the previous ISF instead of the new one.  Such an error,
1198	    although always detectable later, results in boundary misalignment,
1199	    which can cause audio distortions and problems with synchronization,
1200	    as too many or too few audio samples are created.  This problem can
1201	    be mitigated in most cases by performing ISF recovery prior to
1202	    concealment, as outlined in Section 4.5.1 below.

1204	 4.5.1. ISF recovery in case of packet loss

1206	    In case of packet loss, it is important that the AMR-WB+ decoder
1207	    initiates a proper error concealment to replace the frames carried
1208	    in the lost packet.  A loss concealment algorithm requires a codec
1209	    framing that matches the timestamps of the correctly received
1210	    frames.  Hence, it is necessary to recover the timestamps of the
1211	    lost frames.  Doing so is non-trivial because the codec frame
1212	    length that is associated with the ISF may have changed during the
1213	    frame loss.

1215	    In the following, the recovery of the timestamp information of lost
1216	    frames is illustrated by means of an example.  Two frames with
1217	    timestamps t0 and t1 have been received properly, the first one
1218	    being the last packet before the loss, and the latter one being the
1219	    first packet after the loss period.  The ISF values for these
1220	    packets are isf0 and isf1, respectively.  The TFIs of these frames
1221	    are tfi0 and tfi1, respectively.  The associated frame lengths (in
1222	    timestamp ticks) are given as L0 and L1, respectively.  In this
1223	    example, three frames with timestamps x1 to x3 have been lost.  The
1224	    example further assumes that the ISF changes once from isf0 to isf1
1225	    during the frame loss period, as shown in the figure below.

1227	    Since not all information required for the full recovery of the
1228	    timestamps is generally known in the receiver, an algorithm is
1229	    needed to estimate the ISF associated with the lost frames.  The
1230	    number of lost frames also needs to be recovered.

1232	      |<---L0--->|<---L0--->|<-L1->|<-L1->|<-L1->|

1234	      |   Rxd    |   lost   | lost | lost |  Rxd |
1235	    --+----------+----------+------+------+------+--

1237	      t0         x1         x2     x3     t1
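
    The simplest case, in which a loss is detected but the ISF has not
    changed, can be sketched as follows.  This is not the full recovery
    procedure: it covers only loss detection and the case isf0 == isf1
    with a whole number of frame lengths between t0 and t1.  It assumes
    that equal ISF values imply equal frame lengths (L0 == L1), that
    the lost frames are evenly spaced between t0 and t1, and that the
    timestamps do not wrap around.  The function name and the use of
    None for "not handled here" are illustrative choices.

```python
def recover_without_isf_change(t0, t1, frame_len, isf0, isf1):
    """Recover timestamps of lost frames when the ISF did not change.

    Returns a list of timestamps (empty if nothing was lost), or None
    if the ISF changed at least once and other steps are needed.
    """
    gap = t1 - t0
    if gap <= 0:
        raise ValueError("t1 must be later than t0 (wrap-around not handled)")
    if gap == frame_len:
        return []                 # t0 + L0 == t1: no frame loss
    if isf0 != isf1:
        return None               # at least one ISF change
    if gap % frame_len != 0:
        return None               # fractional frame count: more than one ISF change
    n_lost = gap // frame_len - 1  # frames strictly between t0 and t1
    return [t0 + n * frame_len for n in range(1, n_lost + 1)]


# Three frames of 960 ticks lost between t0 = 0 and t1 = 3840.
lost = recover_without_isf_change(0, 3840, 960, 13, 13)
```

    Each recovered timestamp is associated with isf0, since the ISF is
    unchanged in this case.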

1239	    Example Algorithm:

1241	    Start:                              # check for frame loss
1242	    If (t0 + L0) == t1 Then goto End    # no frame loss

1244	    Step 1:                             # check case with no ISF change
1245	    If (isf0 != isf1) Then goto Step 2  # At least one ISF change
1246	    If (isFractional((t1 - t0)/L0)) Then goto Step 3
1247	                                        # More than 1 ISF change

1249	    Return recovered timestamps as
1250	    x(n) = t0 + n*L1 and associated ISF equal to isf0, for 0