Network Working Group                                      Johan Sjoberg
INTERNET-DRAFT                                         Magnus Westerlund
Category: Standards Track                                       Ericsson
Expires: August 2004                                       Ari Lakaniemi
                                                                   Nokia
                                                       February 13, 2004


    Real-Time Transport Protocol (RTP) Payload Format for Adaptive Multi-
                  Rate Wideband plus (AMR-WB+) Audio Codec
                  <draft-sjoberg-avt-rtp-amrwbplus-01.txt>


Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is an individual submission to the IETF. Comments
   should be directed to the authors.

Copyright Notice

   Copyright (C) The Internet Society (2004).  All Rights Reserved.

Abstract

   This document specifies a real-time transport protocol (RTP) payload
   format to be used for Adaptive Multi-Rate Wideband plus (AMR-WB+)
   encoded audio signals. The AMR-WB+ codec is an audio extension of the
   AMR-WB codec providing additional modes designed to give higher
   quality of music and speech than the original modes.  The payload
   format is designed according to the principles outlined in the
   existing payload formats for AMR and AMR-WB, RFC3267.  A MIME type
   registration is included for AMR-WB+.


Sjoberg, et. al.                                                [Page 1]

INTERNET-DRAFT       RTP payload format for AMR-WB+   February 13, 2004


TABLE OF CONTENTS

1. Definitions.........................................................3
   1.1. Glossary.......................................................3
   1.2. Terminology....................................................3
2. Introduction........................................................3
3. Background on AMR-WB+ and Design Principles.........................4
   3.1. The AMR-WB+ Audio Codec........................................5
   3.2. Multi-rate Encoding and Mode Adaptation........................6
   3.3. Voice Activity Detection and Discontinuous Transmission........6
   3.4. Support for Multi-Channel Session..............................6
   3.5. Unequal Bit-error Detection and Protection.....................7
      3.5.1. Applying UEP and UED in an IP Network.....................7
   3.6. Robustness against Packet Loss.................................8
      3.6.1. Use of Forward Error Correction (FEC).....................8
      3.6.2. Use of Frame Interleaving................................10
   3.7. AMR-WB+ Audio over IP scenarios...............................10
4. RTP Payload Format for AMR-WB+.....................................11
   4.1. RTP Header Usage..............................................11
   4.2. Payload Structure.............................................12
   4.3. Payload definitions...........................................13
      4.3.1. The Payload Header.......................................13
      4.3.2. The Payload Table of Contents and Frame CRCs.............14
      4.3.3. Audio Data...............................................18
      4.3.4. Methods for Forming the Payload..........................18
      4.3.5. Payload Examples.........................................19
   4.4. Implementation Considerations.................................21
5. Congestion Control.................................................21
6. Security Considerations............................................21
   6.1. Confidentiality...............................................22
   6.2. Authentication................................................22
   6.3. Decoding Validation...........................................23
7. Payload Format Parameters..........................................23
   7.1. MIME Registration.............................................23
   7.2. Mapping MIME Parameters into SDP..............................25
      7.2.1. Offer-Answer Model Considerations........................25
      7.2.2. Examples.................................................26
8. IANA Considerations................................................26
9. Acknowledgements...................................................26
10. References........................................................27
   10.1. Normative references.........................................27
   10.2. Informative References.......................................27
11. Authors' Addresses................................................28
12. IPR Notice........................................................29
13. Copyright Notice..................................................30


1. Definitions

1.1. Glossary

   3GPP    - the Third Generation Partnership Project
   AMR     - Adaptive Multi-Rate Codec
   AMR-WB  - Adaptive Multi-Rate Wideband Codec
   AMR-WB+ - Adaptive Multi-Rate Wideband plus Codec
   CMR     - Codec Mode Request
   CN      - Comfort Noise
   DTX     - Discontinuous Transmission
   FEC     - Forward Error Correction
   SCR     - Source Controlled Rate Operation
   SID     - Silence Indicator (the frames containing only CN
             parameters)
   VAD     - Voice Activity Detection
   UED     - Unequal Error Detection
   UEP     - Unequal Error Protection


1.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].


2. Introduction

   This document specifies the payload format for packetization of AMR-
   WB+ encoded audio signals into the Real-time Transport Protocol (RTP)
   [4].  The payload format supports transmission of multiple channels
   according to the mode definition (modes are mono or stereo modes),
   multiple frames per payload, and robustness against packet loss and
   bit errors.

   Background on AMR-WB+ and design principles can be found in Section
   3.  The payload format itself is specified in Section 4 and follows
   the principles used in [4], [8], and [9].  In Section 7, a MIME type
   registration is provided.

   The intention of this RTP payload format definition is to follow
   the payload format definitions of AMR and AMR-WB [9] closely.
   However, AMR-WB+ has two features not available in AMR or AMR-WB:
   not all modes use the same sampling rate, and each mode is either a
   mono or a stereo mode.  On the other hand, AMR-WB+ is intended for
   use over IP transport, which removes the need for interworking with
   other transport networks.

   The bandwidth efficient mode defined in [9] is not specified for AMR-
   WB+.  AMR-WB+ will mainly be used in streaming scenarios, where the
   benefit of using an octet-aligned format to decrease the complexity
   of the server is large.  The bandwidth saved by using the bandwidth
   efficient mode would also be very small for all extension modes.

   The built-in codec support for stereo encoding makes the
   implementation of multi-channel support difficult, but also less
   necessary.  Therefore, multi-channel support has been removed from
   this payload format compared to the AMR and AMR-WB payload format.

   There is no file format for AMR-WB+ defined within this
   specification.  Instead, the 3GPP-defined, ISO-based 3GP file format
   [18] will support AMR-WB+ and provides all the functionality needed
   from a file format.  This format also supports storage of AMR and
   AMR-WB, plus other multimedia formats, allowing for synchronized
   playback.  As the 3GP format provides much greater capability than
   the previously defined formats for AMR and AMR-WB, this format is
   expected to be used and be sufficient for all use cases.


3. Background on AMR-WB+ and Design Principles

   The Adaptive Multi-Rate Wideband plus (AMR-WB+) audio codec is
   designed for encoding and transport of speech and low bit-rate audio
   with good quality. The codec is being specified by 3GPP, and the
   primary target applications within 3GPP are packet switched streaming
   (PSS) [17] and multimedia messaging (MMS) services. However, due to
   its flexibility and robustness, AMR-WB+ is very well suited for
   streaming services in highly varying transport environments, e.g.,
   the Internet.

   Because of the flexibility of this codec, the behavior in a
   particular application is controlled by several parameters that
   select options or specify the acceptable values for a variable. These
   options and variables are described in general terms at appropriate
   points in the text of this specification as parameters to be
   established through out-of-band means. In Section 7, all of the
   parameters are specified in the form of MIME subtype registrations
   for the AMR-WB+ encoding. The method used to signal these parameters
   at session setup or to arrange prior agreement of the participants is
   beyond the scope of this document; however, Section 7 provides a
   mapping of the parameters into the Session Description Protocol (SDP)
   [7] for those applications that use SDP.

   Note that the AMR-WB+ design and specification work in 3GPP is still
   in progress. The target is to finalize the codec specifications
   within the 3GPP Release 6 timeline; the release will be frozen in
   June 2004 at the earliest. Because the codec work is unfinished,
   some of the issues discussed in this Internet-Draft are still subject
   to change, but the draft presents the situation according to the
   authors' best knowledge at the time of writing.


3.1. The AMR-WB+ Audio Codec

   The AMR-WB+ audio codec was originally developed by 3GPP to be used
   for streaming and messaging services in GSM and 3G cellular systems.
   AMR-WB+ is designed as an audio extension to the AMR-WB speech codec.
   Thus, it includes the nine coding modes specified for AMR-WB,
   extended with four new modes with bit rates ranging from 14 to 24
   kbit/s. Whereas the AMR-WB modes employ a 16000 Hz sampling frequency
   and operate on a monophonic signal in all modes, the extension modes
   operate at sampling rates of 16000, 24000 or 32000 Hz, and the input
   signal can be either monophonic or stereophonic audio, depending on
   the mode. The audio processing is performed on equal-size frames; the
   transport frames correspond to a 20 ms duration.  This means that
   each AMR-WB+ transport frame represents 320, 480 or 640 audio samples
   for each channel, depending on the employed sampling frequency.

   The AMR-WB+ codec includes four extension modes in addition to the
   AMR-WB modes, as introduced in Table 1 below. However, since the
   codec design work is still ongoing, the final specification may
   include a different set of modes.

                       Sampling    Mono/     Number of     Number of
   Index    Mode     rate [kHz]  stereo  bits per frame  class A bits
  --------------------------------------------------------------------
     0   WB 6.60 kbps    16       mono       132           54
     1   WB 8.85 kbps    16       mono       177           64
     2   WB 12.65 kbps   16       mono       253           72
     3   WB 14.25 kbps   16       mono       285           72
     4   WB 15.85 kbps   16       mono       317           72
     5   WB 18.25 kbps   16       mono       365           72
     6   WB 19.85 kbps   16       mono       397           72
     7   WB 23.05 kbps   16       mono       461           72
     8   WB 23.85 kbps   16       mono       477           72
     9   WB SID          16       mono        40           40
    10   WB+ 14 kbps     16       mono       280           ??
    11   WB+ 18 kbps    16/24     stereo     360           ??
    12   WB+ 24 kbps    16/24     mono       480           ??
    13   WB+ 24 kbps    16/24     stereo     480           ??
    14   LOST_SPEECH     -          -          0
    15   NO_DATA         -          -          0

   Table 1: AMR-WB+ modes. NOTE! THIS TABLE WILL BE REPLACED BY A
   REFERENCE TO THE APPROPRIATE 3GPP SPECIFICATION AS SOON AS IT IS
   AVAILABLE.

   Note that modes with index in the range 0-9 are the same as defined
   for AMR-WB in [9], and modes with index in the range 10-13 are the
   extension modes.
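
   As an informal illustration (not part of the codec specification),
   the nominal bit rate of a mode follows from its number of bits per
   20 ms frame in Table 1.  The Python sketch below assumes nothing
   beyond the table values; the names MODE_BITS, bit_rate_bps and
   frame_octets are hypothetical and introduced here only for
   illustration, and the octet count is simply the smallest number of
   whole octets that can hold the frame bits.

```python
# Bits per 20 ms transport frame, keyed by mode index (values from Table 1).
MODE_BITS = {
    0: 132, 1: 177, 2: 253, 3: 285, 4: 317,
    5: 365, 6: 397, 7: 461, 8: 477, 9: 40,
    10: 280, 11: 360, 12: 480, 13: 480,
}

FRAMES_PER_SECOND = 50  # one transport frame every 20 ms


def bit_rate_bps(mode_index):
    """Nominal bit rate in bit/s when a frame of this mode is sent every 20 ms."""
    return MODE_BITS[mode_index] * FRAMES_PER_SECOND


def frame_octets(mode_index):
    """Smallest number of whole octets that can hold one frame (illustrative helper)."""
    return (MODE_BITS[mode_index] + 7) // 8


if __name__ == "__main__":
    for index in sorted(MODE_BITS):
        print(index, bit_rate_bps(index), frame_octets(index))
```

   Note that the SID entry (index 9) is not sent for every frame during
   silence periods, so its computed rate is only an upper bound.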


3.2. Multi-rate Encoding and Mode Adaptation

   The multi-rate encoding (i.e., multi-mode) capability of AMR-WB+ is
   designed for preserving high audio quality under a wide range of
   bandwidth requirements and transmission conditions.

   AMR-WB+ enables seamless switching between modes using the same
   number of audio channels and the same sampling frequency. Every AMR-
   WB+ codec implementation is required to support all the respective
   audio coding modes defined by the codec and must be able to handle
   mode switching between any two modes. Switching between modes
   employing a different number of audio channels or a different
   sampling frequency is possible, but it requires the receiver to be
   equipped with the processing capabilities necessary to handle the
   changed characteristics of the incoming audio stream. Such switching
   is therefore not recommended, because it is likely to cause severe
   audio quality problems if not handled properly.


3.3. Voice Activity Detection and Discontinuous Transmission

   AMR-WB+ supports the same algorithms for voice activity detection
   (VAD) and generation of comfort noise (CN) parameters during silence
   periods as used by the AMR-WB codec. Hence, the AMR-WB+ codec also
   has the option to reduce the number of transmitted bits and packets
   during silence periods to a minimum. The operation of sending CN
   parameters at regular intervals during silence periods is usually
   called discontinuous transmission (DTX) or source controlled rate
   (SCR) operation.  The AMR-WB+ frames containing CN parameters are
   called Silence Indicator (SID) frames. More details about VAD and
   DTX functionality can be found in [5] and [6].


3.4. Support for Multi-Channel Session

   Some of the AMR-WB+ modes support encoding of stereophonic audio.
   Because of this native support for two-channel stereophonic signals,
   it does not seem necessary to support multi-channel transport with
   separate codecs, as is done in the AMR-WB RTP payload format [9].
   However, to make the signalling of channels explicit, a sender of
   AMR-WB+ must use separate RTP payload types for mono and stereo
   modes.  A reason for having the number of channels present at the
   RTP level is that the codec-external requirements are different,
   i.e., the playback facilities of a receiver need to handle stereo or
   mono signals.

   This does not make switching between mono and stereo any more
   difficult, as payload type switching can be done without problems
   since the same RTP timestamp rate is used in both cases.


3.5. Unequal Bit-error Detection and Protection

   The audio bits encoded in each AMR-WB+ frame have different
   perceptual sensitivity to bit errors. This property can be exploited
   e.g. in cellular systems to achieve better voice quality by using
   unequal error protection and detection (UEP and UED) mechanisms.

   The UEP/UED mechanisms focus the protection and detection of
   corrupted bits on the perceptually most sensitive bits in an AMR-WB+
   frame. In particular, audio bits in an AMR-WB+ frame are divided into
   classes A and B, where bits in class A are the most sensitive, while
   class B bits can tolerate some errors with only minor degradations in
   the speech quality. [NOTE: reference to appropriate 3GPP
   specification will be added as soon as it is available] A frame is
   only declared damaged if there are bit errors found in the most
   sensitive bits, i.e., the class A bits. On the other hand, it is
   acceptable to have some bit errors in the other bits, i.e. class B
   bits.

   Moreover, a damaged frame is still useful for error concealment at
   the decoder since some of the less sensitive bits can still be used.
   This approach can improve the audio quality compared to discarding
   the damaged frame.

3.5.1. Applying UEP and UED in an IP Network

   To take full advantage of the bit-error robustness of the AMR-WB+
   codec, the RTP payload format is designed to facilitate UEP/UED in an
   IP network.  It should be noted however that the utilization of UEP
   and UED discussed below is OPTIONAL.

   UEP/UED in an IP network can be achieved by detecting bit errors in
   class A bits and tolerating bit errors in class B bits of the AMR-WB+
   frame(s) in each RTP payload.

   Today there exist some link layers that do not discard packets with
   bit errors, e.g., SLIP and some wireless links. With the Internet
   traffic pattern shifting towards a more multimedia-centric one, more
   link layers of such nature may emerge in the future. With transport
   layer support for partial checksums, for example those supported by
   UDP-Lite [10], bit error tolerant AMR-WB+ traffic could achieve
   better performance over these types of links.

   There are at least two basic approaches for carrying AMR-WB+ traffic
   over bit error tolerant IP networks:

   1) Utilizing a partial checksum to cover headers and the most
      important audio bits of the payload. At least all class A bits
      should be covered by the checksum, since the bits of the extension
      modes are not sorted in sensitivity order but just classified in
      class A and B bits.

   2) Utilizing a partial checksum to only cover headers, but a frame
      CRC to cover the class A bits of each audio frame in the RTP
      payload.

   In either approach, at least part of the class B bits are left
   without error-check and thus bit error tolerance is achieved.

   The application interface to the UEP/UED transport protocol (e.g.,
   UDP-Lite) may not provide any control over the link error rate.
   Therefore, it is incumbent upon the designer of a node with a link
   interface of this type to choose a residual bit error rate that is
   low enough to support applications such as AMR-WB+ encoding when
   transmitting packets of a UEP/UED transport protocol.

   Approach 1 is bit-efficient, flexible, and simple, but comes with
   two disadvantages: a) bit errors in protected audio bits will cause
   the payload to be discarded, and b) when transporting multiple
   frames in a payload, a single bit error in protected bits can cause
   all the frames to be discarded.

   These disadvantages can be avoided, if needed, at the cost of some
   overhead in the form of a frame-wise CRC (Approach 2). For a), the
   CRC makes it possible to detect bit errors in class A bits and still
   use the frame for error concealment, which gives a small improvement
   in audio quality. For b), when transporting multiple frames in a
   payload, the CRCs remove the possibility that a single bit error in a
   class A bit will cause all the frames to be discarded. Avoiding that
   gives an improvement in audio quality when transporting multiple
   frames over links subject to bit errors.

   The choice between the above two approaches must be made based on the
   available bandwidth, and desired tolerance to bit errors. Neither
   solution is appropriate to all cases. Section 7 defines parameters
   that may be used at session setup to select between these approaches.
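
   The difference between the two approaches can be illustrated with a
   small, non-normative Python sketch.  It assumes that the headers
   arrive intact and models each received frame only by whether its
   class A or class B bits contain errors; the ReceivedFrame class and
   both function names are hypothetical and exist only for this
   illustration.

```python
from dataclasses import dataclass


@dataclass
class ReceivedFrame:
    class_a_error: bool  # at least one bit error among the class A bits
    class_b_error: bool  # at least one bit error among the class B bits


def approach_1(frames):
    """Partial checksum covers the headers and the class A bits of all frames."""
    if any(frame.class_a_error for frame in frames):
        # The checksum fails, so the whole payload is discarded.
        return ["discarded"] * len(frames)
    return ["ok"] * len(frames)


def approach_2(frames):
    """Partial checksum covers headers only; a per-frame CRC covers class A bits."""
    # Only the frame whose CRC fails is declared damaged; it can still
    # be handed to the decoder for error concealment.
    return ["damaged" if frame.class_a_error else "ok" for frame in frames]


if __name__ == "__main__":
    payload = [
        ReceivedFrame(class_a_error=False, class_b_error=True),
        ReceivedFrame(class_a_error=True, class_b_error=False),
        ReceivedFrame(class_a_error=False, class_b_error=False),
    ]
    print(approach_1(payload))
    print(approach_2(payload))
```

   In both functions, errors confined to class B bits are tolerated,
   which is the bit error tolerance described above.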


3.6. Robustness against Packet Loss

   The payload format supports several means, including forward error
   correction (FEC) and frame interleaving, to increase robustness
   against packet loss.

3.6.1. Use of Forward Error Correction (FEC)

   The simple scheme of repetition of previously sent data is one way of
   achieving FEC. Another possible scheme which can be more bandwidth
   efficient is to use payload external FEC, e.g., RFC2733 [14], which
   generates extra packets containing repair data. The whole payload can
   also be sorted in sensitivity order to support external FEC schemes
   using UEP. There is also work in progress on a generic version of
   such a scheme [12] that can be applied to AMR-WB+ payload transport.

   For the AMR-WB+ extension modes, it is only possible to use the codec
   to send redundant copies of the same mode. Such a scheme is described
   next.

   The scheme involves the simple retransmission of previously
   transmitted frames together with the current frame(s). This is done
   by using a sliding window to group the audio frames to send in each
   payload.  Figure 1 below shows an example.

   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--

     <---- p(n-1) ---->
              <----- p(n) ----->
                       <---- p(n+1) ---->
                                <---- p(n+2) ---->
                                         <---- p(n+3) ---->
                                                  <---- p(n+4) ---->

   Figure 1: An example of redundant transmission.

   In this example each frame is retransmitted once, in the following
   RTP payload packet. Here, f(n-2)..f(n+4) denote a sequence of audio
   frames and p(n-1)..p(n+4) a sequence of payload packets.
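
   The sliding window of Figure 1 can be sketched in a few lines of
   Python.  This is an illustration only: the function names and the
   "copies" parameter are hypothetical, and frames are represented by
   plain labels rather than encoded audio.

```python
def redundant_payloads(frames, copies=1):
    """Group frames so that each payload carries the newest frame plus
    up to `copies` previously sent frames (a sliding window)."""
    payloads = []
    for i in range(len(frames)):
        start = max(0, i - copies)
        payloads.append(frames[start:i + 1])
    return payloads


def recovered_frames(payloads, lost_indices):
    """Frames still available when the payloads at `lost_indices` are lost."""
    received = set()
    for index, payload in enumerate(payloads):
        if index not in lost_indices:
            received.update(payload)
    return received


if __name__ == "__main__":
    frames = ["f0", "f1", "f2", "f3"]
    payloads = redundant_payloads(frames, copies=1)
    print(payloads)
    print(sorted(recovered_frames(payloads, lost_indices={2})))
```

   With one redundant copy per frame, the loss of any single payload
   other than the last one leaves every frame recoverable from a
   neighbouring payload.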

   The use of this approach does not require signaling at the session
   setup. In other words, the audio sender can choose to use this scheme
   without consulting the receiver. This is because a packet containing
   redundant frames will not look different from a packet with only new
   frames. The receiver may receive multiple copies or versions (encoded
   with different modes) of a frame for a certain timestamp if no packet
   is lost. If multiple versions of the same audio frame are received,
   it is recommended that the mode with the highest rate be used by the
   audio decoder.

   This redundancy scheme provides the same functionality as the one
   described in RFC 2198 "RTP Payload for Redundant Audio Data" [15]. In
   most cases the mechanism in this payload format is more efficient and
   simpler than requiring both endpoints to support RFC 2198 in
   addition. There are two situations in which use of RFC 2198 is
   indicated: if the spread in time required between the primary and
   redundant encodings is larger than 5 frame times, the bandwidth
   overhead of RFC 2198 will be lower; or, if some other codec than AMR-
   WB+ is desired for the redundant encoding, the AMR-WB+ payload format
   won't be able to carry it.

   The sender is responsible for selecting an appropriate amount of
   redundancy based on feedback about the channel, e.g., in RTCP
   receiver reports. The sender is also responsible for avoiding
   congestion, which may be exacerbated by redundancy (see Section 5 for
   more details).

3.6.2. Use of Frame Interleaving

   To decrease protocol overhead, the payload design allows several
   audio frames to be encapsulated into a single RTP packet. One of the
   drawbacks of such an approach is that a packet loss then means the
   loss of several consecutive audio frames, which usually causes
   clearly audible distortion in the reconstructed audio. Interleaving
   of frames can improve the audio quality in such cases by distributing
   the consecutive losses into a series of single frame losses.
   However, interleaving and bundling several frames per payload will
   also increase end-to-end delay and is therefore not appropriate for
   all usage scenarios. Nevertheless, streaming applications will most
   likely be able to exploit interleaving to improve audio quality in
   lossy transmission conditions.
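
   The effect of interleaving on loss patterns can be illustrated with
   the following Python sketch.  It assumes one simple stride-based
   arrangement chosen only for illustration; the normative
   interleaving rules are those of Section 4.3.1.  All function names
   are hypothetical.

```python
def bundle(frames, frames_per_packet):
    """Plain bundling: each packet carries consecutive frames."""
    return [frames[i:i + frames_per_packet]
            for i in range(0, len(frames), frames_per_packet)]


def interleave(frames, stride):
    """Packet i carries frames i, i+stride, i+2*stride, ..."""
    return [frames[i::stride] for i in range(stride)]


def longest_loss_run(lost_frames):
    """Length of the longest run of consecutive frame numbers that were lost."""
    longest = run = 0
    previous = None
    for number in sorted(lost_frames):
        run = run + 1 if previous is not None and number == previous + 1 else 1
        longest = max(longest, run)
        previous = number
    return longest


if __name__ == "__main__":
    frames = list(range(8))              # frame numbers 0..7
    plain = bundle(frames, 2)            # two consecutive frames per packet
    interleaved = interleave(frames, 4)  # two frames per packet, four apart
    # Lose the second packet in each case and compare the damage.
    print(plain[1], longest_loss_run(plain[1]))
    print(interleaved[1], longest_loss_run(interleaved[1]))
```

   In this sketch both schemes lose two frames when one packet is lost,
   but with interleaving the lost frames are isolated rather than
   adjacent.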

   This payload design supports the use of frame interleaving as an
   option.  For the encoder (audio sender) to use frame interleaving in
   its outbound RTP packets for a given session, the decoder (audio
   receiver) needs to indicate its support via out-of-band means (see
   Section 7).


3.7. AMR-WB+ Audio over IP scenarios

   Since the primary target for the AMR-WB+ codec is packet switched
   streaming, the most relevant usage scenario for this payload format
   is IP end-to-end between a server and a terminal, as shown in
   Figure 2.

             +----------+                          +----------+
             |          |    IP/UDP/RTP/AMR-WB+    |          |
             |  SERVER  |<------------------------>| TERMINAL |
             |          |                          |          |
             +----------+                          +----------+

              Figure 2: Server to terminal IP scenario


4. RTP Payload Format for AMR-WB+

   The AMR-WB+ payload format has the same structure as the AMR and
   AMR-WB payload formats [9].  The differences are that the number of
   modes is extended compared to the original AMR-WB format and that
   some features are removed. The motivation for the reduced
   functionality is that only IP transport is expected for AMR-WB+,
   i.e., functionality used for gateway scenarios is removed.  The
   payload format consists of the RTP header, payload header and payload
   data.

   Since the AMR-WB speech modes are included in the AMR-WB+ codec, an
   end-point supporting AMR-WB+ is in principle also able to support
   the AMR-WB payload format and MIME subtype. To enable communication
   with an end-point supporting only AMR-WB coding, an AMR-WB+ end-point
   SHOULD also indicate its capability to communicate using the AMR-WB
   MIME subtype and RTP payload format. However, it should be noted
   that this is not possible in all scenarios. For example, when the
   AMR-WB+ RTP payload format is used for streaming audio that is stored
   at a server, it is not possible to transform data stored using one of
   the AMR-WB+ extension modes into one of the AMR-WB modes without full
   transcoding. A similar scenario occurs with messaging services where
   the message containing AMR-WB+ audio is pre-stored at a messaging
   server. On the other hand, in a live streaming scenario, for example,
   an AMR-WB+ end-point might have the possibility to limit its
   operation to AMR-WB modes only.

4.1. RTP Header Usage

   The format of the RTP header is specified in [4].  This payload
   format uses the fields of the header in a manner consistent with that
   specification.

   The RTP timestamp corresponds to the sampling instant of the first
   sample encoded for the first frame in the packet.  The timestamp
   clock frequency SHALL be 96000 Hz, the lowest frequency that is an
   integer multiple of every sampling frequency used by the AMR-WB+
   modes.

   The duration of one AMR-WB+ audio transport frame is 20 ms.  The
   sampling frequency is either 16 kHz, 24 kHz, or 32 kHz, corresponding
   to 320, 480, or 640 encoded audio samples per frame from each
   channel.  This in turn corresponds to a timestamp increase of 6x320,
   4x480, or 3x640, all equal to 1920 timestamp units per frame.  A
   packet MAY contain multiple frames of encoded audio or comfort noise
   parameters.  If interleaving is employed, the frames encapsulated
   into a payload are picked according to the interleaving rules as
   defined in Section 4.3.1.  Otherwise, each packet covers a period of
   one or more contiguous 20 ms frames.
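
   The timestamp arithmetic above can be checked with a short Python
   sketch.  It is illustrative only; the helper names are hypothetical,
   and frame_timestamp assumes contiguous, non-interleaved frames and
   the 32-bit wrap-around of the RTP timestamp field defined in [4].

```python
from functools import reduce
from math import gcd

SAMPLING_RATES_HZ = (16000, 24000, 32000)
FRAME_DURATION_MS = 20


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


# Lowest clock rate that is an integer multiple of every sampling rate.
RTP_CLOCK_HZ = reduce(lcm, SAMPLING_RATES_HZ)


def samples_per_frame(rate_hz):
    """Audio samples per channel in one 20 ms transport frame."""
    return rate_hz * FRAME_DURATION_MS // 1000


def ticks_per_frame(rate_hz):
    """RTP timestamp units covered by one frame at the given sampling rate."""
    return samples_per_frame(rate_hz) * (RTP_CLOCK_HZ // rate_hz)


def frame_timestamp(first_timestamp, frame_number):
    """Timestamp of frame `frame_number`, counted from the frame that
    carried `first_timestamp` (contiguous frames, 32-bit wrap-around)."""
    return (first_timestamp + frame_number * ticks_per_frame(16000)) % 2**32


if __name__ == "__main__":
    print(RTP_CLOCK_HZ)
    for rate in SAMPLING_RATES_HZ:
        print(rate, samples_per_frame(rate), ticks_per_frame(rate))
```

   Because every sampling rate yields the same number of timestamp
   units per frame, the timestamp increment does not depend on the mode
   in use.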

   To allow for error resiliency through redundant transmission, the
   periods covered by multiple packets MAY overlap in time.  A receiver
   MUST be prepared to receive any audio frame multiple times.  All
   multiply sent frames MUST use the same mode.

   The payload is always made an integral number of octets long by
   padding with zero bits if necessary.  If additional padding is
   required to bring the payload length to a larger multiple of octets
   or for some other purpose, then the P bit in the RTP header MAY be
   set and padding appended as specified in [4].

   The RTP header marker bit (M) SHALL be set to 1 if the first frame
   carried in the packet is the first audio frame in a talkspurt.  For
   all other packets the marker bit SHALL be set to zero (M=0).

   The assignment of an RTP payload type for this new packet format is
   outside the scope of this document, and will not be specified here.
   It is expected that the RTP profile under which this payload format
   is being used will assign a payload type for this encoding or specify
   that the payload type is to be bound dynamically.

   An RTP payload type MUST carry either only mono or only stereo
   encoded AMR-WB+ frames.  If both mono and stereo are to be sent by an
   application, two different payload types must be used.  Switching
   between mono and stereo modes MAY be done by switching payload
   types, provided that the necessary extra processing is available in
   the receiver (see Section 3.2).


4.2. Payload Structure

   The complete payload consists of a payload header, a payload table of
   contents, and audio data representing one or more audio frames.  The
   following diagram shows the general payload format layout:

   +----------------+-------------------+----------------
   | payload header | table of contents | audio data .. .
   +----------------+-------------------+----------------

   Payloads containing more than one audio frame are called compound
   payloads.

   The following sections describe the variations taken by the payload
   format depending on whether the AMR-WB+ session is set up to use any
   of the OPTIONAL functions for robust sorting, interleaving, and frame
   CRCs.


4.3. Payload definitions

4.3.1. The Payload Header

   The payload header consists of a 4 bit CMR, 4 reserved bits, and
   optionally, an 8 bit interleaving header, as shown below:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+- - - - - - - -
   |  CMR  |R|R|R|R|  ILL  |  ILP  |
   +-+-+-+-+-+-+-+-+- - - - - - - -

   CMR (4 bits): Used by the AMR and AMR-WB formats to carry a codec
      mode request to the audio encoder at the site of the receiver of
      this payload.  The value of the CMR field is set to the frame
      type index of the audio mode being requested.  AMR-WB+ is not
      intended for conversational use and no gateway scenarios are
      identified.  Hence, this field is not needed for AMR-WB+.  The
      CMR field is kept for conformity with the AMR and AMR-WB formats,
      but MUST be set to the value 15, indicating that no mode request
      is present.

   R: is a reserved bit that MUST be set to zero.  All R bits MUST be
      ignored by the receiver.

   ILL (4 bits, unsigned integer): This is an OPTIONAL field that is
      present only if interleaving is signaled out-of-band for the
      session.  ILL=L indicates to the receiver that the interleaving
      length is L+1, in number of frames.

   ILP (4 bits, unsigned integer): This is an OPTIONAL field that is
      present only if interleaving is signaled.  ILP MUST take a value
      between 0 and ILL, inclusive, indicating the interleaving index
      for frames in this payload in the interleave group.  If the
      value of ILP is found to be greater than ILL, the payload SHOULD
      be discarded.

   ILL and ILP fields MUST be present in each packet in a session if
   interleaving is signaled for the session.  Interleaving MUST be
   performed on a frame basis.
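
   The header layout above can be sketched in code.  The following is
   a minimal, non-normative Python illustration; the function name
   parse_payload_header and its return convention are assumptions made
   for this sketch and are not defined by this document.  Whether
   interleaving is in use must be known from out-of-band signaling.

```python
def parse_payload_header(payload: bytes, interleaving: bool):
    """Return (cmr, ill, ilp, header_length).

    ill and ilp are None when interleaving is not signaled.
    Hypothetical helper, written only to illustrate the field layout.
    """
    needed = 2 if interleaving else 1
    if len(payload) < needed:
        raise ValueError("payload too short for payload header")

    cmr = payload[0] >> 4  # upper 4 bits of the first octet
    # The lower 4 bits are reserved (R) bits and are ignored here.

    if not interleaving:
        return cmr, None, None, 1

    ill = payload[1] >> 4    # interleaving length minus one
    ilp = payload[1] & 0x0F  # index of this payload within the group
    if ilp > ill:
        # The text says such a payload SHOULD be discarded.
        raise ValueError("ILP greater than ILL")
    return cmr, ill, ilp, 2
```

   For the header octets 0xF0 0x10 with interleaving signaled, this
   sketch returns CMR=15, ILL=1, ILP=0 and a header length of 2.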

   The following example illustrates the arrangement of audio frames in
   an interleave group during an interleave session.  Here we assume
   ILL=L for the interleave group that starts at audio frame n.  We also
   assume that the first payload packet of the interleave group is s and
   the number of audio frames carried in each payload is N. Then we will
   have:


   Payload s (the first packet of this interleave group):
      ILL=L, ILP=0,
      frames: n, n+(L+1), n+2*(L+1), ..., n+(N-1)*(L+1)

   Payload s+1 (the second packet of this interleave group):
      ILL=L, ILP=1,
      frames: n+1, n+1+(L+1), n+1+2*(L+1), ..., n+1+(N-1)*(L+1)
       ...

   Payload s+L (the last packet of this interleave group):
      ILL=L, ILP=L,
      frames: n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+(N-1)*(L+1)

   The next interleave group will start at frame n+N*(L+1).
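
   The arrangement above can be reproduced with a few lines of code.
   This is a non-normative Python sketch; the function name
   interleave_group is an assumption made for illustration.  It lists,
   for each payload s .. s+L, the frame indices that payload carries.

```python
def interleave_group(n: int, L: int, N: int):
    """Return one list of frame indices per payload in the group.

    n: index of the first audio frame of the interleave group
    L: value of the ILL field (interleaving length is L+1)
    N: number of audio frames carried in each payload
    Element ilp of the result belongs to the payload with ILP=ilp.
    """
    return [[n + ilp + k * (L + 1) for k in range(N)]
            for ilp in range(L + 1)]
```

   With n=1, L=1 and N=2 the sketch gives [[1, 3], [2, 4]]: the first
   payload carries frames 1 and 3, the second carries frames 2 and 4,
   and the next group starts at frame n+N*(L+1) = 5.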

   There will be no interleaving effect unless the number of frames per
   packet (N) is at least 2.  Moreover, the number of frames per payload
   (N) and the value of ILL MUST NOT be changed inside an interleave
   group.  In other words, all payloads in an interleave group MUST have
   the same ILL and MUST contain the same number of audio frames.

   The sender of the payload MUST only apply interleaving if the
   receiver has signaled its use through out-of-band means.  Since
   interleaving will increase buffering requirements at the receiver,
   the receiver uses MIME parameter "interleaving=I" to set the maximum
   number of frames allowed in an interleaving group to I.

   When performing interleaving, the sender MUST choose the number of
   frames per payload (N) and ILL so that the resulting size of an
   interleave group is less than or equal to I, i.e., N*(L+1)<=I.
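
   As a worked illustration of this constraint, the sketch below
   computes the largest usable ILL for a given N and I.  It is
   non-normative Python; the function name max_ill is an assumption of
   this sketch.  The cap at 15 follows from ILL being a 4-bit field.

```python
def max_ill(frames_per_payload: int, interleaving_limit: int):
    """Largest L such that N*(L+1) <= I, or None if no L satisfies it.

    frames_per_payload is N, interleaving_limit is the value I of the
    "interleaving" parameter.
    """
    if frames_per_payload < 1:
        raise ValueError("at least one frame per payload is required")
    largest = interleaving_limit // frames_per_payload - 1
    if largest < 0:
        return None  # even L=0 would exceed the limit
    return min(largest, 15)  # ILL is carried in 4 bits
```

   For example, with I=30 and N=2 the result is L=14, since
   2*(14+1) = 30 <= 30, while L=15 would give 32 > 30.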

4.3.2. The Payload Table of Contents and Frame CRCs

   The table of contents (ToC) consists of a list of ToC entries where
   each entry corresponds to an audio frame carried in the payload and,
   optionally, a list of audio frame CRCs, i.e.,

   +---------------------+
   | list of ToC entries |
   +---------------------+
   | list of frame CRCs  | (optional)
    - - - - - - - - - - -

      Note, for ToC entries with FT=14 or 15, there will be no
      corresponding audio frame or frame CRC present in the payload.

   When multiple frames are present in a packet, the ToC entries will be
   placed in the packet in order of their creation time.  Note that when
   interleaving is used, consecutive ToC entries will in general not
   refer to frames that are consecutive in time.


   A ToC entry takes the following format:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |F|  FT   |Q|P|P|
   +-+-+-+-+-+-+-+-+

   F (1 bit): If set to 1, indicates that this frame is followed by
      another audio frame in this payload; if set to 0, indicates that
      this frame is the last frame in this payload.

   FT (4 bits): Frame type index, indicating the AMR-WB+
      audio coding mode or comfort noise (SID) mode of the
      corresponding frame carried in this payload.

   The value of FT is defined in Table 1 in Section 3.1.  FT=14
   (AUDIO_LOST) and FT=15 (NO_DATA) are used to indicate frames that
   are either lost or not being transmitted in this payload,
   respectively.

   A NO_DATA (FT=15) frame could mean either that there is no data
   produced by the audio encoder for that frame or that no data for that
   frame is transmitted in the current payload (i.e., valid data for
   that frame could be sent in either an earlier or later packet).

   If a ToC entry with an undefined FT value is received, the whole
   packet SHOULD be discarded.  This is to avoid the loss of data
   synchronization in the depacketization process, which can result in a
   huge degradation in audio quality.

   Note that packets containing only NO_DATA frames SHOULD NOT be
   transmitted.  Also, NO_DATA frames at the end of a packet SHOULD NOT
   be transmitted, except in the case of interleaving.  The AMR-WB+
   SCR/DTX is identical to the AMR-WB SCR/DTX described in [6].

   Q (1 bit): Frame quality indicator.  If set to 0, indicates the
      corresponding frame is severely damaged and the receiver should
      set the RX_TYPE (see [6]) to either AUDIO_BAD or SID_BAD
      depending on the frame type (FT).

   The frame quality indicator enables damaged frames to be forwarded to
   the audio decoder for error concealment.  This can improve the audio
   quality compared to dropping the damaged frames.  See Section
   4.3.2.1 for more details.

   P bits: padding bits, MUST be set to zero. All padding bits MUST be
   ignored by the receiver.

   When multiple frames are present, their ToC entries will be placed in
   the ToC in order of their creation time.


   The following figure shows an example of a ToC of three entries.

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|  FT   |Q|P|P|1|  FT   |Q|P|P|0|  FT   |Q|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
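
   Reading a list of ToC entries such as the one shown can be sketched
   as follows.  This is non-normative Python; the function name
   parse_toc and the dictionary keys are assumptions of this sketch.
   The loop stops at the first entry whose F bit is 0.

```python
def parse_toc(data: bytes, offset: int = 0):
    """Return (entries, next_offset).

    Each entry is a dict with the F, FT and Q fields of one ToC octet.
    offset is where the ToC starts, i.e. just after the payload header.
    """
    entries = []
    while True:
        if offset >= len(data):
            raise ValueError("ToC runs past the end of the payload")
        octet = data[offset]
        offset += 1
        entry = {
            "f": octet >> 7,            # 1 = another entry follows
            "ft": (octet >> 3) & 0x0F,  # frame type index
            "q": (octet >> 2) & 0x01,   # frame quality indicator
        }
        # The two low-order padding (P) bits are ignored.
        entries.append(entry)
        if entry["f"] == 0:
            return entries, offset
```

   For the two octets 0xD4 0x54 the sketch returns two entries, both
   with FT=10 and Q=1, the first with F=1 and the second with F=0.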

   The list of CRCs is OPTIONAL.  It only exists if the use of CRC is
   signaled out-of-band for the session.  When present, each CRC in the
   list is 8 bits long and corresponds to an audio frame carried in the
   payload.  Calculation and use of the CRC is specified in Section
   4.3.2.1.

4.3.2.1. Use of Frame CRC for UED over IP

   The general concept of UED/UEP over IP is discussed in Section 3.5.
   This section provides more details on how to use the frame CRC in the
   payload header together with a partial transport layer checksum to
   achieve UED.

   To achieve UED, one SHOULD use a transport layer checksum, for
   example, the one defined in UDP-Lite [10], to protect the RTP header,
   payload header, and table of contents bits in a payload.  The frame
   CRC, when used, MUST be calculated only over all class A bits in the
   frame.  Class B and possible C bits in the frame MUST NOT be included
   in the CRC calculation and SHOULD NOT be covered by the transport
   checksum.

      Note, the number of class A bits for various coding modes in
      AMR-WB+ codec is specified as normative in Table 1 in Section 3.1,
      and the SID frame (FT=9) has 40 class A bits.  These definitions
      of class A bits MUST be used for this payload format.

   A packet SHOULD be discarded if the transport layer checksum detects
   errors.

   The receiver of the payload SHOULD examine the data integrity of the
   received class A bits by re-calculating the CRC over the received
   class A bits and comparing the result to the value found in the
   received payload header.  If the two values mismatch, the receiver
   SHALL consider the class A bits in the received frame damaged and
   MUST clear the Q flag of the frame (i.e., set it to 0).  This will
   subsequently cause the frame to be marked as AUDIO_BAD, if the FT of
   the frame is 0..8 or 10..13, or SID_BAD if the FT of the frame is 9,
   before it is passed to the audio decoder.  See [6] for more details.

   The following example shows an octet-aligned ToC with a CRC list for
   a payload containing 3 audio frames from a single channel session
   (assuming none of the FTs is equal to 14 or 15):


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|  FT#1 |Q|P|P|1|  FT#2 |Q|P|P|0|  FT#3 |Q|P|P|     CRC#1     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     CRC#2     |     CRC#3     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Each CRC takes 8 bits:

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | c0| c1| c2| c3| c4| c5| c6| c7|
   +---+---+---+---+---+---+---+---+
   (MSB)                       (LSB)

   and is calculated by the cyclic generator polynomial,

     C(x) = 1 + x^2 + x^3 + x^4 + x^8

   where ^ is the exponentiation operator.

   In binary form the polynomial has the following form: 101110001
   (MSB..LSB).

   The actual calculation of the CRC is made as follows:  First, an
   8-bit CRC register is reset to zero: 00000000.  For each bit over
   which the CRC shall be calculated, an XOR operation is made between
   the rightmost (LSB) bit of the CRC register and the bit.  The CRC
   register is then right-shifted one step (each bit's significance is
   reduced by one), inputting a "0" as the leftmost bit (MSB).  If the
   result of the XOR operation mentioned above is a "1", then "10111000"
   is bit-wise XOR-ed into the CRC register.  This operation is repeated
   for each bit that the CRC should cover.  In this case, the first bit
   would be d(0) of the audio frame that the CRC should cover.  When the
   last bit (e.g., d(71) for AMR-WB 15.85 according to Table 1 in
   Section 3.1) has been used in this CRC calculation, the contents of
   the CRC register should simply be copied to the corresponding field
   in the list of CRCs.

   Fast calculation of the CRC on a general-purpose CPU is possible
   using a table-driven algorithm.
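
   Both the bit-by-bit procedure and a table-driven variant can be
   sketched as follows.  This is non-normative Python; the function
   names are assumptions of this sketch.  The input is the sequence of
   covered bits d(0), d(1), ... given as integers 0 or 1, so the sketch
   makes no assumption about how the bits are packed into octets.

```python
POLY = 0xB8  # "10111000", the value XOR-ed into the register


def crc8_bitwise(bits):
    """Follow the textual procedure one input bit at a time."""
    crc = 0
    for bit in bits:
        feedback = (crc & 1) ^ (bit & 1)  # LSB of register XOR input
        crc >>= 1                         # shift right, MSB becomes 0
        if feedback:
            crc ^= POLY
    return crc


def _build_table():
    """Entry v is the register after clocking 8 zero bits from v."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
        table.append(crc)
    return table


_TABLE = _build_table()


def crc8_table(bits):
    """Table-driven variant: 8 input bits per lookup."""
    bits = [b & 1 for b in bits]
    whole = len(bits) - len(bits) % 8
    crc = 0
    for i in range(0, whole, 8):
        # Pack so that the first bit to be processed sits in bit 0.
        chunk = sum(bit << k for k, bit in enumerate(bits[i:i + 8]))
        crc = _TABLE[crc ^ chunk]
    for bit in bits[whole:]:  # leftover bits, if any, one at a time
        feedback = (crc & 1) ^ bit
        crc >>= 1
        if feedback:
            crc ^= POLY
    return crc
```

   The two functions are intended to return the same value for any
   input; the table-driven form merely trades 256 octets of memory for
   fewer operations per covered bit.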


4.3.3. Audio Data

   Audio data of a payload contains one or more audio frames or comfort
   noise frames, as described in the ToC of the payload.

      Note, for ToC entries with FT=14 or 15, there will be no
      corresponding audio frame present in the audio data.

   Each audio frame represents 20 ms of audio encoded with the mode
   indicated in the FT field of the corresponding ToC entry.  The length
   of the audio frame is implicitly defined by the mode indicated in the
   FT field.  The order and numbering notation of the bits are as
   specified in [2].  As specified there, the bits of audio frames have
   been rearranged in order of decreasing sensitivity or, for the
   extension modes, into two sensitivity classes, while the bits of
   comfort noise frames are in the order produced by the encoder.  The
   resulting bit sequence for a frame of length K bits is denoted d(0),
   d(1), ..., d(K-1).  The last octet of each audio frame MUST be padded
   with zeroes at the end if not all bits in the octet are used.  In
   other words, each audio frame MUST be octet-aligned.

   When multiple audio frames are present in the audio data (i.e.,
   compound payload), the audio frames can be arranged either one whole
   frame after another as usual, or with the octets of all frames
   interleaved together at the octet level. Since the bits within each
   frame are ordered with the most error-sensitive bits first,
   interleaving the octets collects those sensitive bits from all frames
   to be nearer the beginning of the packet.  This is called "robust
   sorting order" which allows the application of UED (such as UDP-Lite
   [10]) or UEP (such as ULP [12]) mechanisms to the payload data.  The
   details of assembling the payload are given in the next section.

   The use of robust sorting order for a session MUST be agreed via out-
   of-band means.  Section 7.1 specifies a MIME parameter for this
   purpose.


4.3.4. Methods for Forming the Payload

   Two different packetization methods, namely normal order and robust
   sorting order, exist for forming a payload.  In both cases, the
   payload header and table of contents are packed into the payload the
   same way; the difference is in the packing of the audio frames.

   The payload begins with the payload header of one octet or two if
   frame interleaving is selected.  The payload header is followed by
   the table of contents consisting of a list of one-octet ToC entries.
   If frame CRCs are to be included, they follow the table of contents
   with one 8-bit CRC filling each octet.  Note that if a given frame
   has a ToC entry with FT=14 or 15, there will be no CRC present.


   The audio data follows the table of contents, or the CRCs if present.
   For packetization in the normal order, all of the octets comprising
   an audio frame are appended to the payload as a unit.  The audio
   frames are packed in the same order as their corresponding ToC
   entries are arranged in the ToC list, with the exception that if a
   given frame has a ToC entry with FT=14 or 15, there will be no data
   octets present for that frame.

   For packetization in robust sorting order, the octets of all audio
   frames are interleaved together at the octet level.  That is, the
   data portion of the payload begins with the first octet of the first
   frame, followed by the first octet of the second frame, then the
   first octet of the third frame, and so on.  After the first octet of
   the last frame has been appended, the cycle repeats with the second
   octet of each frame.  The process continues for as many octets as are
   present in the longest frame.  If the frames are not all the same
   octet length, a shorter frame is skipped once all octets in it have
   been appended.  The order of the frames in the cycle will be
   sequential if frame interleaving is not in use, or according to the
   interleave pattern specified in the payload header if frame
   interleaving is in use.  Note that if a given frame has a ToC entry
   with FT=14 or 15, there will be no data octets present for that frame
   so that frame is skipped in the robust sorting cycle.
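
   The octet-level cycle described above can be sketched as follows.
   This is non-normative Python; the function names robust_sort and
   robust_unsort are assumptions of this sketch.  Frames are given in
   ToC order as octet strings, and a frame with FT=14 or 15 is
   represented by an empty string so that it is skipped naturally.

```python
def robust_sort(frames):
    """Interleave the octets of all frames at the octet level."""
    out = bytearray()
    longest = max((len(frame) for frame in frames), default=0)
    for i in range(longest):
        for frame in frames:
            if i < len(frame):  # shorter frames drop out of the cycle
                out.append(frame[i])
    return bytes(out)


def robust_unsort(data, lengths):
    """Inverse operation; lengths are the frame sizes in octets,
    which the receiver derives from the FT fields of the ToC."""
    if len(data) != sum(lengths):
        raise ValueError("audio data length does not match the ToC")
    frames = [bytearray() for _ in lengths]
    pos = 0
    for i in range(max(lengths, default=0)):
        for frame, length in zip(frames, lengths):
            if i < length:
                frame.append(data[pos])
                pos += 1
    return [bytes(frame) for frame in frames]
```

   For three frames "abc", "" and "XY" the sorted audio data is
   "aXbYc": the first octets of all frames come first, then the second
   octets, and so on.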

   The UED and/or UEP SHOULD cover at least the RTP header, payload
   header, table of contents, and all class A bits of a sorted payload.
   All class A bits SHOULD be covered since the extension modes do not
   have accurate sorting of the bits in sensitivity order. The bits are
   only sorted in different classes, with the most sensitive bits (class
   A bits) placed in the beginning.  Exactly how many octets need to be
   covered depends on the network and application.  If CRCs are used
   together with robust sorting, only the RTP header, the payload
   header, and the ToC SHOULD be covered by UED/UEP.  The means to
   communicate to other layers performing UED/UEP the number of octets
   to be covered is beyond the scope of this specification.


4.3.5. Payload Examples

4.3.5.1. Example 1, Basic Payload Carrying Multiple Frames

   The following diagram shows a payload from a session that carries two
   AMR-WB+ frames of 14 kbps coding mode (FT=10).  In the payload, the
   codec mode request is set to the mandated value (CMR=15), which
   indicates that no mode request is present.  No frame CRC,
   interleaving, or robust sorting is in use.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |CMR=15 |R|R|R|R|1|FT#1=10|Q|P|P|0|FT#2=10|Q|P|P|   f1(0..7)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f1(8..15)   |  f1(16..23)   |  ....                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         ...   |f1(272..279)   |   f2(0..7)    |   f2(8..15)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  f2(16..23)   |  ....                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |f2(272..279)   |
   +-+-+-+-+-+-+-+-+
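
   A payload like the one in this example can be assembled as
   sketched below.  This is non-normative Python; the function name
   build_basic_payload is an assumption of this sketch, and the Q bit
   is simply set to 1 for every frame.  It covers only the basic mode
   (no CRC, interleaving, or robust sorting).

```python
def build_basic_payload(frames):
    """frames: list of (ft, frame_octets) tuples in transmission order.

    Returns payload header + ToC + audio data in normal order.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    payload = bytearray([0xF0])  # CMR=15, four reserved bits set to 0
    last = len(frames) - 1
    for index, (ft, _) in enumerate(frames):
        f_bit = 1 if index < last else 0
        # F (1 bit) | FT (4 bits) | Q=1 | two padding bits
        payload.append((f_bit << 7) | ((ft & 0x0F) << 3) | (1 << 2))
    for _, octets in frames:
        payload += octets
    return bytes(payload)
```

   With two FT=10 frames of 280 bits (35 octets) each, the payload
   starts with the octets 0xF0 0xD4 0x54 and is 1 + 2 + 2*35 = 73
   octets long.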

4.3.5.2. Example 2, Payload with CRC, Interleaving, and Robust-sorting

   This example shows a payload carrying two frames of 18 kbps stereo
   coding mode (FT=11).  In the payload, the codec mode request is set
   to the mandated value (CMR=15).

   Moreover, frame CRC and interleaving are both enabled for the
   session.  The interleaving length is 2 (ILL=1) and this payload is
   the first one in an interleave group (ILP=0).

   The first frame in the payload is frame #1, consisting of bits
   f1(0..359), and the next frame is frame#3, consisting of bits
   f3(0..359), due to interleaving.  For each of the two audio frames a
   CRC is calculated as CRC1(0..7), CRC3(0..7), respectively.  Finally,
   the payload is robust sorted.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |CMR=15 |R|R|R|R| ILL=1 | ILP=0 |1|FT#1=11|Q|P|P|0|FT#3=11|Q|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      CRC1     |      CRC3     |   f1(0..7)    |   f3(0..7)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  f1(8..15)    |  f3(8..15)    |  f1(16..23)   |  f3(16..23)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :  ...                                                          :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ...    | f1(336..343)  | f3(336..343)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | f1(344..351)  | f3(344..351)  | f1(352..359)  | f3(352..359)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


4.4. Implementation Considerations

   An application implementing this payload format MUST understand all
   the payload parameters in the out-of-band signaling used.  For
   example, if an application uses SDP, all the SDP and MIME parameters
   in this document MUST be understood.  This requirement ensures that
   an implementation can always decide whether it is capable of
   communicating.

   Only the basic operation mode of the payload format is mandatory to
   implement.  The other modes of operation, i.e. interleaving, robust
   sorting, and frame-wise CRC are OPTIONAL to implement.  The
   requirements of the application using the payload format should be
   used to determine what to implement.


5. Congestion Control

   The general congestion control considerations for transporting RTP
   data apply to AMR-WB+ audio over RTP as well.  However, the multi-
   rate capability of AMR-WB+ audio coding may provide an advantage over
   other payload formats for controlling congestion since the bandwidth
   demand can be adjusted by selecting a different coding mode.

   Another parameter that may impact the bandwidth demand for AMR-WB+ is
   the number of frames that are encapsulated in each RTP payload.
   Packing more frames in each RTP payload can reduce the number of
   packets sent and hence the overhead from IP/UDP/RTP headers, at the
   expense of increased delay and reduced error robustness against
   packet losses.
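
   The trade-off can be made concrete with a small calculation.  The
   following non-normative Python sketch assumes IPv4 without options
   (20 octets), UDP (8 octets), and an RTP header without CSRC entries
   (12 octets), i.e. 40 header octets per packet, and uses the
   35-octet frame of the 14 kbps mode from Section 4.3.5.1 in the basic
   payload mode.  The function name is an assumption of this sketch.

```python
def ip_bitrate_bps(frames_per_packet, frame_octets=35, header_octets=40):
    """Total bit rate on the wire, in bits per second, for 20 ms frames.

    The payload is one payload header octet, one ToC octet per frame,
    and the frame data (basic mode, no CRC or interleaving).
    """
    payload = 1 + frames_per_packet + frames_per_packet * frame_octets
    packets_per_second = 50 / frames_per_packet  # 50 frames per second
    return (header_octets + payload) * 8 * packets_per_second
```

   Under these assumptions, one frame per packet gives 30800 bit/s,
   two frames per packet give 22600 bit/s, and four frames per packet
   give 18500 bit/s, for a codec rate of 14000 bit/s.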

   If forward error correction (FEC) is used to combat packet loss, the
   amount of redundancy added by FEC will need to be regulated so that
   the use of FEC itself does not cause a congestion problem.

   It is RECOMMENDED that AMR-WB+ applications using this payload format
   employ congestion control.  The actual mechanism for congestion
   control is not specified but should be suitable for real-time flows,
   e.g., TCP Friendly Rate Control [11].  In the future, the usage of
   congestion controlled transport protocols like Datagram Congestion
   Control Protocol (DCCP) [16] may simplify the usage of congestion
   control for application developers.


6. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the general security considerations discussed in
   RFC3550 [4]. As this format transports encoded audio, the main
   security issues include confidentiality, integrity protection, and
   authentication of the audio itself.  The payload format itself does
   not have any built-in security mechanisms. External mechanisms, such
   as SRTP [13], MAY be used.

   Neither this payload format nor the AMR-WB+ decoder exhibits any
   significant non-uniformity in the receiver side computational
   complexity for packet processing, and thus neither is likely to pose
   a denial-of-service threat due to the receipt of pathological data.

6.1. Confidentiality

   To achieve confidentiality of the encoded AMR-WB+ audio, all audio
   data bits will need to be encrypted.  There is less need to encrypt
   the payload header or the table of contents because 1) they only
   carry information about the requested audio mode, frame type, and
   frame quality, and 2) this information could be useful to some third
   party, e.g., for quality monitoring.

   As long as the AMR-WB+ payload is only packed and unpacked at either
   end, encryption may be performed after packet encapsulation so that
   there is no conflict between the two operations.

   Interleaving may affect encryption.  Depending on the encryption
   scheme used, there may be restrictions on, for example, the time when
   keys can be changed.  Specifically, the key change may need to occur
   at the boundary between interleave groups.

   The type of encryption method used may impact the error robustness of
   the payload data.  The error robustness may be severely reduced when
   the data is encrypted unless an encryption method without error-
   propagation is used, e.g. a stream cipher.  Therefore, UED/UEP based
   on robust sorting may be difficult to apply when the payload data is
   encrypted.

6.2. Authentication

   To authenticate the sender of the audio and provide integrity
   protection, an external mechanism has to be used.  It is RECOMMENDED
   that such a mechanism protect all the audio data bits and the RTP
   header.  Note that the use of UED/UEP may be difficult to combine
   with authentication because any bit errors will cause authentication
   to fail.

   Data tampering by a man-in-the-middle attacker could result in
   erroneous depacketization/decoding that could lower the audio
   quality.

   To prevent a man-in-the-middle attacker from tampering with the
   payload packets, some additional information besides the audio bits
   SHOULD be protected.  This may include the payload header, ToC, frame
   CRCs, RTP timestamp, RTP sequence number, and the RTP marker bit.


6.3. Decoding Validation

   When processing a received payload packet, if the receiver finds that
   the calculated payload length, based on the information of the
   session and the values found in the payload header fields, does not
   match the size of the received packet, the receiver SHOULD discard
   the packet.  This is because decoding a packet that has errors in its
   length field could severely degrade the audio quality.
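
   The length check can be sketched as follows.  This is
   non-normative Python; the function name and the frame size table
   are assumptions of this sketch.  The table holds only the two sizes
   that appear in the examples of Section 4.3.5 (280 bits for FT=10 and
   360 bits for FT=11); a real implementation would take all sizes
   from Table 1 in Section 3.1.

```python
# Frame sizes in octets, taken from the examples in Section 4.3.5 only.
FRAME_OCTETS_FROM_EXAMPLES = {10: 35, 11: 45}


def expected_payload_length(fts, frame_octets, interleaving=False,
                            crc=False):
    """Payload length in octets implied by the session and the ToC.

    fts: the FT values of the ToC entries, in order.
    """
    length = 2 if interleaving else 1  # payload header
    length += len(fts)                 # one ToC octet per entry
    for ft in fts:
        if ft in (14, 15):
            continue  # AUDIO_LOST / NO_DATA: no frame data and no CRC
        if ft not in frame_octets:
            raise ValueError("undefined FT value %d" % ft)
        length += frame_octets[ft] + (1 if crc else 0)
    return length
```

   A receiver would compare this value with the size of the received
   payload and discard the packet on a mismatch.  For Example 1 the
   result is 73 octets; for Example 2 (interleaving and CRC enabled)
   it is 96 octets.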

7. Payload Format Parameters

   This section defines the parameters that may be used to select
   optional features of the AMR-WB+ payload format.  The parameters are
   defined here as part of the MIME subtype registrations for the AMR-
   WB+ audio codec.  A mapping of the parameters into the Session
   Description Protocol (SDP) [7] is also provided for those
   applications that use SDP.  Equivalent parameters could be defined
   elsewhere for use with control protocols that do not use MIME or SDP.

   The data format and parameters are only specified for real-time
   transport in RTP.

7.1. MIME Registration

   The MIME subtype for the Adaptive Multi-Rate Wideband plus (AMR-WB+)
   codec is allocated from the IETF tree since AMR-WB+ is expected to be
   a widely used audio codec in general streaming applications.

   Note, any unspecified parameter MUST be ignored by the receiver.

   Media Type name:     audio

   Media subtype name:  AMR-WB+

   Required parameters: none

   Optional parameters:

   These parameters apply to RTP transfer only.

   channels:       The number of audio channels present in the audio
                   frames. Permissible values are 1 (mono) or 2
                   (stereo). An RTP payload type SHALL only contain mono
                   or stereo modes, not both. If switching between mono
                   and stereo is desired, two payload types will need to
                   be declared. If no parameter is present, the number
                   of channels is 1 (mono).

   maxptime:       see Section 8 in RFC 3267 [9].


   crc:            Permissible values are 0 and 1.  If 1, frame CRCs
                   SHALL be included in the payload.  If 0 or if not
                   present, CRCs SHALL NOT be included.

   robust-sorting: Permissible values are 0 and 1.  If 1, the payload
                   SHALL employ robust payload sorting.  If 0 or if not
                   present, simple payload sorting SHALL be used.

   interleaving:   Indicates that frame level interleaving SHALL be
                   used for the session and its value defines the
                   maximum number of frames allowed in an interleaving
                   group (see Section 4.3.1).  If this parameter is not
                   present, interleaving SHALL NOT be used.

   ptime:          see RFC2327 [7].


   Encoding considerations:
                This type is only defined for transfer via RTP (RFC
                3550) and as described in Section 4 of RFC XXXX.

   Security considerations:
                See Section 6 of RFC XXXX.

   Public specification:
                Please refer to Section 10 of RFC XXXX.

   Additional information:
                File storage of the AMR-WB+ format is recommended to be
                done in the 3GPP defined ISO based multimedia file
                format defined in 3GPP TS 26.244, see reference [18] of
                RFC XXXX. The file format has the MIME types
                "audio/3GPP" or "video/3GPP".

                To maintain interoperability with AMR-WB capable end-
                points, in cases where negotiation is possible, an AMR-
                WB+ end-point SHOULD declare itself also as AMR-WB
                capable.

                As the AMR-WB+ decoder is capable of performing stereo
                to mono conversions, all receivers of AMR-WB+ should be
                able to receive both stereo and mono, even if the
                receiver is only capable of playout of mono signals.

   Person & email address to contact for further information:
                johan.sjoberg@ericsson.com
                ari.lakaniemi@nokia.com

   Intended usage: COMMON.
                It is expected that many IP based streaming
                applications will use this type.


   Author/Change controller:
                johan.sjoberg@ericsson.com
                ari.lakaniemi@nokia.com
                IETF Audio/Video transport working group


7.2. Mapping MIME Parameters into SDP

   The information carried in the MIME media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [7], which is commonly used to describe RTP sessions.  When SDP is
   used to specify sessions employing the AMR-WB+ codec, the mapping is
   as follows:

   -  The MIME type ("audio") goes in SDP "m=" as the media name.

   -  The MIME subtype (payload format name) goes in SDP "a=rtpmap" as
      the encoding name.  The RTP clock rate in "a=rtpmap" SHALL be
      96000 for AMR-WB+, and the encoding parameter number of channels
      MUST either be explicitly set to 1 or 2, or be omitted, implying
      the default value of 1. Only codec modes agreeing with the
      signalled number of channels may be used.


   -  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.

   -  Any remaining parameters go in the SDP "a=fmtp" attribute by
      copying them directly from the MIME media type string as a
      semicolon separated list of parameter=value pairs.
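
   The mapping rules above can be sketched as follows.  This is
   non-normative Python; the function name sdp_attribute_lines and its
   argument names are assumptions of this sketch.  It produces only
   the "a=" lines for one payload type, not a complete session
   description.

```python
def sdp_attribute_lines(payload_type, channels=1, fmtp=None,
                        ptime=None, maxptime=None):
    """Return the SDP attribute lines for one AMR-WB+ payload type.

    fmtp: dict of the remaining parameters, e.g. {"interleaving": 30}.
    """
    if channels not in (1, 2):
        raise ValueError("channels must be 1 (mono) or 2 (stereo)")
    lines = ["a=rtpmap:%d AMR-WB+/96000/%d" % (payload_type, channels)]
    if fmtp:
        pairs = "; ".join("%s=%s" % (name, value)
                          for name, value in fmtp.items())
        lines.append("a=fmtp:%d %s" % (payload_type, pairs))
    # ptime and maxptime map to their own attributes, not to a=fmtp.
    if ptime is not None:
        lines.append("a=ptime:%d" % ptime)
    if maxptime is not None:
        lines.append("a=maxptime:%d" % maxptime)
    return lines
```

   For payload type 99 with two channels, interleaving=30 and a
   maxptime of 100, the sketch yields the three lines
   "a=rtpmap:99 AMR-WB+/96000/2", "a=fmtp:99 interleaving=30" and
   "a=maxptime:100".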


7.2.1. Offer-Answer Model Considerations

   To achieve good interoperability for the AMR-WB+ RTP payload when
   negotiating with the Offer-Answer model in SDP, the following
   considerations should be made:

   -  Each combination of the RTP payload configuration parameters (crc,
      robust-sorting, and interleaving) is unique in its bit-pattern and
      not compatible with any other combination.  Because the choice of
      configuration is application dependent and these features are
      OPTIONAL to implement, care must be taken.  When creating an offer
      in an application desiring to use the more advanced features (crc,
      robust-sorting, or interleaving), the offerer is RECOMMENDED to
      also offer a payload type containing only the basic octet-aligned
      configuration.  If multiple configurations are of interest to the
      application they may all be offered; however, care should be taken
      not to offer too many payload types.

   -  As one can use both mono and stereo modes, and these require
      different payload types to be declared/negotiated, both stereo and
      mono payload types SHOULD be offered.

   -  The parameters "maxptime" and "ptime" should in most cases not
      affect interoperability; however, the setting of these parameters
      can affect the performance of the application.

   -  To maintain interoperability with AMR-WB in cases where
      negotiation is possible, an AMR-WB+ capable end-point SHOULD also
      declare itself capable of AMR-WB as it is a subset of AMR-WB+.
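
   The considerations above can be illustrated with a short,
   non-normative Python sketch that assembles the media lines of an
   offer.  The helper name build_offer and its arguments are
   hypothetical.  It is assumed here that the basic configuration is
   expressed by simply leaving out the advanced parameters, and that
   AMR-WB is declared with the 16000 Hz clock rate used in RFC 3267.

```python
AMRWBPLUS_CLOCK_RATE = 96000  # clock rate required by Section 7.2
AMRWB_CLOCK_RATE = 16000      # AMR-WB clock rate as used in RFC 3267


def build_offer(port, advanced=None, first_pt=96):
    """Return SDP media lines for an offer (illustrative sketch only).

    "advanced" is an optional dict of configuration parameters, for
    example {"interleaving": 30}.
    """
    entries = []  # (encoding name, clock rate, channels, fmtp parameters)
    for channels in (1, 2):  # offer both mono and stereo payload types
        if advanced:
            entries.append(("AMR-WB+", AMRWBPLUS_CLOCK_RATE, channels, advanced))
        # Always include a payload type without the advanced features.
        entries.append(("AMR-WB+", AMRWBPLUS_CLOCK_RATE, channels, {}))
    # AMR-WB is a subset of AMR-WB+, so it is declared as well.
    entries.append(("AMR-WB", AMRWB_CLOCK_RATE, 1, {}))

    pts = range(first_pt, first_pt + len(entries))
    if first_pt < 96 or pts[-1] > 127:
        raise ValueError("dynamic payload types must be in the range 96-127")

    lines = ["m=audio %d RTP/AVP %s" % (port, " ".join(map(str, pts)))]
    for pt, (name, rate, channels, fmtp) in zip(pts, entries):
        lines.append("a=rtpmap:%d %s/%d/%d" % (pt, name, rate, channels))
        if fmtp:
            params = "; ".join("%s=%s" % item for item in fmtp.items())
            lines.append("a=fmtp:%d %s" % (pt, params))
    return lines
```

   With one advanced configuration this produces five payload types:
   mono and stereo with the advanced features, mono and stereo without
   them, and AMR-WB.  An application interested in several advanced
   configurations would need to weigh this growth against the advice
   not to offer too many payload types.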


7.2.2. Examples

   An example SDP session description utilizing AMR-WB+ mono and stereo
   encoding follows.

    m=audio 49120 RTP/AVP 98 99
    a=rtpmap:98 AMR-WB+/96000/1
    a=rtpmap:99 AMR-WB+/96000/2
    a=fmtp:98 interleaving=30
    a=fmtp:99 interleaving=30
    a=maxptime:100

   Note that the payload format (encoding) names are commonly shown in
   upper case.  MIME subtypes are commonly shown in lower case.  These
   names are case-insensitive in both places.  Similarly, parameter
   names are case-insensitive both in MIME types and in the default
   mapping to the SDP a=fmtp attribute.
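
   A receiver can honour the case-insensitivity noted above by
   normalizing names while parsing.  The following Python sketch is
   non-normative; the function names parse_rtpmap and parse_fmtp are
   hypothetical, and the input is assumed to be a single well-formed
   SDP attribute line.

```python
def parse_rtpmap(line):
    """Parse "a=rtpmap:<pt> <name>/<rate>[/<channels>]" (illustrative only)."""
    value = line.split(":", 1)[1]
    pt, encoding = value.split(None, 1)
    parts = encoding.strip().split("/")
    name = parts[0].upper()  # encoding names are case-insensitive
    rate = int(parts[1])
    # An omitted channel count implies the default value of 1.
    channels = int(parts[2]) if len(parts) > 2 else 1
    return int(pt), name, rate, channels


def parse_fmtp(line):
    """Parse "a=fmtp:<pt> name=value; name=value" (illustrative only)."""
    value = line.split(":", 1)[1]
    pt, _, rest = value.partition(" ")
    params = {}
    for item in rest.split(";"):
        if not item.strip():
            continue  # tolerate a trailing semicolon
        key, _, val = item.partition("=")
        params[key.strip().lower()] = val.strip()  # names are case-insensitive
    return int(pt), params
```

   Normalizing encoding names to upper case and parameter names to
   lower case is only one possible choice; any consistent
   case-insensitive comparison serves the same purpose.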

8. IANA Considerations

   It is requested that one new MIME subtype be registered by IANA; see
   Section 7.

9. Acknowledgements

   The authors would like to thank Redwan Salami and Stefan Bruhn for
   their significant contributions made throughout the writing and
   reviewing of this document. We would also like to acknowledge
   Qiaobing Xie, coauthor of RFC 3267, on which this document is based.


10. References

10.1. Normative references

   [1]  3GPP TS 26.xxx "AMR Wideband plus audio codec; Transcoding
        functions", version 6.0.0 (2004-xx), 3rd Generation Partnership
        Project (3GPP).
   [2]  3GPP TS 26.xxx "AMR Wideband plus audio codec; Frame Structure",
        version 6.0.0 (2004-xx), 3rd Generation Partnership Project
        (3GPP).
   [3]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.
   [4]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", RFC
        3550, July 2003.
   [5]  3GPP TS 26.192 "AMR Wideband speech codec; Comfort Noise
        aspects", version 5.0.0 (2001-03), 3rd Generation Partnership
        Project (3GPP).
   [6]  3GPP TS 26.193 "AMR Wideband speech codec; Source Controlled
        Rate operation", version 5.0.0 (2001-03), 3rd Generation
        Partnership Project (3GPP).
   [7]  Handley, M. and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327, April 1998.
   [8]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
        Conferences with Minimal Control", RFC 3551, July 2003.
   [9]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-
        Time Transport Protocol (RTP) Payload Format and File Storage
        Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate
        Wideband (AMR-WB) Audio Codecs", RFC 3267, June 2002.

10.2. Informative References

   [10] Larzon, L., Degermark, M. and S. Pink, "The UDP Lite Protocol",
        Work in Progress.
   [11] Floyd, S., Handley, M., Padhye, J. and J. Widmer, "TCP Friendly
        Rate Control (TFRC): Protocol Specification", RFC 3448, January
        2003.
   [12] Li, A., et al., "An RTP Payload Format for Generic FEC with
        Uneven Level Protection", Work in Progress.
   [13] Baugher, et al., "The Secure Real Time Transport Protocol",
        Work in Progress.
   [14] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
        Generic Forward Error Correction", RFC 2733, December 1999.
   [15] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
        Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP Payload
        for Redundant Audio Data", RFC 2198, September 1997.
   [16] Kohler, E., et al., "Datagram Congestion Control Protocol
        (DCCP)", Work in Progress.
   [17] 3GPP TS 26.233 "Packet Switched Streaming service", version
        5.0.0 (2001-03), 3rd Generation Partnership Project (3GPP).
   [18] 3GPP TS 26.244 "Transparent end-to-end packet switched
        streaming service (PSS); 3GPP file format (3GP)", version 1.0.0
        (2003-11-28), 3rd Generation Partnership Project (3GPP).


   ETSI documents can be downloaded from the ETSI web server,
   "http://www.etsi.org/".  Any 3GPP document can be downloaded from the
   3GPP web server, "http://www.3gpp.org/"; see specifications.  TIA
   documents can be obtained from "www.tiaonline.org".


11. Authors' Addresses

   Johan Sjoberg
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN

   Phone:   +46 8 50878230
   EMail: Johan.Sjoberg@ericsson.com


   Magnus Westerlund
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN

   Phone:   +46 8 4048287
   EMail: Magnus.Westerlund@ericsson.com


   Ari Lakaniemi
   Nokia Research Center
   P.O.Box 407
   FIN-00045 Nokia Group, FINLAND

   Phone:   +358-71-8008000
   EMail: ari.lakaniemi@nokia.com


12. IPR Notice

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.  Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11.  Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.


13. Copyright Notice

   Copyright (C) The Internet Society (2004). All Rights Reserved.

   This document and translations of it may be copied and
   furnished to others, and derivative works that comment on or
   otherwise explain it or assist in its implementation may be
   prepared, copied, published and distributed, in whole or in
   part, without restriction of any kind, provided that the above
   copyright notice and this paragraph are included on all such
   copies and derivative works.  However, this document itself may
   not be modified in any way, such as by removing the copyright
   notice or references to the Internet Society or other Internet
   organizations, except as needed for the  purpose of developing
   Internet standards in which case the procedures for copyrights
   defined in the Internet Standards process must be followed, or
   as required to translate it into languages other than English.

   The limited permissions granted above are perpetual and will
   not be revoked by the Internet Society or its successors or
   assigns.

   This document and the information contained herein is provided
   on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE
   OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
   IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
   PARTICULAR PURPOSE.


   This Internet-Draft expires in August 2004.

