idnits 2.17.1 

draft-ietf-avt-evrc-smv-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  ** Obsolete normative reference: RFC 1889 (ref. '4') (Obsoleted by RFC 3550)

  ** Obsolete normative reference: RFC 1890 (ref. '5') (Obsoleted by RFC 3551)

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)


     Summary: 6 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Draft                                               Adam H. Li
3	draft-ietf-avt-evrc-smv-00.txt                                     UCLA
4	February 4, 2002                                                 Editor
5	Expires: August 4, 2002

7	            An RTP Payload Format for EVRC and SMV Vocoders

9	STATUS OF THIS MEMO

11	   This document is an Internet-Draft and is in full conformance with
12	   all provisions of Section 10 of RFC 2026.

14	   Internet-Drafts are working documents of the Internet Engineering
15	   Task Force (IETF), its areas, and its working groups. Note that other
16	   groups may also distribute working documents as Internet-Drafts.

18	   Internet-Drafts are draft documents valid for a maximum of six months
19	   and may be updated, replaced, or obsoleted by other documents at any
20	   time. It is inappropriate to use Internet-Drafts as reference
21	   material or to cite them other than as work in progress.

23	   The list of current Internet-Drafts can be accessed at
24	   http://www.ietf.org/ietf/1id-abstracts.txt

26	   The list of Internet-Draft Shadow Directories can be accessed at
27	   http://www.ietf.org/shadow.html.

29	ABSTRACT

31	   This document describes the RTP payload format for Enhanced Variable
32	   Rate Codec (EVRC) Speech and Selectable Mode Vocoder (SMV) Speech.
33	   Two sub-formats are specified for different application scenarios. A
34	   bundled/interleaved format is included to reduce the effect of packet
35	   loss on speech quality and amortize the overhead of the RTP header
36	   over more than one speech frame. A non-bundled format is also
37	   supported for conversational applications.

39	Table of Contents

41	   1. Introduction ................................................... 2
42	   2. Background ..................................................... 2
43	   3. The Codecs Supported ........................................... 3
44	   3.1. EVRC ......................................................... 3
45	   3.2. SMV .......................................................... 3
46	   3.3. Other Frame-Based Vocoders ................................... 4
47	   4. RTP/Vocoder Packet Format ...................................... 4
48	   4.1. Type 1 Interleaved/Bundled Packet Format ..................... 4
49	   4.2. Type 2 Header-Free Packet Format ............................. 6
50	   4.3. Detecting the Format of Packets .............................. 6
51	   5. Packet Table of Contents Entries and Codec Data Frame Format ... 7
52	   5.1. Packet Table of Contents entries ............................. 7
53	   5.2. Codec Data Frames ............................................ 8
54	   6. Interleaving Codec Data Frames in Type 1 Packets ............... 9
55	   6.1. Finding Interleave Group Boundaries ......................... 10
56	   6.2. Reconstructing Interleaved Speech ........................... 11
57	   6.3. Receiving Invalid Interleaving Values ....................... 12
58	   6.4. Additional Receiver Responsibilities ........................ 12
59	   7. Bundling Codec Data Frames in Type 1 Packets .................. 12
60	   8. Handling Missing Codec Data Frames ............................ 12
61	   9. Implementation Issues ......................................... 13
62	   9.1. Interleaving Length ......................................... 13
63	   9.2. Mode Request ................................................ 13
64	   10. IANA Considerations .......................................... 14
65	   10.1 Storage Mode ................................................ 14
66	   10.2 EVRC MIME Registration ...................................... 15
67	   10.3 SMV MIME Registration ....................................... 16
68	   11. Mapping to SDP Parameters .................................... 17
69	   12. Security Considerations ...................................... 17
70	   13. Adding Support of Other Frame-Based Vocoders ................. 18
71	   14. Acknowledgements ............................................. 18
72	   15. References ................................................... 18
73	   16. Authors' Address ............................................. 19

75	1. Introduction

77	   This document describes how speech compressed with EVRC [1] or SMV
78	   [2] may be formatted for use as an RTP payload type.  The format is
79	   also extensible to other codecs that generate a similar set of frame
80	   types. Two methods are provided to packetize the codec data frames
81	   into RTP packets: an interleaved/bundled format and a zero-header
82	   format. The sender may choose the best format for each application
83	   scenario, based on network conditions, bandwidth availability, delay
84	   requirements, and packet-loss tolerance.

86	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
87	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
88	   document are to be interpreted as described in RFC 2119 [3].

90	2. Background

92	   The 3rd Generation Partnership Project 2 (3GPP2) has published two
93	   standards which define speech compression algorithms for CDMA
94	   applications: EVRC [1] and SMV [2]. EVRC is currently deployed in
95	   millions of first and second generation CDMA handsets. SMV is the
96	   preferred speech codec standard for CDMA2000, and will be deployed in
97	   third generation handsets in addition to EVRC. Improvements and new
98	   codecs will keep emerging as technology improves, and future handsets
99	   will likely support multiple codecs.

101	   The formats of the EVRC and SMV codec frames are very similar. Many
102	   other vocoders also share common characteristics, and have many
103	   similar application scenarios. This parallelism enables an RTP
104	   payload format to be designed for EVRC and SMV that may also support
105	   other, similar vocoders with minimal additional specification work.
106	   This can simplify the protocol for transporting vocoder data frames
107	   through RTP and reduce the complexity of implementations.

109	3. The Codecs Supported

111	3.1. EVRC

113	   The Enhanced Variable Rate Codec (EVRC) [1] compresses each 20
114	   milliseconds of 8000 Hz, 16-bit sampled speech input into output
115	   frames of one of three different sizes: Rate 1 (171 bits), Rate
116	   1/2 (80 bits), or Rate 1/8 (16 bits). In addition, there are two zero
117	   bit codec frame types: null frames and erasure frames. Null frames
118	   are produced as a result of the vocoder running at rate 0. Null
119	   frames are zero bits long and are normally not transmitted. Erasure
120	   frames are the frames that the receiver passes to the codec in place
121	   of lost or damaged frames. Erasure frames are also zero bits long
122	   and are normally not transmitted.

124	   The codec chooses the output frame rate based on analysis of the
125	   input speech and the current operating mode (either normal or one of
126	   several reduced rate modes). For typical speech patterns, this
127	   results in an average output of 4.2 kilobits/second for normal mode
128	   and a lower average output for reduced rate modes.

130	3.2. SMV

132	   The Selectable Mode Vocoder (SMV) [2] compresses each 20 milliseconds
133	   of 8000 Hz, 16-bit sampled speech input into output frames of one of
134	   the four different sizes: Rate 1 (171 bits), Rate 1/2 (80 bits), Rate
135	   1/4 (40 bits), or Rate 1/8 (16 bits). In addition, there are two zero
136	   bit codec frame types: null frames and erasure frames. Null frames
137	   are produced as a result of the vocoder running at rate 0. Null
138	   frames are zero bits long and are normally not transmitted. Erasure
139	   frames are the frames that the receiver passes to the codec in place
140	   of lost or damaged frames. Erasure frames are also zero bits long
141	   and are normally not transmitted.

143	   The SMV codec can operate in four modes. Each mode may produce frames
144	   of any of the rates (full rate to 1/8 rate) for varying percentages
145	   of time, based on the characteristics of the speech samples and the
146	   selected mode. The SMV mode can change on a frame-by-frame basis. The
147	   SMV codec does not need additional information other than the codec
148	   data frames to correctly decode the data of various modes; therefore,
149	   the mode of the encoder does not need to be transmitted with the
150	   encoded frames.

152	   The percentage of different frame rates and the average data rate
153	   (ADR) for the four SMV modes are shown in the table below.

155	                     Mode 0       Mode 1       Mode 2        Mode 3
156	       -------------------------------------------------------------
157	       Rate 1        68.90%       38.14%       15.43%        07.49%
158	       Rate 1/2      06.03%       15.82%       38.34%        46.28%
159	       Rate 1/4      00.00%       17.37%       16.38%        16.38%
160	       Rate 1/8      25.07%       28.67%       29.85%        29.85%
161	       -------------------------------------------------------------
162	       ADR          7205 bps     5182 bps     4073 bps      3692 bps
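
The ADR row can be reproduced from the rate percentages. The sketch
below is illustrative only. It assumes the ADR weights each frame rate
by channel rates of 9600, 4800, 2400, and 1200 bits/second; these
channel rates are an assumption and are not stated in this document,
but they reproduce the ADR values in the table when rounded to the
nearest bit/second. The names CHANNEL_BPS, SMV_MODE_PERCENT, and
average_data_rate are hypothetical.

```python
# ASSUMPTION: channel rates in bits/second for each frame rate. These are
# not given in this document; they are chosen because they reproduce the
# ADR row of the table.
CHANNEL_BPS = {"1": 9600, "1/2": 4800, "1/4": 2400, "1/8": 1200}

# Percentage of frames at each rate, copied from the table above.
SMV_MODE_PERCENT = {
    0: {"1": 68.90, "1/2": 6.03, "1/4": 0.00, "1/8": 25.07},
    1: {"1": 38.14, "1/2": 15.82, "1/4": 17.37, "1/8": 28.67},
    2: {"1": 15.43, "1/2": 38.34, "1/4": 16.38, "1/8": 29.85},
    3: {"1": 7.49, "1/2": 46.28, "1/4": 16.38, "1/8": 29.85},
}


def average_data_rate(percent_by_rate):
    """Return the percentage-weighted mean of the channel rates in bps."""
    return sum(CHANNEL_BPS[rate] * pct / 100.0
               for rate, pct in percent_by_rate.items())


if __name__ == "__main__":
    for mode, mix in SMV_MODE_PERCENT.items():
        print("Mode", mode, round(average_data_rate(mix)), "bps")
```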

164	   The SMV codec chooses the output frame rate based on an analysis of
165	   the input speech and the current operating mode. For typical speech
166	   patterns, this results in an average output of 4.2 kilobits/second for
167	   Mode 0 and lower for other reduced rate modes.

169	   SMV is more bandwidth efficient than EVRC. EVRC is equivalent in
170	   performance to SMV mode 1.

172	3.3. Other Frame-Based Vocoders

174	   Other frame-based vocoders can be carried in the packet format
175	   defined in this document, as long as they possess the following
176	   properties:

178	    o The codec is frame-based;
179	    o blank and erasure frames are supported;
180	    o the total number of rates is less than 17;
181	    o the maximum full rate frame can be transported in a single RTP
182	      packet using this specific format.

184	   Vocoders with the characteristics listed above can be transported
185	   using the packet format specified in this document with some
186	   additional specification work; the pieces that must be defined are
187	   listed in Section 13.

189	4. RTP/Vocoder Packet Format

191	   The RTP payload data MUST be transmitted in packets of one of the
192	   following two types.

194	4.1. Type 1 Interleaved/Bundled Packet Format

196	   This format is used to send one or more vocoder frames per packet.
197	   Interleaving or bundling MAY be used. The RTP packet for this format
198	   is as follows:

200	    0                   1                   2                   3
201	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
202	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
203	   |                      RTP Header [4]                           |
204	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
205	   |R|R| LLL | NNN | FFF |  Count  |  TOC  |  ...  |  TOC  |padding|
206	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
207	   |        one or more codec data frames, one per TOC entry       |
208	   |                             ....                              |
209	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

211	   The RTP header has the expected values as described in the RTP
212	   specification [4]. The RTP timestamp is in 1/8000 of a second units
213	   for EVRC and SMV. For any other vocoders that use this packet format,
214	   the timestamp unit needs to be defined explicitly. The M bit should
215	   be set as specified in the applicable RTP profile, for example, RFC
216	   1890 [5]. Note that RFC 1890 [5] specifies that if the sender does
217	   not suppress silence, the M bit will always be zero. When multiple
218	   codec data frames are present in a single RTP packet, the timestamp
219	   is, as always, that of the oldest data represented in the RTP packet.
220	   The assignment of an RTP payload type for this new packet format is
221	   outside the scope of this document, and will not be specified here.
222	   It is expected that the RTP profile for a particular class of
223	   applications will assign a payload type for this encoding, or if that
224	   is not done, then a payload type in the dynamic range shall be chosen
225	   by the sender.

227	   The first octet of a Type 1 Interleaved/Bundled format packet is the
228	   Interleave Octet. The second octet contains the Mode Request and
229	   Frame Count fields. The Table of Contents (ToC) field then follows.
230	   The fields are specified as follows:

232	   Reserved (RR): 2 bits
233	      Reserved bits. MUST be set to zero by sender, SHOULD be ignored
234	      by receiver.

236	   Interleave Length (LLL): 3 bits
237	      Indicates the length of interleave; a value of 0 indicates
238	      bundling, a special case of interleaving. See Section 6 and
239	      Section 7 for more detailed discussion.

241	   Interleave Index (NNN): 3 bits
242	      Indicates the index within an interleave group. MUST have a value
243	      less than or equal to the value of LLL. Values of NNN greater
244	      than the value of LLL are invalid. Packets with invalid NNN values
245	      SHOULD be ignored by the receiver.

247	   Mode Request (FFF): 3 bits
248	      The Mode Request field is used to signal Mode Request
249	      information. See Section 9.2 for details.

251	   Frame Count (Count): 5 bits
252	      Indicates the number of ToC fields (and therefore vocoder frames)
253	      present. A value of zero indicates that the packet contains one
254	      ToC field (and vocoder frame). A value of 31 indicates 32 ToC
255	      fields (and vocoder frames) are in the packet. The number of ToC
256	      fields (and vocoder frames) present is the value of the frame
257	      count field plus one.

259	   Padding (padding): 0 or 4 bits
260	      This padding ensures that codec data frames start on an octet
261	      boundary. When the number of ToC entries (Count plus one) is odd,
262	      the sender MUST add 4 bits of padding following the last ToC
263	      entry. When that number is even, the sender MUST NOT add padding
264	      bits. If padding is present, the padding bits MUST be set to zero
265	      by the sender, and SHOULD be ignored by the receiver.

267	   The Table of Contents field (ToC) provides information on the codec
268	   data frame(s) in the packet. There is one ToC entry for each codec
269	   data frame. The detailed formats of the ToC field and codec data
270	   frames are specified in Section 5.

272	   Multiple data frames may be included within a Type 1
273	   Interleaved/Bundled packet using interleaving or bundling as
274	   described in Section 6 and Section 7.
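
As an illustration of the field layout in this section, the following
sketch unpacks the two header octets and the ToC entries of a Type 1
payload. It is a minimal sketch and not a normative parser. The
function name parse_type1_header and the returned dictionary keys are
hypothetical, and raising an exception for an invalid NNN value is one
possible way for the caller to ignore such a packet.

```python
def parse_type1_header(payload: bytes):
    """Return (fields, frame_data) for a Type 1 Interleaved/Bundled payload."""
    if len(payload) < 3:
        raise ValueError("too short for two header octets and one ToC entry")
    interleave_octet, second_octet = payload[0], payload[1]
    # The two reserved bits at the top of the first octet are ignored.
    lll = (interleave_octet >> 3) & 0x07   # Interleave Length
    nnn = interleave_octet & 0x07          # Interleave Index
    fff = (second_octet >> 5) & 0x07       # Mode Request
    count = second_octet & 0x1F            # Frame Count field
    if nnn > lll:
        raise ValueError("NNN greater than LLL is invalid")
    n_frames = count + 1                   # number of ToC entries
    # Two 4-bit ToC entries fit in one octet; an odd number of entries
    # leaves 4 bits of padding in the last ToC octet.
    toc_octets = (n_frames + 1) // 2
    if len(payload) < 2 + toc_octets:
        raise ValueError("payload truncated inside the ToC")
    toc = []
    for i in range(n_frames):
        octet = payload[2 + i // 2]
        toc.append(octet >> 4 if i % 2 == 0 else octet & 0x0F)
    fields = {"LLL": lll, "NNN": nnn, "FFF": fff, "toc": toc}
    return fields, payload[2 + toc_octets:]
```

The second return value holds the octets after the ToC, that is, the
codec data frames, which start on an octet boundary because of the
padding rule.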

276	4.2. Type 2 Header-Free Packet Format

278	   The Type 2 Header-Free Packet Format is designed for maximum
279	   bandwidth efficiency and low latency. Only one codec data frame can
280	   be sent in each Type 2 Header-Free format packet. None of the payload
281	   header fields (LLL, NNN, FFF, Count) nor ToC entries are present. The
282	   codec rate for the data frame can be determined from the length of
283	   the codec data frame, since there is only one codec data frame in
284	   each Type 2 Header-Free packet.

286	   Use of the RTP header fields for Type 2 Header-Free RTP/Vocoder
287	   Packet Format is the same as described in Section 4.1 for Type 1
288	   Interleaved/Bundled RTP/Vocoder Packet Format. The detailed format of
289	   the codec data frame is specified in Section 5.

291	    0                   1                   2                   3
292	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
293	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
294	   |                      RTP Header [4]                           |
295	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
296	   |                                                               |
297	   +          ONLY one codec data frame            +-+-+-+-+-+-+-+-+
298	   |                                               |
299	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
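
Because a Type 2 packet carries exactly one codec data frame, a
receiver can map the payload length to the codec rate. The sketch
below is illustrative; it uses the frame sizes listed in Section 5.1.
The names RATE_BY_OCTETS and type2_frame_rate are hypothetical, and
raising an exception for an unexpected length is an assumption about
error handling that this document does not specify.

```python
# Codec data frame sizes in octets, taken from the table in Section 5.1.
RATE_BY_OCTETS = {2: "1/8", 5: "1/4", 10: "1/2", 22: "1"}


def type2_frame_rate(payload: bytes, codec: str) -> str:
    """Infer the codec rate of a Type 2 payload from its length."""
    rate = RATE_BY_OCTETS.get(len(payload))
    if rate is None:
        raise ValueError("length does not match any codec data frame size")
    if codec == "EVRC" and rate == "1/4":
        # EVRC has no Rate 1/4 frames.
        raise ValueError("Rate 1/4 is not valid for EVRC")
    return rate
```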

301	4.3. Detecting the Format of Packets

303	   All receivers MUST be able to process both types of packets. The
304	   sender MAY choose to use one or both types of packets.

306	   A receiver MUST have prior knowledge of the packet type to correctly
307	   decode the RTP packets. The packet types used in an RTP session MUST
308	   be specified by the sender, and signaled through out-of-band means,
309	   for example by SDP during the setup of a session.

311	   When packets of both formats are used within the same session,
312	   different RTP payload type values MUST be used for each format to
313	   distinguish the packet formats. The association of payload type
314	   number with the packet format is done out-of-band, for example by SDP
315	   during the setup of a session.

317	5. Packet Table of Contents Entries and Codec Data Frame Format

319	5.1. Packet Table of Contents entries

321	   Each codec data frame in a Type 1 Interleaved/Bundled packet has a
322	   corresponding Table of Contents (ToC) entry. The ToC entry indicates
323	   the rate of the codec frame. (Type 2 Header-Free packets MUST NOT
324	   have a ToC field, and there is always only one codec data frame in
325	   each Type 2 Header-Free packet.)

327	   Each ToC entry occupies four bits. The format of the bits is
328	   indicated below:

330	       0 1 2 3
331	      +-+-+-+-+
332	      |fr type|
333	      +-+-+-+-+

335	   Frame Type: 4 bits
336	      The frame type indicates the type of the corresponding codec data
337	      frame in the RTP packet.

339	      For EVRC and SMV codecs, the frame type values and size of the
340	      associated codec data frame are described in the table below:

342	      Value   Rate      Total codec data frame size (in octets)
343	      ---------------------------------------------------------
344	        0     Blank      0    (0 bits)
345	        1     1/8        2    (16 bits)
346	        2     1/4        5    (40 bits; not valid for EVRC)
347	        3     1/2       10    (80 bits)
348	        4     1         22    (171 bits; 5 padded at end with zeros)
349	        5     Erasure    0    (SHOULD NOT be transmitted by sender)

351	      All values not listed in the above table MUST be considered
352	      reserved. A ToC entry with a reserved Frame Type value SHOULD be
353	      considered invalid and substituted with an erasure frame. Note
354	      that the EVRC codec does not have 1/4 rate frames, thus frame
355	      type value 2 MUST be considered a reserved value when the EVRC
356	      codec is in use.

358	      Other vocoders that use this packet format need to specify their
359	      own table of frame types and corresponding codec data frames.
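
The frame type table can be used to slice the codec data frames out
of a Type 1 payload once the ToC has been read. The following is a
minimal sketch under stated assumptions. The names FRAME_OCTETS,
ERASURE, and split_codec_frames are hypothetical. Because the size of
a frame with a reserved type is unknown, this sketch also substitutes
erasure frames for every later frame in the same packet; that choice
is an assumption and is not required by this document.

```python
# Frame type value -> codec data frame size in octets (table above).
FRAME_OCTETS = {0: 0, 1: 2, 2: 5, 3: 10, 4: 22, 5: 0}
ERASURE = 5


def split_codec_frames(toc, data, codec="SMV"):
    """Return a list of (frame_type, frame_bytes), one per ToC entry."""
    frames = []
    offset = 0
    lost_sync = False
    for frame_type in toc:
        reserved = (frame_type not in FRAME_OCTETS
                    or (codec == "EVRC" and frame_type == 2))
        if reserved or lost_sync:
            # ASSUMPTION: after a reserved type the offsets of later
            # frames are unknown, so they are replaced by erasures too.
            lost_sync = True
            frames.append((ERASURE, b""))
            continue
        size = FRAME_OCTETS[frame_type]
        if offset + size > len(data):
            raise ValueError("payload shorter than the ToC implies")
        frames.append((frame_type, data[offset:offset + size]))
        offset += size
    return frames
```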

361	5.2. Codec Data Frames

363	   The output of the vocoder MUST be converted into codec data frames
364	   for inclusion in the RTP payload. The conversions for EVRC and SMV
365	   codecs are specified below. (Note: Because the EVRC codec does not
366	   have Rate 1/4 frames, the specifications of 1/4 frames do not apply
367	   to EVRC codec data frames). Other vocoders that use this packet
368	   format need to specify how to convert vocoder output data into
369	   frames.

371	   The codec output data bits as numbered in EVRC and SMV are packed
372	   into octets. The lowest numbered bit (bit 1 for Rate 1, Rate 1/2,
373	   Rate 1/4 and Rate 1/8) is placed in the most significant bit
374	   (internet bit 0) of octet 1 of the codec data frame, the second
375	   lowest bit is placed in the second most significant bit of the first
376	   octet, the third lowest in the third most significant bit of the
377	   first octet, and so on. This continues until all of the bits have
378	   been placed in the codec data frame.

380	   The remaining unused bits of the last octet of the codec data frame
381	   MUST be set to zero. Note that in EVRC and SMV this is only
382	   applicable to Rate 1 frames (171 bits) as the Rate 1/2 (80 bits),
383	   Rate 1/4 (40 bits, SMV only) and Rate 1/8 frames (16 bits) fit
384	   exactly into a whole number of octets.

386	   Following is a detailed listing showing a Rate 1 EVRC/SMV codec
387	   output frame converted into a codec data frame:

389	   The codec data frame for an EVRC/SMV codec Rate 1 frame is 22 octets
390	   long. Bits 1 through 171 from the EVRC/SMV codec Rate 1 frame are
391	   placed as indicated, with bits marked with "Z" set to zero. EVRC/SMV
392	   codec Rate 1/8, Rate 1/4 and Rate 1/2 frames are converted similarly,
393	   but do not require zero padding because they align on octet
394	   boundaries.

396	                    Rate 1 codec data frame (octets 0 - 3)

398	    0                   1                   2                   3
399	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
400	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
401	   |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
402	   |0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3|
403	   |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|
404	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
405	                    Rate 1 codec data frame (octets 18 - 21)

407	    1           1                   1                   1
408	    4           5                   6                   7
409	    4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
410	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
411	   |1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1| | | | | |
412	   |4|4|4|4|4|5|5|5|5|5|5|5|5|5|5|6|6|6|6|6|6|6|6|6|6|7|7|Z|Z|Z|Z|Z|
413	   |5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1| | | | | |
414	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
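
The bit packing described in this section can be sketched as follows.
This is an illustrative example, not a reference implementation. The
function name pack_codec_bits is hypothetical, and the input is
assumed to be a sequence of 0/1 values ordered from the lowest
numbered codec bit to the highest.

```python
def pack_codec_bits(bits):
    """Pack codec bits MSB-first into octets, zero-filling the last octet."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            # Codec bit 1 (index 0) goes to the most significant bit of
            # the first octet, the next bit to the next position, etc.
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)
```

For a Rate 1 frame of 171 bits this yields 22 octets, with the five
least significant bits of the last octet left at zero.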

416	6. Interleaving Codec Data Frames in Type 1 Packets

418	   As indicated in Section 4.1, more than one codec data frame MAY be
419	   included in a single Type 1 Interleaved/Bundled packet by a sender.
420	   This is accomplished by interleaving or bundling.

422	   Bundling is used to spread the transmission overhead of the RTP and
423	   payload header over multiple vocoder frames. Interleaving
424	   additionally reduces the listener's perception of data loss by
425	   spreading such loss over non-consecutive vocoder frames. EVRC, SMV,
426	   and similar vocoders are able to compensate for an occasional lost
427	   frame, but speech quality degrades exponentially with consecutive
428	   frame loss.

430	   Bundling is signaled by setting the LLL field to zero and the Count
431	   field to greater than zero. Interleaving is indicated by setting the
432	   LLL field to a value greater than zero.

434	   The discussions on general interleaving apply to the bundling (which
435	   can be viewed as a reduced case of interleaving) with reduced
436	   complexity. The bundling case is discussed in detail in Section 7.

438	   Senders MAY support interleaving and/or bundling. All receivers MUST
439	   support interleaving and bundling.

441	   Given a time-ordered sequence of output frames from the EVRC codec
442	   numbered 0..n, a bundling value B (the Count field plus one), and an
443	   interleave length L where n = B * (L+1) - 1, the output frames are
444	   placed into RTP packets as follows (the values of the fields LLL and
445	   NNN are indicated for each RTP packet):

447	   First RTP Packet in Interleave group:
448	      LLL=L, NNN=0
449	      Frame 0, Frame L+1, Frame 2(L+1), Frame 3(L+1), ... for a total of
450	      B frames

452	   Second RTP Packet in Interleave group:
453	      LLL=L, NNN=1
454	      Frame 1, Frame 1+L+1, Frame 1+2(L+1), Frame 1+3(L+1), ... for a
455	      total of B frames

457	   This continues to the last RTP packet in the interleave group:

459	   (L+1)th RTP Packet in Interleave group:
460	      LLL=L, NNN=L
461	      Frame L, Frame L+L+1, Frame L+2(L+1), Frame L+3(L+1), ... for a
462	      total of B frames
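
The placement rule above can be sketched as follows. The function
name interleave and the dictionary layout are hypothetical, chosen
only for illustration. Here B is the number of frames per packet and
L is the interleave length, as in the text.

```python
def interleave(frames, L, B):
    """Distribute B*(L+1) time-ordered frames over L+1 packets."""
    if len(frames) != B * (L + 1):
        raise ValueError("expected exactly B * (L + 1) frames")
    packets = []
    for n in range(L + 1):
        # Packet n carries frames n, n+(L+1), n+2(L+1), ... (B in total).
        picked = [frames[n + k * (L + 1)] for k in range(B)]
        packets.append({"LLL": L, "NNN": n, "frames": picked})
    return packets
```

With L = 2 and B = 2, frames 0..5 are placed as (0, 3), (1, 4), and
(2, 5), so the loss of any one packet removes no two consecutive
frames.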

464	   Within each interleave group, the RTP packets making up the
465	   interleave group MUST be transmitted in value-increasing order of the
466	   NNN field. While this does not guarantee reduced end-to-end delay on
467	   the receiving end, when packets are delivered in order by the
468	   underlying transport, delay will be reduced to the minimum possible.

470	   Receivers MAY signal the maximum number of codec data frames (i.e.,
471	   the maximum acceptable bundling value B) they can handle in a single
472	   RTP packet using the OPTIONAL maxptime RTP mode parameter identified
473	   in Section 10.

475	   Receivers MAY signal the maximum interleave length (i.e., the maximum
476	   acceptable LLL value in the Interleaving Octet) they will accept
477	   using the OPTIONAL maxinterleave RTP mode parameter identified in
478	   Section 10.

480	   Additionally, senders have the following restrictions:

482	   o  MUST NOT bundle more codec data frames in a single RTP packet than
483	      indicated by maxptime (see Section 10) if it is signaled.

485	   o  SHOULD NOT bundle more codec data frames in a single RTP packet
486	      than will fit in the MTU of the underlying network.

488	   o  Once beginning a session with a given maximum interleaving value
489	      set by maxinterleave in Section 10, MUST NOT increase the
490	      interleaving value (LLL) to exceed the maximum interleaving value
491	      that is signaled.

493	   o  MAY change the interleaving value only between interleave groups.

495	   o  Silence suppression MAY only be used between interleave groups. A
496	      ToC with Frame Type 0 (Blank Frame, Section 5.1) MUST be used
497	      within interleaving groups if the codec outputs a blank frame.
498	      The M bits in the RTP header MUST NOT be set, as the stream is
499	      continuous in time. Because there is only one time stamp for each
500	      RTP packet, silence suppression used within an interleave group
501	      will cause ambiguities when reconstructing the speech at the
502	      receiver side, and thus is prohibited.

504	6.1. Finding Interleave Group Boundaries

506	   Given an RTP packet with sequence number S, interleave length (field
507	   LLL) L, interleave index value (field NNN) N, and bundling value B,
508	   the interleave group consists of this RTP packet and other RTP
509	   packets with sequence numbers from S-N to S-N+L inclusive. (The
510	   sequence numbers used here are for illustrative purposes. When
511	   wrapping around happens, the sequence numbers need to be adjusted
512	   accordingly). In other words, the interleave group always consists of
513	   L+1 RTP packets with sequential sequence numbers. The bundling value
514	   for all RTP packets in an interleave group MUST be the same.
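
The sequence number range of an interleave group, including
wrap-around, can be computed as in the sketch below. The function
name is hypothetical. The modulus reflects the 16-bit RTP sequence
number field.

```python
SEQ_MODULUS = 1 << 16  # RTP sequence numbers are 16 bits wide


def interleave_group_sequence_numbers(seq, lll, nnn):
    """Return the sequence numbers S-N .. S-N+L, wrapped to 16 bits."""
    first = (seq - nnn) % SEQ_MODULUS
    return [(first + i) % SEQ_MODULUS for i in range(lll + 1)]
```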

516	   The receiver determines the expected bundling value for all RTP
517	   packets in an interleave group by the number of codec data frames
518	   bundled in the first RTP packet of the interleave group received.
519	   Note that this may not be the first RTP packet of the interleave
520	   group if packets are delivered out of order by the underlying
521	   transport.

523	   On receipt of an RTP packet in an interleave group with other than
524	   the expected bundling value, the receiver MAY discard codec data
525	   frames off the end of the RTP packet or add erasure codec data frames
526	   to the end of the packet in order to manufacture a substitute packet
527	   with the expected bundling value.  The receiver MAY instead choose to
528	   discard the whole interleave group.
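
One way to manufacture a substitute packet with the expected bundling
value is sketched below. The function name normalize_bundle and the
use of None to represent an erasure frame are assumptions made for
this illustration.

```python
def normalize_bundle(frames, expected, erasure=None):
    """Truncate or pad a packet's frame list to the expected bundling value."""
    kept = list(frames[:expected])            # discard frames off the end
    missing = expected - len(kept)            # zero when nothing is missing
    return kept + [erasure] * missing         # add erasure frames at the end
```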

530	6.2. Reconstructing Interleaved Speech

532	   Given an RTP sequence number ordered set of RTP packets in an
533	   interleave group numbered 0..L, where L is the interleave length and
534	   B is the bundling value, and codec data frames within each RTP packet
535	   that are numbered in order from first to last as 0..B-1,
536	   the original, time-ordered sequence of output frames from the EVRC
537	   codec may be reconstructed as follows:

539	   First L+1 frames:
540	      Frame 0 from packet 0 of interleave group
541	      Frame 0 from packet 1 of interleave group
542	      And so on up to...
543	      Frame 0 from packet L of interleave group

545	   Second L+1 frames:
546	      Frame 1 from packet 0 of interleave group
547	      Frame 1 from packet 1 of interleave group
548	      And so on up to...
549	      Frame 1 from packet L of interleave group

551	   And so on up to...

553	   Bth L+1 frames:
554	      Frame B-1 from packet 0 of interleave group
555	      Frame B-1 from packet 1 of interleave group
556	      And so on up to...
557	      Frame B-1 from packet L of interleave group
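
The reconstruction can be sketched as follows, using zero-based frame
positions within each packet. The function name deinterleave, the
mapping from NNN to a frame list, and the use of None for an erasure
frame are assumptions made for this illustration. Packets missing
from the group are replaced by erasure frames.

```python
def deinterleave(packets, L, B, erasure=None):
    """Rebuild the time-ordered frame sequence of one interleave group.

    packets maps the interleave index NNN (0..L) to that packet's list
    of B frames; an index that is absent means the packet was lost.
    """
    ordered = []
    for k in range(B):               # k-th frame position in each packet
        for n in range(L + 1):       # packets in increasing NNN order
            frames = packets.get(n)
            ordered.append(frames[k] if frames is not None else erasure)
    return ordered
```

With L = 2 and B = 2, packets carrying (0, 3), (1, 4), and (2, 5)
yield frames 0 through 5 in order. If the packet with NNN = 1 is
lost, positions 1 and 4 are filled with erasure frames.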

559	6.3. Receiving Invalid Interleaving Values

561	   On receipt of an RTP packet with an invalid value of the LLL or NNN
562	   fields, the RTP packet SHOULD be treated as lost by the receiver for
563	   the purpose of generating erasure frames as described in Section 8.

565	6.4. Additional Receiver Responsibilities

567	   Assume that the receiver has begun playing frames from an interleave
568	   group. The time has come to play frame x from packet n of the
569	   interleave group. Further assume that packet n of the interleave
570	   group has not been received. As described in Section 8, an erasure
571	   frame will be sent to the receiving vocoder.

573	   Now, assume that packet n of the interleave group arrives before
574	   frame x+1 of that packet is needed. Receivers SHOULD use frame x+1 of
575	   the newly received packet n rather than substituting an erasure
576	   frame. In other words, just because packet n was not available the
577	   first time it was needed to reconstruct the interleaved speech, the
578	   receiver SHOULD NOT assume it is not available when it is
579	   subsequently needed for interleaved speech reconstruction.
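
   One way to meet this recommendation is to look a packet up at the
   moment each of its frames is due, rather than remembering an earlier
   "lost" decision.  The class below is a sketch under that assumption;
   its name, methods, and 0-based frame index are hypothetical.

```python
class InterleaveGroup:
    """Holds the packets of one interleave group as they arrive."""

    def __init__(self, interleave_length, bundle):
        self.packets = [None] * (interleave_length + 1)
        self.bundle = bundle

    def receive(self, n, frames):
        """Store packet n, which may arrive late or out of order."""
        self.packets[n] = frames

    def frame(self, n, x, erasure=None):
        """Return frame x of packet n, checked at playout time."""
        packet = self.packets[n]
        return erasure if packet is None else packet[x]
```

   Because frame() consults the buffer on every call, a packet that
   missed its first frame time still supplies its later frames.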

581	7. Bundling Codec Data Frames in Type 1 Packets

583	   As discussed in Section 6, the bundling of codec data frames is a
584	   special, reduced case of interleaving in which the LLL value in the
585	   Interleave Octet is set to 0.

587	   Bundling means that multiple codec data frames are included
588	   consecutively in a single packet, because the interleaving length
589	   (LLL) is 0. The interleaving group is thus reduced to a single RTP
590	   packet, and the reconstruction of the codec data frames from RTP
591	   packets becomes a much simpler process.

593	   Furthermore, the additional restrictions on senders are reduced to:

595	   o  MUST NOT bundle more codec data frames in a single RTP packet than
596	      indicated by maxptime (see Section 10) if it is signaled.

598	   o  SHOULD NOT bundle more codec data frames in a single RTP packet
599	      than will fit in the MTU of the underlying network.

601	8. Handling Missing Codec Data Frames

603	   The vocoders covered by this payload format support an erasure frame
604	   as an indication that a frame is not available. While an erasure frame
605	   MUST NOT be transmitted by an RTP sender, it MAY be used internally
606	   by a receiver to advance the state of the voice decoder by exactly
607	   one frame time for each missing frame. Using the information from
608	   packet sequence number, time stamp, and the M bit, the receiver can
609	   detect missing codec data frames from RTP packet loss and/or silence
610	   suppression, and generate corresponding erasure frames. Erasure
611	   frames SHOULD also be used in storage mode to record missing frames.
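
   For the bundled, non-interleaved case, the number of erasure frames
   to generate can be estimated from the RTP time stamps of two packets
   received one after the other.  The sketch below assumes 160 time
   stamp units per 20 msec frame (an 8000 Hz RTP clock, which is an
   assumption of this example), in-order arrival, and a 32-bit time
   stamp that may wrap.  It does not examine the M bit, and the function
   name is hypothetical.

```python
TS_PER_FRAME = 160  # assumption: 8000 Hz RTP clock x 20 msec frames


def missing_frames(prev_ts, prev_frame_count, next_ts):
    """Count frame times missing between two consecutive arrivals.

    `prev_ts` and `prev_frame_count` describe the earlier packet;
    `next_ts` is the time stamp of the packet received after it.
    """
    expected_ts = (prev_ts + prev_frame_count * TS_PER_FRAME) % 2**32
    gap = (next_ts - expected_ts) % 2**32
    return gap // TS_PER_FRAME
```

   The receiver would pass one erasure frame to the decoder for each
   frame time counted.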

613	9. Implementation Issues

615	9.1. Interleaving Length

617	   The vocoder interpolates the missing speech content when given an
618	   erasure frame. However, the best quality is perceived by the listener
619	   when erasure frames are not consecutive. This makes interleaving
620	   desirable as it increases speech quality when packet loss occurs.

622	   On the other hand, interleaving can greatly increase the end-to-end
623	   delay. Where an interactive session is desired, either Type 1
624	   Interleaved/Bundled with interleaving length (field LLL) 0 or Type 2
625	   Header-Free RTP payload types are RECOMMENDED.

627	   When end-to-end delay is not a concern, an interleaving length (field
628	   LLL) of 4 or 5 is RECOMMENDED.

630	   The parameters maxptime and maxinterleave are exchanged at the
631	   initial setup of the session so that the receiver can allocate a
632	   known amount of buffer space that will be sufficient for all future
633	   reception in that session. During the session, the sender may
634	   decrease the bundling value or interleaving length (so that less
635	   buffer space is required at the receiver), but may never increase
636	   them in a way that requires more buffer space. This prevents the
637	   situation where a receiver needs to allocate more buffer space in
638	   the middle of a session but is unable to do so.
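
   As a worked example of the buffer calculation, the largest number of
   codec data frames in one interleave group follows from the two
   parameters: maxptime bounds the frames per packet (at 20 msec per
   frame) and an interleave group holds maxinterleave + 1 packets.  The
   function name is hypothetical; the defaults are those given in
   Section 10.

```python
FRAME_MS = 20  # duration of a single codec data frame


def max_frames_per_group(maxptime_ms=200, maxinterleave=5):
    """Upper bound on codec data frames in one interleave group."""
    frames_per_packet = maxptime_ms // FRAME_MS
    packets_per_group = maxinterleave + 1
    return frames_per_packet * packets_per_group
```

   With the default values this gives 10 frames per packet in 6 packets,
   or 60 frames; with maxptime 80 and maxinterleave 2 it gives 12.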

640	9.2. Mode Request

642	   The Mode Request signal requests a particular encoding mode for the
643	   speech encoding in the reverse direction. All implementations are
644	   RECOMMENDED to honor the Mode Request signal. The Mode Request signal
645	   SHOULD only be used in one-to-one sessions. In multiparty sessions,
646	   any received Mode Request signals SHOULD be ignored.

648	   In addition, the Mode Request signal MAY also be sent through non-RTP
649	   means, which is out of the scope of this specification.

651	   The three-bit Mode Request field is used to signal the receiver to
652	   set a particular encoding mode for its audio encoder. If the Mode
653	   Request field is set to a non-zero value in RTP packets from node A
654	   to node B, it is a request for node B to change to the requested
655	   encoding mode for its audio encoder and therefore the bit rate of the
656	   RTP stream from node B to node A. Once a node sets this field to a
657	   non-zero value it SHOULD continue to set the field to the same value
658	   in subsequent packets until the requested mode has changed. This
659	   design helps to eliminate the scenario of getting the codec stuck in
660	   an unintended state if one of the packets that carries the Mode
661	   Request is lost. An otherwise silent node MAY send an RTP packet
662	   containing a blank frame in order to send a Mode Request.

664	   Each codec type using this format SHOULD define its own
665	   interpretation of the Mode Request field. Codecs SHOULD follow the
666	   convention that higher values of the three-bit field correspond to an
667	   equal or lower average output bit rate.

669	   For the EVRC codec, the Mode Request field MUST be interpreted
670	   according to Tables 2.2.1.2-1 and 2.2.1.2-2 of the EVRC codec
671	   specifications [1].  Values above '100' (4) are currently reserved.
672	   If an unknown value above '100' (4) is received, it MUST be handled
673	   as if '100' (4) were received.

675	   For the SMV codec, the Mode Request field MUST be interpreted according
676	   to Table 2.2-2 of the SMV codec specifications [2]. Values above
677	   '101' (5) are currently reserved. If an unknown value above '101' (5)
678	   is received, it MUST be handled as if '101' (5) were received.
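
   The handling of reserved values for both codecs amounts to clamping
   the three-bit field to the highest defined value.  A minimal sketch,
   in which the table and function names are assumptions of this
   example:

```python
HIGHEST_DEFINED_MODE = {"EVRC": 4, "SMV": 5}


def effective_mode_request(codec, field):
    """Map a received three-bit Mode Request field to the value to act on."""
    if not 0 <= field <= 7:
        raise ValueError("Mode Request is a three-bit field")
    return min(field, HIGHEST_DEFINED_MODE[codec])
```

   For example, a received value of 6 is handled as 4 for EVRC and as 5
   for SMV.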

680	10. IANA Considerations

682	   Two new MIME sub-types, as described in this section, are to be
683	   registered.

685	   The MIME names for the EVRC and SMV codecs are allocated from the
686	   IETF tree since all the vocoders covered are expected to be widely
687	   used for Voice-over-IP applications.

689	   The RTP mode has been described in the previous sections.

691	10.1. Storage Mode

693	   The storage mode is used for storing speech frames, e.g., as a file
694	   or e-mail attachment.

696	   The file begins with a magic number to identify the vocoder that is
697	   used. The magic number for EVRC corresponds to the ASCII character
698	   string "#!EVRC\n", i.e., "0x23 0x21 0x45 0x56 0x52 0x43 0x0A" in
699	   network byte order. The magic number for SMV corresponds to the ASCII
700	   character string "#!SMV\n", i.e., "0x23 0x21 0x53 0x4D 0x56 0x0A" in
701	   network byte order.

703	   The codec data frames are stored in consecutive order, with a single
704	   ToC entry field, expanded to one octet, prefixing each codec data
705	   frame. The ToC field is expanded to one octet by setting the left-
706	   most four bits of the octet to zero. For example, a ToC value of 4 (a
707	   full-rate frame) is stored as 0x04.

709	   Speech frames lost in transmission and non-received frames MUST be
710	   stored as erasure frames (frame type 5, see definition in Section
711	   5.1) to maintain synchronization with the original media.
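
   The storage layout described above can be sketched as a small writer
   and a magic-number check.  The function names are hypothetical, the
   payload in the usage lines is a dummy rather than a real frame, and
   the sketch assumes that an erasure frame is stored as its ToC octet
   alone with no payload octets.

```python
import io

MAGIC = {"EVRC": b"#!EVRC\n", "SMV": b"#!SMV\n"}
ERASURE_TOC = 5  # frame type 5 marks an erasure frame


def write_storage(stream, codec, frames):
    """Write the magic number, then one ToC octet before each frame.

    `frames` yields (toc, payload) pairs; None marks a frame that was
    lost or never received and is stored as an erasure frame.
    """
    stream.write(MAGIC[codec])
    for frame in frames:
        toc, payload = (ERASURE_TOC, b"") if frame is None else frame
        # The left-most four bits of the ToC octet are zero.
        stream.write(bytes([toc & 0x0F]) + payload)


def detect_codec(data):
    """Return the codec named by the leading magic number, or None."""
    for codec, magic in MAGIC.items():
        if data.startswith(magic):
            return codec
    return None


# Usage: a ToC value of 4 with a dummy payload, then a lost frame.
buf = io.BytesIO()
write_storage(buf, "SMV", [(4, b"\x00\x01"), None])
```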

713	10.2. EVRC MIME Registration

715	   Media Type Name:     audio

717	   Media Subtype Name:  EVRC

719	   Required Parameter for RTP mode:

721	      ptype:    Indicates the Type of the RTP/Vocoder packets. The
722	         valid values are 1 (Type 1 Interleaved/Bundled) or 2 (Type 2
723	         Header-Free).

725	   Optional parameters for RTP mode:

727	      ptime:    Defined as usual for RTP audio [6].

729	      maxptime: The maximum amount of media which can be encapsulated
730	         in each packet, expressed as time in milliseconds. The time
731	         SHALL be calculated as the sum of the time the media present
732	         in the packet represents. The time SHOULD be a multiple of the
733	         duration of a single codec data frame (20 msec). If not
734	         signaled, the default maxptime value SHALL be 200
735	         milliseconds.

737	      maxinterleave: Maximum number for interleaving length (field LLL
738	         in the Interleaving Octet). The interleaving lengths used in
739	         the entire session MUST NOT exceed this maximum value. If not
740	         signaled, the maxinterleave length SHALL be 5.

742	   Optional parameters for storage mode: none

744	   Encoding considerations for RTP mode: see Section 6 and Section 7 of
745	      RFC xxxx.

747	   Encoding considerations for storage mode: see Section 10.1 of RFC
748	      xxxx.

750	   Security considerations: see Section 12 "Security Considerations" of
751	      RFC xxxx.

753	   Public specification: RFC xxxx.

755	   Additional information for storage mode:
756	      Magic number: #!EVRC\n
757	      File extensions: evc, EVC
758	      Macintosh file type code: none
759	      Object identifier or OID: none

761	   Intended usage: COMMON. It is expected that many VoIP applications
762	      (as well as mobile applications) will use this type.

764	   Person & email address to contact for further information:
765	      Adam Li
766	      adamli@icsl.ucla.edu

768	   Author/Change controller:
769	      Adam Li
770	      adamli@icsl.ucla.edu
771	      IETF Audio/Video Transport Working Group

773	10.3. SMV MIME Registration

775	   Media Type Name:     audio

777	   Media Subtype Name:  SMV

779	   Required Parameter for RTP mode:

781	      ptype:    Indicates the Type of the RTP/Vocoder packets. The
782	         valid values are 1 (Type 1 Interleaved/Bundled) or 2 (Type 2
783	         Header-Free).

785	   Optional parameters for RTP mode:

787	      ptime:    Defined as usual for RTP audio [6].

789	      maxptime: The maximum amount of media which can be encapsulated
790	         in each packet, expressed as time in milliseconds. The time
791	         SHALL be calculated as the sum of the time the media present
792	         in the packet represents. The time SHOULD be a multiple of the
793	         duration of a single codec data frame (20 msec). If not
794	         signaled, the default maxptime value SHALL be 200
795	         milliseconds.

797	      maxinterleave: Maximum number for interleaving length (field LLL
798	         in the Interleaving Octet). The interleaving lengths used in
799	         the entire session MUST NOT exceed this maximum value. If not
800	         signaled, the maxinterleave length SHALL be 5.

802	   Optional parameters for storage mode: none

804	   Encoding considerations for RTP mode: see Section 6 and Section 7 of
805	      RFC xxxx.

807	   Encoding considerations for storage mode: see Section 10.1 of RFC
808	      xxxx.

810	   Security considerations: see Section 12 "Security Considerations" of
811	      RFC xxxx.

813	   Public specification: RFC xxxx.

815	   Additional information for storage mode:
816	      Magic number: #!SMV\n
817	      File extensions: smv, SMV
818	      Macintosh file type code: none
819	      Object identifier or OID: none

821	   Intended usage: COMMON. It is expected that many VoIP applications
822	      (as well as mobile applications) will use this type.

824	   Person & email address to contact for further information:
825	      Adam Li
826	      adamli@icsl.ucla.edu

828	   Author/Change controller:
829	      Adam Li
830	      adamli@icsl.ucla.edu
831	      IETF Audio/Video Transport Working Group

833	11. Mapping to SDP Parameters

835	   Please note that this section applies to the RTP mode only.

837	   Parameters are mapped to SDP [6] as usual.
838	   Example usage in SDP:
839	     m=audio 49120 RTP/AVP 97
840	     a=rtpmap:97 EVRC/8000
841	     a=fmtp:97 ptype=1; maxinterleave=2
842	     a=maxptime:80
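
   A receiver might extract these parameters from the session
   description as sketched below.  This is an illustration only: the
   function name is hypothetical, all parameter values are assumed to be
   integers, and optional whitespace around "=" in the type field is
   tolerated.  The defaults applied are those stated in Section 10.

```python
def parse_vocoder_params(sdp_lines):
    """Collect ptype, maxinterleave, and maxptime from SDP lines."""
    params = {"maxptime": 200, "maxinterleave": 5}  # Section 10 defaults
    for line in sdp_lines:
        key, _, value = line.partition("=")
        if key.strip() != "a":
            continue
        value = value.strip()
        if value.startswith("fmtp:"):
            # Drop "fmtp:<payload type>" and split the parameter list.
            _, _, fmtp = value.partition(" ")
            for item in fmtp.split(";"):
                name, _, val = item.strip().partition("=")
                if name:
                    params[name] = int(val)
        elif value.startswith("maxptime:"):
            params["maxptime"] = int(value.split(":", 1)[1])
    return params
```

   Applied to the example above, the result is ptype 1, maxinterleave 2,
   and maxptime 80.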

844	12. Security Considerations

846	   RTP packets using the payload format defined in this specification
847	   are subject to the security considerations discussed in the RTP
848	   specification [4], and any appropriate profile (for example [5]).
849	   This implies that confidentiality of the media streams is achieved by
850	   encryption. Because the data compression used with this payload
851	   format is applied end-to-end, encryption may be performed after
852	   compression so there is no conflict between the two operations.

854	   A potential denial-of-service threat exists for data encoding using
855	   compression techniques that have non-uniform receiver-end
856	   computational load. The attacker can inject pathological datagrams
857	   into the stream which are complex to decode and cause the receiver to
858	   become overloaded. However, the encodings covered in this document do
859	   not exhibit any significant non-uniformity.

861	   As with any IP-based protocol, in some circumstances, a receiver may
862	   be overloaded simply by the receipt of too many packets, either
863	   desired or undesired. Network-layer authentication may be used to
864	   discard packets from undesired sources, but the processing cost of
865	   the authentication itself may be too high. In a multicast
866	   environment, pruning of specific sources may be implemented in
867	   future versions of IGMP [7] and in multicast routing protocols to
868	   allow a receiver to select which sources are allowed to reach it.

870	   Interleaving MAY affect encryption. Depending on the encryption
871	   scheme used, there MAY be restrictions on, for example, the time
872	   when keys can be changed.

874	13. Adding Support of Other Frame-Based Vocoders

876	   As described above, the RTP packet format defined in this document is
877	   very flexible and designed to be usable by other frame-based
878	   vocoders.

880	   Additional vocoders using this format MUST have properties as
881	   described in Section 3.3.

883	   The following steps are needed for any eligible vocoder to use the
884	   RTP payload format defined in this document:

886	    o Define the unit used for the RTP time stamp;
887	    o Define the meaning of the Mode Request bits;
888	    o Define the corresponding codec data frame type values for the ToC;
889	    o Define the conversion procedure for vocoder output data frames;
890	    o Define a magic number for storage mode, and complete the
891	      corresponding MIME registration.

893	14. Acknowledgements

895	   The following authors have made significant contributions to this
896	   document: Adam H. Li, John D. Villasenor, Dong-Seek Park, Jeong-Hoon
897	   Park, Keith Miller, S. Craig Greer, David Leon, Nikolai Leung,
898	   Marcello Lioy, Kyle J. McKay, Magdalena L. Espelien, Randall Gellens,
899	   Tom Hiller, Peter J. McCann, Stinson S. Mathai, Michael D. Turner,
900	   Ajay Rajkumar, Dan Gal, Magnus Westerlund, Lars-Erik Jonsson, Greg
901	   Sherwood, and Thomas Zeng.

903	15. References

905	   [1]  3GPP2 C.S0014, "Enhanced Variable Rate Codec, Speech Service
906	        Option 3 for Wideband Spread Spectrum Digital Systems", January
907	        1997.

909	   [2]  3GPP2 C.S0030, "Selectable Mode Vocoder", August 2001.

911	   [3]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
912	        Levels", BCP 14, RFC 2119, March 1997.

914	   [4]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
915	        "RTP:  A Transport Protocol for Real-Time Applications", RFC
916	        1889, January 1996.

918	   [5]  Schulzrinne, H., "RTP Profile for Audio and Video Conferences
919	        with Minimal Control", RFC 1890, January 1996.

921	   [6]  Handley, M. and V. Jacobson, "SDP: Session Description
922	        Protocol", RFC 2327, April 1998.

924	   [7]  Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC
925	        1112, August 1989.

927	16. Authors' Address

929	   The editor will serve as the point of contact for technical issues.

931	   Adam H. Li
932	   Image Communication Lab
933	   Electrical Engineering Department
934	   University of California
935	   Los Angeles, CA 90095
936	   USA
937	   Phone: +1 310 825 5178
938	   Email: adamli@icsl.ucla.edu