idnits 2.17.1 

draft-ietf-avt-evrc-smv-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 400 instances of too long lines in the document, the longest
     one being 6 characters in excess of 72.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  ** Obsolete normative reference: RFC 1889 (ref. '4') (Obsoleted by RFC 3550)

  ** Obsolete normative reference: RFC 1890 (ref. '5') (Obsoleted by RFC 3551)

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)


     Summary: 7 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	      Internet Draft                                               Adam H. Li
3	      draft-ietf-avt-evrc-smv-01.txt                                     UCLA
4	      May 16, 2002                                                     Editor
5	      Expires: November 16, 2002

7	                    RTP Payload Format for EVRC and SMV Vocoders

9	      STATUS OF THIS MEMO

11	         This document is an Internet-Draft and is in full conformance with
12	         all provisions of Section 10 of RFC 2026.

14	         Internet-Drafts are working documents of the Internet Engineering
15	         Task Force (IETF), its areas, and its working groups. Note that other
16	         groups may also distribute working documents as Internet-Drafts.

18	         Internet-Drafts are draft documents valid for a maximum of six months
19	         and may be updated, replaced, or obsoleted by other documents at any
20	         time. It is inappropriate to use Internet-Drafts as reference
21	         material or to cite them other than as work in progress.

23	         The list of current Internet-Drafts can be accessed at
24	         http://www.ietf.org/ietf/1id-abstracts.txt

26	         The list of Internet-Draft Shadow Directories can be accessed at
27	         http://www.ietf.org/shadow.html.

29	      ABSTRACT

31	         This document describes the RTP payload format for Enhanced Variable
32	         Rate Codec (EVRC) Speech and Selectable Mode Vocoder (SMV) Speech.
33	         Two sub-formats are specified for different application scenarios. A
34	         bundled/interleaved format is included to reduce the effect of packet
35	         loss on speech quality and amortize the overhead of the RTP header
36	         over more than one speech frame. A non-bundled format is also
37	         supported for conversational applications.

39	      Table of Contents

41	         1. Introduction
42	         2. Background
43	         3. The Codecs Supported
44	         3.1. EVRC
45	         3.2. SMV
46	         3.3. Other Frame-Based Vocoders
47	         4. RTP/Vocoder Packet Format
48	         4.1. Type 1 Interleaved/Bundled Packet Format
49	         4.2. Type 2 Header-Free Packet Format
50	         4.3. Determining the Format of Packets
51	         5. Packet Table of Contents Entries and Codec Data Frame Format
52	         5.1. Packet Table of Contents entries
53	         5.2. Codec Data Frames
54	         6. Interleaving Codec Data Frames in Type 1 Packets
55	         6.1. Finding Interleave Group Boundaries
56	         6.2. Additional Receiver Responsibilities
57	         7. Bundling Codec Data Frames in Type 1 Packets
58	         8. Handling Missing Codec Data Frames
59	         9. Implementation Issues
60	         9.1. Interleaving Length
61	         9.2. Validation of Received Packets
62	         10. Mode Request
63	         11. Storage Mode
64	         12. IANA Considerations
65	         12.1. Registration of Media Type EVRC
66	         12.2. Registration of Media Type EVRC0
67	         12.3. Registration of Media Type SMV
68	         12.4. Registration of Media Type SMV0
69	         13. Mapping to SDP Parameters
70	         14. Security Considerations
71	         15. Adding Support of Other Frame-Based Vocoders
72	         16. Acknowledgements
73	         17. References
74	         18. Authors' Address

76	      1. Introduction

78	         This document describes how speech compressed with EVRC [1] or SMV
79	         [2] may be formatted for use as an RTP payload type.  The format is
80	         also extensible to other codecs that generate a similar set of frame
81	         types. Two methods are provided to packetize the codec data frames
82	         into RTP packets: an interleaved/bundled format and a zero-header
83	         format. The sender may choose the best format for each application
84	         scenario, based on network conditions, bandwidth availability, delay
85	         requirements, and packet-loss tolerance.

87	         The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
88	         "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
89	         document are to be interpreted as described in RFC 2119 [3].

91	      2. Background

93	         The 3rd Generation Partnership Project 2 (3GPP2) has published two
94	         standards which define speech compression algorithms for CDMA
95	         applications: EVRC [1] and SMV [2]. EVRC is currently deployed in
96	         millions of first and second generation CDMA handsets. SMV is the
97	         preferred speech codec standard for CDMA2000, and will be deployed in
98	         third generation handsets in addition to EVRC. Improvements and new
99	         codecs will keep emerging as technology improves, and future handsets
100	         will likely support multiple codecs.

102	         The formats of the EVRC and SMV codec frames are very similar. Many
103	         other vocoders also share common characteristics, and have many
104	         similar application scenarios. This parallelism enables an RTP
105	         payload format to be designed for EVRC and SMV that may also support
106	         other, similar vocoders with minimal additional specification work.
107	         This can simplify the protocol for transporting vocoder data frames
108	         through RTP and reduce the complexity of implementations.

110	      3. The Codecs Supported

112	      3.1. EVRC

114	         The Enhanced Variable Rate Codec (EVRC) [1] compresses each 20
115	         milliseconds of 8000 Hz, 16-bit sampled speech input into output
116	         frames in one of the three different sizes: Rate 1 (171 bits), Rate
117	         1/2 (80 bits), or Rate 1/8 (16 bits). In addition, there are two zero
118	         bit codec frame types: null frames and erasure frames. Null frames
119	         are produced as a result of the vocoder running at rate 0. Null
120	         frames are zero bits long and are normally not transmitted. Erasure
121	         frames are the frames that the receiver passes to the codec in place
122	         of lost or damaged frames. Erasure frames are also zero bits long
123	         and are normally not transmitted.

125	         The codec chooses the output frame rate based on analysis of the
126	         input speech and the current operating mode (either normal or one of
127	         several reduced rate modes). For typical speech patterns, this
128	         results in an average output of 4.2 kilobits/second for normal mode
129	         and a lower average output for reduced rate modes.

131	      3.2. SMV

133	         The Selectable Mode Vocoder (SMV) [2] compresses each 20 milliseconds
134	         of 8000 Hz, 16-bit sampled speech input into output frames of one of
135	         the four different sizes: Rate 1 (171 bits), Rate 1/2 (80 bits), Rate
136	         1/4 (40 bits), or Rate 1/8 (16 bits). In addition, there are two zero
137	         bit codec frame types: null frames and erasure frames. Null frames
138	         are produced as a result of the vocoder running at rate 0. Null
139	         frames are zero bits long and are normally not transmitted. Erasure
140	         frames are the frames that the receiver passes to the codec in place
141	         of lost or damaged frames. Erasure frames are also zero bits long
142	         and are normally not transmitted.

144	         The SMV codec can operate in four modes. Each mode may produce frames
145	         of any of the rates (full rate to 1/8 rate) for varying percentages
146	         of time, based on the characteristics of the speech samples and the
147	         selected mode. The SMV mode can change on a frame-by-frame basis. The
148	         SMV codec does not need additional information other than the codec
149	         data frames to correctly decode the data of various modes; therefore,
150	         the mode of the encoder does not need to be transmitted with the
151	         encoded frames.

153	         The percentages of frames at each rate for the four SMV modes are
154	         shown in the table below.

156	                           Mode 0       Mode 1       Mode 2        Mode 3
157	             -------------------------------------------------------------
158	             Rate 1        68.90%       38.14%       15.43%        07.49%
159	             Rate 1/2      06.03%       15.82%       38.34%        46.28%
160	             Rate 1/4      00.00%       17.37%       16.38%        16.38%
161	             Rate 1/8      25.07%       28.67%       29.85%        29.85%

163	         The SMV codec chooses the output frame rate based on an analysis of
164	         the input speech and the current operating mode. For typical speech
165	         patterns, this results in an average output of 4.2 kilobits/second for
166	         Mode 0 in two way conversation (assuming 50% active speech time and
167	         50% in eighth rate while listening) and lower for other reduced rate
168	         modes.

170	         SMV is more bandwidth efficient than EVRC. EVRC is equivalent in
171	         performance to SMV mode 1.

173	      3.3. Other Frame-Based Vocoders

175	         Other frame-based vocoders can be carried in the packet format
176	         defined in this document, as long as they possess the following
177	         properties:

179	          o The codec is frame-based;
180	          o blank and erasure frames are supported;
181	          o the total number of rates is less than 17;
182	          o the maximum full rate frame can be transported in a single RTP
183	            packet using this specific format.

185	         Vocoders with the characteristics listed above can be transported
186	         using the packet format specified in this document with some
187	         additional specification work; the pieces that must be defined are
188	         listed in Section 15.

190	      4. RTP/Vocoder Packet Format

192	         In the packet format diagrams shown in this document, bit 0 is the
193	         most significant bit. The vocoder speech data MUST be transmitted in
194	         RTP packets of one of the following two types.

196	      4.1. Type 1 Interleaved/Bundled Packet Format

198	         This format is used to send one or more vocoder frames per packet.
199	         Interleaving or bundling MAY be used. The RTP packet for this format
200	         is as follows:

202	          0                   1                   2                   3
203	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
204	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
205	         |                      RTP Header [4]                           |
206	         +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
207	         |R|R| LLL | NNN | FFF |  Count  |  TOC  |  ...  |  TOC  |padding|
208	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
209	         |        one or more codec data frames, one per TOC entry       |
210	         |                             ....                              |
211	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

213	         The RTP header has the expected values as described in the RTP
214	         specification [4]. The RTP timestamp is in 1/8000 of a second units
215	         for EVRC and SMV. For any other vocoders that use this packet format,
216	         the timestamp unit needs to be defined explicitly. The M bit should
217	         be set as specified in the applicable RTP profile, for example, RFC
218	         1890 [5]. Note that RFC 1890 [5] specifies that if the sender does
219	         not suppress silence, the M bit will always be zero. When multiple
220	         codec data frames are present in a single RTP packet, the timestamp
221	         is that of the oldest data represented in the RTP packet. The
222	         assignment of an RTP payload type for this new packet format is
223	         outside the scope of this document; it is specified by the RTP
224	         profile under which this payload format is used.

225	         The first octet of a Type 1 Interleaved/Bundled format packet is the
226	         Interleave Octet. The second octet contains the Mode Request and
227	         Frame Count fields. The Table of Contents (ToC) field then follows.
228	         The fields are specified as follows:

230	         Reserved (RR): 2 bits
231	            Reserved bits. MUST be set to zero by sender, SHOULD be ignored
232	            by receiver.

234	         Interleave Length (LLL): 3 bits
235	            Indicates the length of interleave; a value of 0 indicates
236	            bundling, a special case of interleaving. See Section 6 and
237	            Section 7 for more detailed discussion.

239	         Interleave Index (NNN): 3 bits
240	            Indicates the index within an interleave group. MUST have a value
241	            less than or equal to the value of LLL. Values of NNN greater
242	            than the value of LLL are invalid. Packets with invalid NNN values
243	            SHOULD be ignored by the receiver.

245	         Mode Request (FFF): 3 bits
246	            The Mode Request field is used to signal Mode Request
247	            information. See Section 10 for details.

249	         Frame Count (Count): 5 bits
250	            The number of ToC fields (and vocoder frames) present in the
251	            packet is the value of the frame count field plus one. A value of
252	            zero indicates that the packet contains one ToC field, while a
253	            value of 31 indicates that the packet contains 32 ToC fields.

255	         Padding (padding): 0 or 4 bits
256	            This padding ensures that codec data frames start on an octet
257	            boundary. When the frame count is odd, the sender MUST add 4 bits
258	            of padding following the last TOC. When the frame count is even,
259	            the sender MUST NOT add padding bits. If padding is present, the
260	            padding bits MUST be set to zero by sender, and SHOULD be ignored
261	            by receiver.

263	         The Table of Contents field (ToC) provides information on the codec
264	         data frame(s) in the packet. There is one ToC entry for each codec
265	         data frame. The detailed formats of the ToC field and codec data
266	         frames are specified in Section 5.

268	         Multiple data frames may be included within a Type 1
269	         Interleaved/Bundled packet using interleaving or bundling as
270	         described in Section 6 and Section 7.
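
The field layout above can be sketched as a small parser. This is an
illustrative sketch only, not part of the format definition. The function
name parse_type1_header and the returned dictionary keys are hypothetical.
It assumes the input is the RTP payload (the RTP header already removed),
and it interprets the padding rule as "4 bits of padding when the number of
ToC entries (Count + 1) is odd", which is what octet alignment requires.

```python
def parse_type1_header(payload: bytes) -> dict:
    """Parse the Type 1 payload header and ToC that follow the RTP header."""
    if len(payload) < 3:
        raise ValueError("too short for two header octets and one ToC entry")
    first, second = payload[0], payload[1]
    # Bit 0 is the most significant bit; the two reserved bits are ignored.
    lll = (first >> 3) & 0x07   # Interleave Length: bits 2-4 of octet 0
    nnn = first & 0x07          # Interleave Index: bits 5-7 of octet 0
    fff = (second >> 5) & 0x07  # Mode Request: bits 0-2 of octet 1
    count = second & 0x1F       # Frame Count: bits 3-7 of octet 1
    if nnn > lll:
        raise ValueError("NNN greater than LLL; packet should be ignored")
    num_frames = count + 1
    # Two 4-bit ToC entries per octet; an odd number leaves 4 padding bits.
    toc_octets = (num_frames + 1) // 2
    if len(payload) < 2 + toc_octets:
        raise ValueError("payload truncated inside the ToC")
    toc = []
    for i in range(num_frames):
        octet = payload[2 + i // 2]
        toc.append(octet >> 4 if i % 2 == 0 else octet & 0x0F)
    return {"LLL": lll, "NNN": nnn, "FFF": fff, "frames": num_frames,
            "toc": toc, "data_offset": 2 + toc_octets}
```

The codec data frames start at data_offset within the payload.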

272	      4.2. Type 2 Header-Free Packet Format

274	         The Type 2 Header-Free Packet Format is designed for maximum
275	         bandwidth efficiency and low latency. Only one codec data frame can
276	         be sent in each Type 2 Header-Free format packet. None of the payload
277	         header fields (LLL, NNN, FFF, Count) nor ToC entries are present. The
278	         codec rate for the data frame can be determined from the length of
279	         the codec data frame, since there is only one codec data frame in
280	         each Type 2 Header-Free packet.

282	         Use of the RTP header fields for Type 2 Header-Free RTP/Vocoder
283	         Packet Format is the same as described in Section 4.1 for Type 1
284	         Interleaved/Bundled RTP/Vocoder Packet Format. The detailed format of
285	         the codec data frame is specified in Section 5.

287	          0                   1                   2                   3
288	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
289	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
290	         |                      RTP Header [4]                           |
291	         +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
292	         |                                                               |
293	         +          ONLY one codec data frame            +-+-+-+-+-+-+-+-+
294	         |                                               |
295	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
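
As a rough illustration of how a receiver might infer the codec rate from
the payload length of a Type 2 packet, the sketch below uses the codec data
frame sizes listed in Section 5.1. The names TYPE2_RATE_BY_LENGTH and
type2_rate are hypothetical, and rejecting every other length (including an
empty payload) is an assumption of this sketch, not a rule stated here.

```python
# Codec data frame sizes in octets, taken from the table in Section 5.1.
TYPE2_RATE_BY_LENGTH = {2: "1/8", 5: "1/4", 10: "1/2", 22: "1"}


def type2_rate(payload: bytes, codec: str = "SMV") -> str:
    """Return the rate of the single codec data frame in a Type 2 payload."""
    rate = TYPE2_RATE_BY_LENGTH.get(len(payload))
    if rate is None:
        raise ValueError("length %d matches no frame size" % len(payload))
    if rate == "1/4" and codec == "EVRC":
        # EVRC has no Rate 1/4 frames.
        raise ValueError("Rate 1/4 is not valid for EVRC")
    return rate
```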

297	      4.3. Determining the Format of Packets

299	         All receivers SHOULD be able to process both types of packets. The
300	         sender MAY choose to use one or both types of packets.

302	         A receiver MUST have prior knowledge of the packet type to correctly
303	         decode the RTP packets. The packet types used in an RTP session MUST
304	         be specified by the sender, and signaled through out-of-band means,
305	         for example by SDP during the setup of a session.

307	         When packets of both formats are used within the same session,
308	         different RTP payload type values MUST be used for each format to
309	         distinguish the packet formats. The association of payload type
310	         number with the packet format is done out-of-band, for example by SDP
311	         during the setup of a session.

313	      5. Packet Table of Contents Entries and Codec Data Frame Format

315	      5.1. Packet Table of Contents entries

317	         Each codec data frame in a Type 1 Interleaved/Bundled packet has a
318	         corresponding Table of Contents (ToC) entry. The ToC entry indicates
319	         the rate of the codec frame. (Type 2 Header-Free packets MUST NOT
320	         have a ToC field, and there is always only one codec data frame in
321	         each Type 2 Header-Free packet.)

323	         Each ToC entry occupies four bits. The format of the bits is
324	         indicated below:

326	             0 1 2 3
327	            +-+-+-+-+
328	            |fr type|
329	            +-+-+-+-+

331	         Frame Type: 4 bits
332	            The frame type indicates the type of the corresponding codec data
333	            frame in the RTP packet.

335	         For EVRC and SMV codecs, the frame type values and size of the
336	         associated codec data frame are described in the table below:

338	         Value   Rate      Total codec data frame size (in octets)
339	         ---------------------------------------------------------
340	           0     Blank      0    (0 bits)
341	           1     1/8        2    (16 bits)
342	           2     1/4        5    (40 bits; not valid for EVRC)
343	           3     1/2       10    (80 bits)
344	           4     1         22    (171 bits; padded at end with 5 zero bits)
345	           5     Erasure    0    (SHOULD NOT be transmitted by sender)

347	         All values not listed in the above table MUST be considered reserved.
348	         A ToC entry with a reserved Frame Type value SHOULD be considered
349	         invalid. Note that the EVRC codec does not have 1/4 rate frames, thus
350	         frame type value 2 MUST be considered a reserved value when the EVRC
351	         codec is in use.

353	         Other vocoders that use this packet format need to specify their own
354	         table of frame types and corresponding codec data frames.
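
The table above is enough to split the data portion of a Type 1 packet into
individual codec data frames. The following is a minimal sketch under stated
assumptions: the names FRAME_SIZES and split_frames are hypothetical, the
ToC is given as a list of 4-bit frame type values, and a reserved frame type
is reported by raising an error (the text only says such an entry should be
considered invalid).

```python
# Frame type value -> codec data frame size in octets (EVRC and SMV).
FRAME_SIZES = {0: 0, 1: 2, 2: 5, 3: 10, 4: 22, 5: 0}


def split_frames(toc, data: bytes, codec: str = "SMV"):
    """Return (frame_type, frame_octets) pairs, one per ToC entry."""
    frames = []
    pos = 0
    for frame_type in toc:
        reserved = frame_type not in FRAME_SIZES
        if codec == "EVRC" and frame_type == 2:
            reserved = True  # EVRC has no Rate 1/4 frames
        if reserved:
            raise ValueError("reserved frame type %d" % frame_type)
        size = FRAME_SIZES[frame_type]
        if pos + size > len(data):
            raise ValueError("codec data shorter than the ToC implies")
        frames.append((frame_type, data[pos:pos + size]))
        pos += size
    return frames
```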

356	      5.2. Codec Data Frames

358	         The output of the vocoder MUST be converted into codec data frames
359	         for inclusion in the RTP payload. The conversions for EVRC and SMV
360	         codecs are specified below. (Note: Because the EVRC codec does not
361	         have Rate 1/4 frames, the specification of 1/4 frames does not apply
362	         to EVRC codec data frames). Other vocoders that use this packet
363	         format need to specify how to convert vocoder output data into
364	         frames.

366	         The codec output data bits as numbered in EVRC and SMV are packed
367	         into octets. The lowest numbered bit (bit 1 for Rate 1, Rate 1/2,
368	         Rate 1/4 and Rate 1/8) is placed in the most significant bit
369	         (internet bit 0) of octet 1 of the codec data frame, the second
370	         lowest bit is placed in the second most significant bit of the first
371	         octet, the third lowest in the third most significant bit of the
372	         first octet, and so on. This continues until all of the bits have
373	         been placed in the codec data frame.

375	         The remaining unused bits of the last octet of the codec data frame
376	         MUST be set to zero. Note that in EVRC and SMV this is only
377	         applicable to Rate 1 frames (171 bits) as the Rate 1/2 (80 bits),
378	         Rate 1/4 (40 bits, SMV only) and Rate 1/8 frames (16 bits) fit
379	         exactly into a whole number of octets.

381	         Following is a detailed listing showing a Rate 1 EVRC/SMV codec
382	         output frame converted into a codec data frame:

384	         The codec data frame for an EVRC/SMV codec Rate 1 frame is 22 octets
385	         long. Bits 1 through 171 from the EVRC/SMV codec Rate 1 frame are
386	         placed as indicated, with bits marked with "Z" set to zero. EVRC/SMV
387	         codec Rate 1/8, Rate 1/4 and Rate 1/2 frames are converted similarly,
388	         but do not require zero padding because they align on octet
389	         boundaries.

391	                              Rate 1 codec data frame
392	          0                   1                   2                   3
393	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
394	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
395	         |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
396	         |0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3|
397	         |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|
398	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
399	         :                                                               :
400	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
401	         |1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1| | | | | |
402	         |4|4|4|4|4|5|5|5|5|5|5|5|5|5|5|6|6|6|6|6|6|6|6|6|6|7|7|Z|Z|Z|Z|Z|
403	         |5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1| | | | | |
404	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
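
The bit packing described in this section can be sketched as follows. This
is only an illustration; the function name pack_codec_bits is hypothetical,
and the codec output is assumed to be available as a list of 0/1 values in
which element 0 holds codec bit 1.

```python
def pack_codec_bits(bits) -> bytes:
    """Pack codec output bits into octets, lowest numbered bit first.

    The first bit goes into the most significant bit of the first octet.
    Unused bits of the last octet stay zero.
    """
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)
```

For a Rate 1 frame (171 bits) this yields 22 octets, with the five least
significant bits of the last octet set to zero.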

406	      6. Interleaving Codec Data Frames in Type 1 Packets

408	         As indicated in Section 4.1, more than one codec data frame MAY be
409	         included in a single Type 1 Interleaved/Bundled packet by a sender.
410	         This is accomplished by interleaving or bundling.

412	         Bundling is used to spread the transmission overhead of the RTP and
413	         payload header over multiple vocoder frames. Interleaving
414	         additionally reduces the listener's perception of data loss by
415	         spreading such loss over non-consecutive vocoder frames. EVRC, SMV,
416	         and similar vocoders are able to compensate for an occasional lost
417	         frame, but speech quality degrades exponentially with consecutive
418	         frame loss.

420	         Bundling is signaled by setting the LLL field to zero and the Count
421	         field to greater than zero. Interleaving is indicated by setting the
422	         LLL field to a value greater than zero.

424	         The discussions on general interleaving apply to the bundling (which
425	         can be viewed as a reduced case of interleaving) with reduced
426	         complexity. The bundling case is discussed in detail in Section 7.

428	         Senders MAY support interleaving and/or bundling. All receivers MUST
429	         support interleaving and bundling.

431	         Given a time-ordered sequence of output frames from the codec
432	         numbered 0..n, a bundling value B (the value in the Count field plus
433	         one), and an interleave length L where n = B * (L+1) - 1, the output
434	         frames are placed into RTP packets as follows (the values of the
435	         fields LLL and NNN are indicated for each RTP packet):

437	         First RTP Packet in Interleave group:
438	            LLL=L, NNN=0
439	            Frame 0, Frame L+1, Frame 2(L+1), Frame 3(L+1), ... for a total of
440	            B frames

442	         Second RTP Packet in Interleave group:
443	            LLL=L, NNN=1
444	            Frame 1, Frame 1+L+1, Frame 1+2(L+1), Frame 1+3(L+1), ... for a
445	            total of B frames

447	         This continues to the last RTP packet in the interleave group:

449	         (L+1)th RTP Packet in Interleave group:
450	            LLL=L, NNN=L
451	            Frame L, Frame L+L+1, Frame L+2(L+1), Frame L+3(L+1), ... for a
452	            total of B frames

454	         Within each interleave group, the RTP packets making up the
455	         interleave group MUST be transmitted in value-increasing order of the
456	         NNN field. While this does not guarantee reduced end-to-end delay on
457	         the receiving end, when packets are delivered in order by the
458	         underlying transport, delay will be reduced to the minimum possible.
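
The placement of frames into the packets of one interleave group can be
sketched as below. This is an illustrative sketch, not a normative
procedure. The function name interleave_group and the dictionary layout are
hypothetical; "bundle" is the bundling value B and "length" is the
interleave length L.

```python
def interleave_group(frames, bundle: int, length: int):
    """Distribute B * (L + 1) time-ordered frames over L + 1 packets."""
    if not 0 <= length <= 7:
        raise ValueError("LLL is a 3-bit field")
    if not 1 <= bundle <= 32:
        raise ValueError("Count + 1 must be between 1 and 32")
    if len(frames) != bundle * (length + 1):
        raise ValueError("an interleave group needs exactly B * (L + 1) frames")
    packets = []
    for index in range(length + 1):
        # Packet NNN=index carries frames index, index+(L+1), index+2(L+1), ...
        packets.append({"LLL": length, "NNN": index,
                        "frames": frames[index::length + 1]})
    return packets  # already in increasing NNN order, as required for sending
```

With length 0 the result is a single packet that carries all B frames in
time order, which is the bundling case.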

460	         Receivers MAY signal the maximum number of codec data frames (i.e.,
461	         the maximum acceptable bundling value B) they can handle in a single
462	         RTP packet using the OPTIONAL maxptime RTP mode parameter identified
463	         in Section 12.

465	         Receivers MAY signal the maximum interleave length (i.e., the maximum
466	         acceptable LLL value in the Interleaving Octet) they will accept
467	         using the OPTIONAL maxinterleave RTP mode parameter identified in
468	         Section 12.

470	         The parameters maxptime and maxinterleave are exchanged at the
471	         initial setup of the session. In one-to-one sessions, the sender MUST
472	         respect these values set by the receiver, and MUST NOT
473	         interleave/bundle more packets than what the receiver signals that it
474	         can handle. This ensures that the receiver can allocate a known
475	         amount of buffer space that will be sufficient for all
476	         interleaving/bundling used in that session. During the session, the
477	         sender may decrease the bundling value or interleaving length (so
478	         that less buffer space is required at the receiver), but never exceed
479	         the maximum value set by the receiver. This prevents the situation
480	         where a receiver needs to allocate more buffer space in the middle of
481	         a session but is unable to do so.

483	         Additionally, senders have the following restrictions:

485	         o  MUST NOT bundle more codec data frames in a single RTP packet than
486	            indicated by maxptime (see Section 12) if it is signaled.

488	         o  SHOULD NOT bundle more codec data frames in a single RTP packet
489	            than will fit in the MTU of the underlying network.

491	         o  Once beginning a session with a given maximum interleaving value
492	            set by maxinterleave in Section 12, MUST NOT increase the
493	            interleaving value (LLL) to exceed the maximum interleaving value
494	            that is signaled.

496	         o  MAY change the interleaving value, but MUST do so only between
497	            interleave groups.

499	         o  Silence suppression MAY only be used between interleave groups. A
500	            ToC with Frame Type 0 (Blank Frame, Section 5.1) MUST be used
501	            within interleaving groups if the codec outputs a blank frame.
502	            The M bit in the RTP header is not set for these blank frames,
503	            as the stream is continuous in time. Because there is only one
504	            time stamp for each RTP packet, silence suppression used within
505	            an interleave group would cause ambiguities when reconstructing
506	            the speech at the receiver side, and thus is prohibited.
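
As a non-normative illustration, the sender restrictions that depend
on signaled parameters can be checked as in the following Python
sketch. The function and constant names are hypothetical. The 20 ms
frame duration and the defaults of 200 ms and 5 are taken from the
maxptime and maxinterleave definitions in Section 12. The MTU check is
omitted because it depends on the underlying network.

```python
FRAME_MS = 20               # duration of one codec data frame
DEFAULT_MAXPTIME_MS = 200   # used when maxptime is not signaled
DEFAULT_MAXINTERLEAVE = 5   # used when maxinterleave is not signaled


def max_bundled_frames(maxptime_ms=None):
    """Largest number of codec data frames allowed in one RTP packet."""
    if maxptime_ms is None:
        maxptime_ms = DEFAULT_MAXPTIME_MS
    return maxptime_ms // FRAME_MS


def sender_params_allowed(bundled_frames, interleave_len,
                          maxptime_ms=None, maxinterleave=None):
    """Return True if the sender stays within the receiver's limits."""
    if maxinterleave is None:
        maxinterleave = DEFAULT_MAXINTERLEAVE
    return (1 <= bundled_frames <= max_bundled_frames(maxptime_ms)
            and 0 <= interleave_len <= maxinterleave)
```

For example, with maxptime=80 and maxinterleave=2, a sender may place
at most 4 frames in a packet and may use LLL values 0 through 2.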

508	      6.1. Finding Interleave Group Boundaries

510	         Given an RTP packet with sequence number S, interleave length (field
511	         LLL) L, interleave index value (field NNN) N, and bundling value B,
512	         the interleave group consists of this RTP packet and other RTP
513	         packets with sequence numbers from (S-N) mod 65536 to (S-N+L) mod
514	         65536 inclusive. In other words, the interleave group always consists
515	         of L+1 RTP packets with sequential sequence numbers. The bundling
516	         value for all RTP packets in an interleave group MUST be the same.

518	         The receiver determines the expected bundling value for all RTP
519	         packets in an interleave group by the number of codec data frames
520	         bundled in the first RTP packet of the interleave group received.
521	         Note that this may not be the first RTP packet of the interleave
522	         group if packets are delivered out of order by the underlying
523	         transport.
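
The boundary computation can be sketched in Python as follows. This is
an illustration only, and the function names are hypothetical. The
range check on N is not stated in this section; it is derived from the
requirement that the packet itself lies inside its own group.

```python
SEQ_MOD = 65536  # RTP sequence numbers are 16 bits wide


def interleave_group_bounds(seq, index_n, length_l):
    """Return (first, last) sequence numbers of the interleave group."""
    if not 0 <= index_n <= length_l:
        raise ValueError("NNN must be in the range 0..LLL")
    first = (seq - index_n) % SEQ_MOD
    last = (seq - index_n + length_l) % SEQ_MOD
    return first, last


def in_same_group(seq, first, length_l):
    """True if seq is one of the L+1 packets that start at first."""
    return (seq - first) % SEQ_MOD <= length_l
```

For a packet with S=2, N=3 and L=5, the group wraps around the
sequence number space and runs from 65535 to 4.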

525	      6.2. Additional Receiver Responsibilities

527	         Assume that the receiver has begun playing frames from an interleave
528	         group. The time has come to play frame x from packet n of the
529	         interleave group. Further assume that packet n of the interleave
530	         group has not been received. As described in Section 8, an erasure
531	         frame will be sent to the receiving vocoder.

533	         Now, assume that packet n of the interleave group arrives before
534	         frame x+1 of that packet is needed. Receivers SHOULD use frame x+1 of
535	         the newly received packet n rather than substituting an erasure
536	         frame. In other words, just because packet n was not available the
537	         first time it was needed to reconstruct the interleaved speech, the
538	         receiver SHOULD NOT assume it is not available when it is
539	         subsequently needed for interleaved speech reconstruction.
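
One way to satisfy this recommendation is to look up the packet at
every playout instant instead of remembering that it was missing
earlier. The following Python sketch assumes a hypothetical per-group
buffer keyed by packet index; the class name, method names, and the
ERASURE placeholder are illustrative only.

```python
ERASURE = "erasure"  # placeholder for an erasure frame given to the vocoder


class GroupBuffer:
    """Holds the packets of one interleave group, keyed by packet index."""

    def __init__(self):
        self.packets = {}  # packet index n -> list of codec data frames

    def add_packet(self, n, frames):
        # A packet may arrive after some of its frames were already due.
        self.packets[n] = frames

    def frame_for_playout(self, n, x):
        # Consult the buffer each time; do not cache an earlier miss.
        frames = self.packets.get(n)
        if frames is None or x >= len(frames):
            return ERASURE
        return frames[x]
```

With this structure, frame x of a missing packet n yields an erasure,
while frame x+1 is taken from packet n if it has arrived by then.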

541	      7. Bundling Codec Data Frames in Type 1 Packets

543	         As discussed in Section 6, the bundling of codec data frames is a
544	         special reduced case of interleaving with LLL value in the Interleave
545	         Octet set to 0.

547	         Bundling codec data frames indicates multiple data frames are
548	         included consecutively in a packet, because the interleaving length
549	         (LLL) is 0. The interleaving group is thus reduced to a single RTP
550	         packet, and the reconstruction of the codec data frames from RTP
551	         packets becomes a much simpler process.

553	         Furthermore, the additional restrictions on senders are reduced to:

555	         o  MUST NOT bundle more codec data frames in a single RTP packet than
556	            indicated by maxptime (see Section 12) if it is signaled.

558	         o  SHOULD NOT bundle more codec data frames in a single RTP packet
559	            than will fit in the MTU of the underlying network.

561	      8. Handling Missing Codec Data Frames

563	         The vocoders covered by this payload format support erasure frames as
564	         an indication when frames are not available. The erasure frames are
565	         normally used internally by a receiver to advance the state of the
566	         voice decoder by exactly one frame time for each missing frame. Using
567	         the information from packet sequence number, time stamp, and the M
568	         bit, the receiver can detect missing codec data frames from RTP
569	         packet loss and/or silence suppression, and generate corresponding
570	         erasure frames. Erasure frames MUST also be used in storage mode to
571	         record missing frames.
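
For bundled packets without interleaving (LLL = 0), the number of
erasure frames to generate between two received packets can be derived
from the time stamps, as in the Python sketch below. The function name
is hypothetical. The default of 160 time stamp ticks per frame is an
assumption (20 ms frames with an 8000 Hz RTP clock); the actual time
stamp unit should be taken from the definition elsewhere in this
document.

```python
TS_MOD = 2 ** 32  # RTP time stamps are 32 bits wide


def missing_frames(prev_ts, prev_frames, next_ts, ticks_per_frame=160):
    """Frame times left uncovered between two bundled packets.

    prev_ts and next_ts are the RTP time stamps of two packets received
    one after the other; prev_frames is the number of codec data frames
    carried by the earlier packet.
    """
    elapsed = (next_ts - prev_ts) % TS_MOD
    expected = elapsed // ticks_per_frame
    return max(expected - prev_frames, 0)
```

The result is the same whether the gap comes from packet loss or from
silence suppression; in both cases the decoder is advanced by one
erasure frame per missing frame time.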

573	      9. Implementation Issues

575	      9.1. Interleaving Length

577	         The vocoder interpolates the missing speech content when given an
578	         erasure frame. However, the best quality is perceived by the listener
579	         when erasure frames are not consecutive. This makes interleaving
580	         desirable as it increases speech quality when packet loss occurs.

582	         On the other hand, interleaving can greatly increase the end-to-end
583	         delay. Where an interactive session is desired, either Type 1
584	         Interleaved/Bundled with interleaving length (field LLL) 0 or Type 2
585	         Header-Free RTP payload types are RECOMMENDED.

587	         When end-to-end delay is not a primary concern, an interleaving
588	         length (field LLL) of 4 or 5 is RECOMMENDED as it offers a reasonable
589	         compromise between robustness and latency.
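
The trade-off can be made concrete by computing how much speech one
interleave group spans: L+1 packets, each carrying B frames of 20 ms.
The Python sketch below uses a hypothetical function name and gives
only a rough indicator of the buffering involved, not an exact
end-to-end delay figure.

```python
FRAME_MS = 20  # duration of one codec data frame


def group_span_ms(interleave_len, bundled_frames):
    """Milliseconds of speech covered by one interleave group."""
    return (interleave_len + 1) * bundled_frames * FRAME_MS
```

With L=0 and one frame per packet the span is 20 ms, while L=4 with
three frames per packet already spans 300 ms.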

591	      9.2. Validation of Received Packets

593	         When receiving an RTP packet, the receiver SHOULD check the validity
594	         of the ToC fields and match the length of the packet with what is
595	         indicated by the ToC fields. If any invalidity or mismatch is
596	         detected, it is RECOMMENDED to discard the received packet to avoid
597	         potential severe degradation of the speech quality. The discarded
598	         packet is treated following the same procedure as a lost packet, and
599	         the discarded data will be replaced with erasure frames.

601	         On receipt of an RTP packet with an invalid value of the LLL or NNN
602	         fields, the RTP packet SHOULD be treated as lost by the receiver for
603	         the purpose of generating erasure frames as described in Section 8.

605	         On receipt of an RTP packet in an interleave group with other than
606	         the expected frame count value, the receiver MAY discard codec data
607	         frames off the end of the RTP packet or add erasure codec data frames
608	         to the end of the packet in order to manufacture a substitute packet
609	         with the expected bundling value.  The receiver MAY instead choose to
610	         discard the whole interleave group.
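
The substitute-packet option can be sketched in Python as follows.
Frames are represented as hypothetical (ToC, data) tuples, and the
sketch assumes that an erasure frame (frame type 5) carries no data
octets.

```python
ERASURE_TOC = 5  # frame type 5 denotes an erasure frame


def normalize_frames(frames, expected_count):
    """Truncate or pad a frame list to the expected bundling value."""
    if len(frames) >= expected_count:
        # Discard codec data frames off the end of the packet.
        return frames[:expected_count]
    # Add erasure frames to the end of the packet.
    padding = [(ERASURE_TOC, b"")] * (expected_count - len(frames))
    return frames + padding
```

A receiver that prefers the other option would drop every packet of
the interleave group instead of calling such a function.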

612	      10. Mode Request

614	         The Mode Request signal requests a particular encoding mode for the
615	         speech encoding in the reverse direction. All implementations are
616	         RECOMMENDED to honor the Mode Request signal. The Mode Request signal
617	         SHOULD only be used in one-to-one sessions. In multiparty sessions,
618	         any received Mode Request signals SHOULD be ignored.

620	         In addition, the Mode Request signal MAY also be sent through non-RTP
621	         means, which is out of the scope of this specification.

623	         The three-bit Mode Request field is used to signal the receiver to
624	         set its audio encoder to a particular encoding mode. If the Mode
625	         Request field is set to a non-zero value in RTP packets from node A
626	         to node B, it is a request for node B to change to the requested
627	         encoding mode for its audio encoder and therefore the bit rate of the
628	         RTP stream from node B to node A. Once a node sets this field to a
629	         non-zero value it SHOULD continue to set the field to the same value
630	         in subsequent packets until the requested mode has changed. This
631	         design helps to eliminate the scenario of getting the codec stuck in
632	         an unintended state if one of the packets that carries the Mode
633	         Request is lost. An otherwise silent node MAY send an RTP packet
634	         containing a blank frame in order to send a Mode Request.

636	         Each codec type using this format SHOULD define its own
637	         interpretation of the Mode Request field. Codecs SHOULD follow the
638	         convention that higher values of the three-bit field correspond to an
639	         equal or lower average output bit rate.

641	         For the EVRC codec, the Mode Request field MUST be interpreted
642	         according to Tables 2.2.1.2-1 and 2.2.1.2-2 of the EVRC codec
643	         specifications [1].  Values above '100' (4) are currently reserved.
644	         If an unknown value above '100' (4) is received, it MUST be handled
645	         as if '100' (4) were received, for interoperability with potential
646	         future revisions.

648	         For SMV codec, the Mode Request field MUST be interpreted according
649	         to Table 2.2-2 of the SMV codec specifications [2]. Values above
650	         '101' (5) are currently reserved. If an unknown value above '101' (5)
651	         is received, it MUST be handled as if '101' (5) were received, also
652	         for interoperability with potential future revisions.
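
The handling of reserved values amounts to clamping the received
field to the highest value defined for the codec. A minimal Python
sketch, with hypothetical names:

```python
# Highest Mode Request value currently defined for each codec.
MAX_MODE_REQUEST = {"EVRC": 4, "SMV": 5}


def effective_mode_request(codec, value):
    """Map a received Mode Request value to the one that is acted on."""
    if not 0 <= value <= 7:
        raise ValueError("Mode Request is a three-bit field")
    # Reserved values are handled as the highest defined value.
    return min(value, MAX_MODE_REQUEST[codec])
```

For example, a received value of 6 is handled as 4 for EVRC and as 5
for SMV.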

654	      11. Storage Mode

656	         The storage mode is used for storing speech frames, e.g., as a file
657	         or e-mail attachment.

659	         The file begins with a magic number to identify the vocoder that is
660	         used. The magic number for EVRC corresponds to the ASCII character
661	         string "#!EVRC\n", i.e., "0x23 0x21 0x45 0x56 0x52 0x43 0x0A" in
662	         network byte order. The magic number for SMV corresponds to the ASCII
663	         character string "#!SMV\n", i.e., "0x23 0x21 0x53 0x4D 0x56 0x0A" in
664	         network byte order.

666	         The codec data frames are stored in consecutive order, with a single
667	         ToC entry field, extended to one octet, prefixing each codec data
668	         frame. The ToC field is extended to one octet by setting the four
669	         most significant bits of the octet to zero. For example, a ToC value
670	         of 4 (a full-rate frame) is stored as 0x04.

672	         Speech frames lost in transmission and non-received frames MUST be
673	         stored as erasure frames (frame type 5, see definition in Section
674	         5.1) to maintain synchronization with the original media.
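
A storage-mode writer can be sketched in Python as follows. The
function name and the frame representation are hypothetical: each
frame is a (ToC, data) tuple, and None stands for a frame that was
lost or not received. The sketch assumes that an erasure frame carries
no data octets after its ToC octet.

```python
import io

MAGIC = {"EVRC": b"#!EVRC\n", "SMV": b"#!SMV\n"}
ERASURE_TOC = 5  # frame type 5 denotes an erasure frame


def write_storage(stream, codec, frames):
    """Write the magic number, then a ToC octet before each frame."""
    stream.write(MAGIC[codec])
    for frame in frames:
        if frame is None:
            # Lost or non-received frame: store an erasure frame.
            stream.write(bytes([ERASURE_TOC]))
            continue
        toc, data = frame
        # Four most significant bits of the ToC octet are zero.
        stream.write(bytes([toc & 0x0F]))
        stream.write(data)
```

Writing one full-rate frame followed by one lost frame to an
io.BytesIO object yields the magic number, the octet 0x04, the frame
data, and the octet 0x05.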

676	      12. IANA Considerations

678	         Four new MIME sub-types as described in this section are to be
679	         registered.

681	         The MIME names for the EVRC and SMV codecs are allocated from the IETF
682	         tree since all the vocoders covered are expected to be widely used
683	         for Voice-over-IP applications.

685	      12.1. Registration of Media Type EVRC

687	         Media Type Name:           audio

689	         Media Subtype Name:        EVRC
690	            Type 1 Interleaved/Bundled packet format for EVRC

692	         Required Parameter:        none

694	         Optional parameters:
695	            The following parameters apply to RTP mode only.

697	            ptime:    Defined as usual for RTP audio [6].

699	            maxptime: The maximum amount of media which can be encapsulated
700	               in each packet, expressed as time in milliseconds. The time
701	               SHALL be calculated as the sum of the time the media present
702	               in the packet represents. The time SHOULD be a multiple of the
703	               duration of a single codec data frame (20 msec). If not
704	               signaled, the default maxptime value SHALL be 200
705	               milliseconds.

707	            maxinterleave: Maximum number for interleaving length (field LLL
708	               in the Interleaving Octet). The interleaving lengths used in
709	               the entire session MUST NOT exceed this maximum value. If not
710	               signaled, the maxinterleave length SHALL be 5.

712	         Encoding considerations:
713	            For RTP mode, see Section 6 and Section 7 of RFC xxxx.
714	            For storage mode, see Section 11 of RFC xxxx.

716	         Security considerations:
717	            See Section 14 "Security Considerations" of RFC xxxx.

719	         Public specification:
720	            RFC xxxx.

722	         Additional information:
723	            The following information applies to storage mode only.

725	            Magic number: #!EVRC\n
726	            File extensions: evc, EVC
727	            Macintosh file type code: none
728	            Object identifier or OID: none

730	         Intended usage:
731	            COMMON. It is expected that many VoIP applications (as well as
732	            mobile applications) will use this type.

734	         Person & email address to contact for further information:
735	            Adam Li
736	            adamli@icsl.ucla.edu

738	         Author/Change controller:
739	            Adam Li
740	            adamli@icsl.ucla.edu
741	            IETF Audio/Video Transport Working Group

743	      12.2. Registration of Media Type EVRC0

745	         Media Type Name:           audio

747	         Media Subtype Name:        EVRC0
748	            Type 2 Header-Free packet format for EVRC

750	         Required Parameter:       none

752	         Optional parameters:       none

754	         Encoding considerations:  none

756	         Security considerations:
757	            See Section 14 "Security Considerations" of RFC xxxx.

759	         Public specification:
760	            RFC xxxx.

762	         Additional information:   none

764	         Intended usage:
765	            COMMON. It is expected that many VoIP applications (as well as
766	            mobile applications) will use this type.

768	         Person & email address to contact for further information:
769	            Adam Li
770	            adamli@icsl.ucla.edu

772	         Author/Change controller:
773	            Adam Li
774	            adamli@icsl.ucla.edu
775	            IETF Audio/Video Transport Working Group

777	      12.3. Registration of Media Type SMV

779	         Media Type Name:           audio

781	         Media Subtype Name:        SMV
782	            Type 1 Interleaved/Bundled packet format for SMV

784	         Required Parameter:        none

786	         Optional parameters:
787	            The following parameters apply to RTP mode only.

789	            ptime:    Defined as usual for RTP audio [6].

791	            maxptime: The maximum amount of media which can be encapsulated
792	               in each packet, expressed as time in milliseconds. The time
793	               SHALL be calculated as the sum of the time the media present
794	               in the packet represents. The time SHOULD be a multiple of the
795	               duration of a single codec data frame (20 msec). If not
796	               signaled, the default maxptime value SHALL be 200
797	               milliseconds.

799	            maxinterleave: Maximum number for interleaving length (field LLL
800	               in the Interleaving Octet). The interleaving lengths used in
801	               the entire session MUST NOT exceed this maximum value. If not
802	               signaled, the maxinterleave length SHALL be 5.

804	         Encoding considerations:
805	            For RTP mode, see Section 6 and Section 7 of RFC xxxx.
806	            For storage mode, see Section 11 of RFC xxxx.

808	         Security considerations:
809	            See Section 14 "Security Considerations" of RFC xxxx.

811	         Public specification:
812	            RFC xxxx.

814	         Additional information:
815	            The following information applies to storage mode only.

817	            Magic number: #!SMV\n
818	            File extensions: smv, SMV
819	            Macintosh file type code: none
820	            Object identifier or OID: none

822	         Intended usage:
823	            COMMON. It is expected that many VoIP applications (as well as
824	            mobile applications) will use this type.

826	         Person & email address to contact for further information:
827	            Adam Li
828	            adamli@icsl.ucla.edu

830	         Author/Change controller:
831	            Adam Li
832	            adamli@icsl.ucla.edu
833	            IETF Audio/Video Transport Working Group

835	      12.4. Registration of Media Type SMV0

837	         Media Type Name:           audio

839	         Media Subtype Name:        SMV0
840	            Type 2 Header-Free packet format for SMV

842	         Required Parameter:        none

844	         Optional parameters:       none

846	         Encoding considerations:  none

848	         Security considerations:
849	            See Section 14 "Security Considerations" of RFC xxxx.

851	         Public specification:
852	            RFC xxxx.

854	         Additional information:   none

856	         Intended usage:
857	            COMMON. It is expected that many VoIP applications (as well as
858	            mobile applications) will use this type.

860	         Person & email address to contact for further information:
861	            Adam Li
862	            adamli@icsl.ucla.edu

864	         Author/Change controller:
865	            Adam Li
866	            adamli@icsl.ucla.edu
867	            IETF Audio/Video Transport Working Group

869	      13. Mapping to SDP Parameters

871	         Please note that this section applies to the RTP mode only.

873	         The information carried in the MIME media type specification has a
874	         specific mapping to fields in the Session Description Protocol (SDP)
875	         [6], which is commonly used to describe RTP sessions. When SDP is
876	         used to specify sessions employing the EVRC or SMV codec, the mapping
877	         is as follows:

879	            o The MIME type ("audio") goes in SDP "m=" as the media name.

881	            o The MIME subtype (payload format name) goes in SDP "a=rtpmap"
882	              as the encoding name.

884	            o The parameters "ptime" and "maxptime" go in the SDP "a=ptime"
885	              and "a=maxptime" attributes, respectively.

887	            o Any remaining parameters go in the SDP "a=fmtp" attribute by
888	              copying them directly from the MIME media type string as a
889	              semicolon separated list of parameter=value pairs.

891	         Some examples of SDP session descriptions for EVRC and SMV encodings
892	         follow below.

894	         Example of usage of EVRC:

896	           m=audio 49120 RTP/AVP 97
897	           a=rtpmap:97 EVRC/8000
898	           a=fmtp:97 maxinterleave=2
899	           a=maxptime:80

901	         Example of usage of SMV:

903	           m=audio 49122 RTP/AVP 99
904	           a=rtpmap:99 SMV0/8000
905	           a=fmtp:99

907	         Note that the payload format (encoding) names are commonly shown in
908	         upper case. MIME subtypes are commonly shown in lower case. These
909	         names are case-insensitive in both places. Similarly, parameter names
910	         are case-insensitive both in MIME types and in the default mapping to
911	         the SDP a=fmtp attribute.
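
The mapping rules can be illustrated with the following Python sketch,
which builds the SDP lines for one payload type. The function name and
its arguments are hypothetical. The clock rate of 8000 appended to the
encoding name in "a=rtpmap" is an assumption of this sketch and should
be taken from the time stamp definition of the codec in use.

```python
def sdp_media_lines(port, payload_type, subtype, params=None,
                    clock_rate=8000):
    """Build SDP lines from a MIME subtype and its parameters."""
    remaining = dict(params or {})
    # ptime and maxptime map to their own SDP attributes.
    ptime = remaining.pop("ptime", None)
    maxptime = remaining.pop("maxptime", None)
    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} {subtype}/{clock_rate}",
    ]
    if remaining:
        # Any other parameters are copied into a=fmtp as a
        # semicolon separated list of parameter=value pairs.
        fmtp = ";".join(f"{name}={value}"
                        for name, value in remaining.items())
        lines.append(f"a=fmtp:{payload_type} {fmtp}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

Calling sdp_media_lines(49120, 97, "EVRC", {"maxinterleave": 2,
"maxptime": 80}) places maxinterleave in "a=fmtp" and maxptime in its
own "a=maxptime" attribute.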

913	      14. Security Considerations

915	         RTP packets using the payload format defined in this specification
916	         are subject to the security considerations discussed in the RTP
917	         specification [4], and any appropriate profile (for example [5]).
918	         This implies that confidentiality of the media streams is achieved by
919	         encryption. Because the data compression used with this payload
920	         format is applied end-to-end, encryption may be performed after
921	         compression so there is no conflict between the two operations.

923	         A potential denial-of-service threat exists for data encoding using
924	         compression techniques that have non-uniform receiver-end
925	         computational load. The attacker can inject pathological datagrams
926	         into the stream which are complex to decode and cause the receiver to
927	         become overloaded. However, the encodings covered in this document do
928	         not exhibit any significant non-uniformity.

930	         As with any IP-based protocol, in some circumstances, a receiver may
931	         be overloaded simply by the receipt of too many packets, either
932	         desired or undesired. Network-layer authentication may be used to
933	         discard packets from undesired sources, but the processing cost of
934	         the authentication itself may be too high. In a multicast
935	         environment, pruning of specific sources may be implemented in
936	         future versions of IGMP [7] and in multicast routing protocols to
937	         allow a receiver to select which sources are allowed to reach it.

939	         Interleaving may affect encryption. Depending on the encryption
940	         scheme used, there may be restrictions on, for example, when keys
941	         can be changed. Specifically, the key change may need to occur at the
942	         boundary between interleave groups.

944	      15. Adding Support of Other Frame-Based Vocoders

946	         As described above, the RTP packet format defined in this document is
947	         very flexible and designed to be usable by other frame-based
948	         vocoders.

950	         Additional vocoders using this format MUST have properties as
951	         described in Section 3.3.

953	         For an eligible vocoder to use the payload format mechanisms defined
954	         in this document, a new RTP payload format document needs to be
955	         published as an RFC. That document can simply refer to this document
956	         and then specify the following parameters:

958	          o Define the unit used for RTP time stamp;
959	          o Define the meaning of the Mode Request bits;
960	          o Define corresponding codec data frame type values for ToC;
961	          o Define the conversion procedure for vocoder output data frames;
962	          o Define a magic number for storage mode, and complete the
963	            corresponding MIME registration.

965	      16. Acknowledgements

967	         The following authors have made significant contributions to this
968	         document: Adam H. Li, John D. Villasenor, Dong-Seek Park, Jeong-Hoon
969	         Park, Keith Miller, S. Craig Greer, David Leon, Nikolai Leung,
970	         Marcello Lioy, Kyle J. McKay, Magdalena L. Espelien, Randall Gellens,
971	         Tom Hiller, Peter J. McCann, Stinson S. Mathai, Michael D. Turner,
972	         Ajay Rajkumar, Dan Gal, Magnus Westerlund, Lars-Erik Jonsson, Greg
973	         Sherwood, and Thomas Zeng.

975	      17. References

977	         [1]  3GPP2 C.S0014, "Enhanced Variable Rate Codec, Speech Service
978	              Option 3 for Wideband Spread Spectrum Digital Systems", January
979	              1997.

981	         [2]  3GPP2 C.S0030-0 v2.0, "Selectable Mode Vocoder, Service Option
982	              for Wideband Spread Spectrum Communication Systems", May 2002.

984	         [3]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
985	              Levels", BCP 14, RFC 2119, March 1997.

987	         [4]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
988	              "RTP:  A Transport Protocol for Real-Time Applications", RFC
989	              1889, January 1996.

991	         [5]  Schulzrinne, H., "RTP Profile for Audio and Video Conferences
992	              with Minimal Control", RFC 1890, January 1996.

994	         [6]  Handley, M. and V. Jacobson, "SDP: Session Description Protocol",
995	              RFC 2327, April 1998.

997	         [7]  Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC
998	              1112, August 1989.

1000	      18. Authors' Address

1002	         The editor will serve as the point of contact for technical issues.

1004	         Adam H. Li
1005	         Image Communication Lab
1006	         Electrical Engineering Department
1007	         University of California
1008	         Los Angeles, CA 90095
1009	         USA
1010	         Phone: +1 310 825 5178
1011	         Email: adamli@icsl.ucla.edu