idnits 2.17.1 

draft-ietf-avt-evrc-smv-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == There are 2 instances of lines with non-ascii characters in the document.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 402 instances of too long lines in the document, the longest
     one being 6 characters in excess of 72.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  ** Obsolete normative reference: RFC 1889 (ref. '4') (Obsoleted by RFC 3550)

  ** Obsolete normative reference: RFC 1890 (ref. '5') (Obsoleted by RFC 3551)

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)


     Summary: 7 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	      Internet Draft                                               Adam H. Li
3	      draft-ietf-avt-evrc-smv-02.txt                                     UCLA
4	      June 7, 2002                                                     Editor
5	      Expires: December 7, 2002

7	          RTP Payload Format for Enhanced Variable Rate Codecs (EVRC) and
8	                           Selectable Mode Vocoders (SMV)

10	      STATUS OF THIS MEMO

12	         This document is an Internet-Draft and is in full conformance with
13	         all provisions of Section 10 of RFC 2026.

15	         Internet-Drafts are working documents of the Internet Engineering
16	         Task Force (IETF), its areas, and its working groups. Note that other
17	         groups may also distribute working documents as Internet-Drafts.

19	         Internet-Drafts are draft documents valid for a maximum of six months
20	         and may be updated, replaced, or obsoleted by other documents at any
21	         time. It is inappropriate to use Internet-Drafts as reference
22	         material or to cite them other than as work in progress.

24	         The list of current Internet-Drafts can be accessed at
25	         http://www.ietf.org/ietf/1id-abstracts.txt

27	         The list of Internet-Draft Shadow Directories can be accessed at
28	         http://www.ietf.org/shadow.html.

30	      ABSTRACT

32	         This document describes the RTP payload format for Enhanced Variable
33	         Rate Codec (EVRC) Speech and Selectable Mode Vocoder (SMV) Speech.
34	         Two sub-formats are specified for different application scenarios. A
35	         bundled/interleaved format is included to reduce the effect of packet
36	         loss on speech quality and amortize the overhead of the RTP header
37	         over more than one speech frame. A non-bundled format is also
38	         supported for conversational applications.

40	      Table of Contents

42	         1. Introduction
43	         2. Background
44	         3. The Codecs Supported
45	         3.1. EVRC
46	         3.2. SMV
47	         3.3. Other Frame-Based Vocoders
48	         4. RTP/Vocoder Packet Format
49	         4.1. Interleaved/Bundled Packet Format
50	         4.2. Header-Free Packet Format
51	         4.3. Determining the Format of Packets
52	         5. Packet Table of Contents Entries and Codec Data Frame Format
53	         5.1. Packet Table of Contents entries
54	         5.2. Codec Data Frames
55	         6. Interleaving Codec Data Frames
56	         7. Bundling Codec Data Frames
57	         8. Handling Missing Codec Data Frames
58	         9. Implementation Issues
59	         9.1. Interleaving Length
60	         9.2. Validation of Received Packets
61	         9.3. Processing the Late Packets
62	         10. Mode Request
63	         11. Storage Format
64	         12. IANA Considerations
65	         12.1. Registration of Media Type EVRC
66	         12.2. Registration of Media Type EVRC0
67	         12.3. Registration of Media Type SMV
68	         12.4. Registration of Media Type SMV0
69	         13. Mapping to SDP Parameters
70	         14. Security Considerations
71	         15. Adding Support of Other Frame-Based Vocoders
72	         16. Acknowledgements
73	         17. References
74	         18. Authors' Address

76	      1. Introduction

78	         This document describes how speech compressed with EVRC [1] or SMV
79	         [2] may be formatted for use as an RTP payload type.  The format is
80	         also extensible to other codecs that generate a similar set of frame
81	         types. Two methods are provided to packetize the codec data frames
82	         into RTP packets: an interleaved/bundled format and a zero-header
83	         format. The sender may choose the best format for each application
84	         scenario, based on network conditions, bandwidth availability, delay
85	         requirements, and packet-loss tolerance.

87	         The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
88	         "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
89	         document are to be interpreted as described in RFC 2119 [3].

91	      2. Background

93	         The 3rd Generation Partnership Project 2 (3GPP2) has published two
94	         standards which define speech compression algorithms for CDMA
95	         applications: EVRC [1] and SMV [2]. EVRC is currently deployed in
96	         millions of first and second generation CDMA handsets. SMV is the
97	         preferred speech codec standard for CDMA2000, and will be deployed in
98	         third generation handsets in addition to EVRC. Improvements and new
99	         codecs will keep emerging as technology improves, and future handsets
100	         will likely support multiple codecs.

102	         The formats of the EVRC and SMV codec frames are very similar. Many
103	         other vocoders also share common characteristics, and have many
104	         similar application scenarios. This parallelism enables an RTP
105	         payload format to be designed for EVRC and SMV that may also support
106	         other, similar vocoders with minimal additional specification work.
107	         This can simplify the protocol for transporting vocoder data frames
108	         through RTP and reduce the complexity of implementations.

110	      3. The Codecs Supported

112	      3.1. EVRC

114	         The Enhanced Variable Rate Codec (EVRC) [1] compresses each 20
115	         milliseconds of 8000 Hz, 16-bit sampled speech input into output
116	         frames in one of the three different sizes: Rate 1 (171 bits), Rate
117	         1/2 (80 bits), or Rate 1/8 (16 bits). In addition, there are two zero
118	         bit codec frame types: null frames and erasure frames. Null frames
119	         are produced as a result of the vocoder running at rate 0. Null
120	         frames are zero bits long and are normally not transmitted. Erasure
121	         frames are the frames that the receiver passes to the codec in
122	         place of lost or damaged frames. Erasure frames are also zero bits
123	         long and are normally not transmitted.

125	         The codec chooses the output frame rate based on analysis of the
126	         input speech and the current operating mode (either normal or one of
127	         several reduced rate modes). For typical speech patterns, this
128	         results in an average output of 4.2 kilobits/second for normal mode
129	         and a lower average output for reduced rate modes.

131	      3.2. SMV

133	         The Selectable Mode Vocoder (SMV) [2] compresses each 20 milliseconds
134	         of 8000 Hz, 16-bit sampled speech input into output frames of one of
135	         the four different sizes: Rate 1 (171 bits), Rate 1/2 (80 bits), Rate
136	         1/4 (40 bits), or Rate 1/8 (16 bits). In addition, there are two zero
137	         bit codec frame types: null frames and erasure frames. Null frames
138	         are produced as a result of the vocoder running at rate 0. Null
139	         frames are zero bits long and are normally not transmitted. Erasure
140	         frames are the frames that the receiver passes to the codec in
141	         place of lost or damaged frames. Erasure frames are also zero bits
142	         long and are normally not transmitted.

144	         The SMV codec can operate in four modes. Each mode may produce frames
145	         of any of the rates (full rate to 1/8 rate) for varying percentages
146	         of time, based on the characteristics of the speech samples and the
147	         selected mode. The SMV mode can change on a frame-by-frame basis. The
148	         SMV codec does not need additional information other than the codec
149	         data frames to correctly decode the data of various modes; therefore,
150	         the mode of the encoder does not need to be transmitted with the
151	         encoded frames.

153	         The SMV codec chooses the output frame rate based on analysis of the
154	         input speech and the current operating mode. For typical speech
155	         patterns, this results in an average output of 4.2 kilobits/second
156	         for Mode 0 in two way conversation (approximately 50% active speech
157	         time and 50% in eighth rate while listening) and lower for other
158	         reduced rate modes. SMV is more bandwidth efficient than EVRC. EVRC
159	         is equivalent in performance to SMV mode 1.

161	      3.3. Other Frame-Based Vocoders

163	         Other frame-based vocoders can be carried in the packet format
164	         defined in this document, as long as they possess the following
165	         properties:

167	          o the codec is frame-based;
168	          o blank and erasure frames are supported;
169	          o the total number of rates is less than 17;
170	          o the maximum full rate frame can be transported in a single RTP
171	            packet using this specific format.

173	         Vocoders with the characteristics listed above can be transported
174	         using the packet format specified in this document with some
175	         additional specification work; the pieces that must be defined are
176	         listed in Section 15.

178	      4. RTP/Vocoder Packet Format

180	         The vocoder speech data may be transmitted in either of the two RTP
181	         packet formats specified in the following two subsections, as
182	         appropriate for the application scenario. In the packet format
183	         diagrams shown in this document, bit 0 is the most significant bit.

185	      4.1. Interleaved/Bundled Packet Format

187	         This format is used to send one or more vocoder frames per packet.
188	         Interleaving or bundling MAY be used. The RTP packet for this format
189	         is as follows:

191	          0                   1                   2                   3
192	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
193	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
194	         |                      RTP Header [4]                           |
195	         +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
196	         |R|R| LLL | NNN | MMM |  Count  |  TOC  |  ...  |  TOC  |padding|
197	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
198	         |        one or more codec data frames, one per TOC entry       |
199	         |                             ....                              |
200	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
201	         The RTP header has the expected values as described in the RTP
202	         specification [4]. The RTP timestamp is in 1/8000 of a second units
203	         for EVRC and SMV. For any other vocoders that use this packet format,
204	         the timestamp unit needs to be defined explicitly. The M bit should
205	         be set as specified in the applicable RTP profile, for example, RFC
206	         1890 [5]. Note that RFC 1890 [5] specifies that if the sender does
207	         not suppress silence, the M bit will always be zero. When multiple
208	         codec data frames are present in a single RTP packet, the timestamp
209	         is that of the oldest data represented in the RTP packet. The
210	         assignment of an RTP payload type for this packet format is outside
211	         the scope of this document; it is specified by the RTP profile under
212	         which this payload format is used.

214	         The first octet of an Interleaved/Bundled format packet is the
215	         Interleave Octet. The second octet contains the Mode Request and
216	         Frame Count fields. The Table of Contents (ToC) field then follows.
217	         The fields are specified as follows:

219	         Reserved (RR): 2 bits
220	            Reserved bits. MUST be set to zero by sender, SHOULD be ignored
221	            by receiver.

223	         Interleave Length (LLL): 3 bits
224	            Indicates the length of interleave; a value of 0 indicates
225	            bundling, a special case of interleaving. See Section 6 and
226	            Section 7 for more detailed discussion.

228	         Interleave Index (NNN): 3 bits
229	            Indicates the index within an interleave group. MUST have a value
230	            less than or equal to the value of LLL. Values of NNN greater
231	            than the value of LLL are invalid. Packets with invalid NNN values
232	            SHOULD be ignored by the receiver.

234	         Mode Request (MMM): 3 bits
235	            The Mode Request field is used to signal Mode Request
236	            information. See Section 10 for details.

238	         Frame Count (Count): 5 bits
239	            The number of ToC fields (and vocoder frames) present in the
240	            packet is the value of the frame count field plus one. A value of
241	            zero indicates that the packet contains one ToC field, while a
242	            value of 31 indicates that the packet contains 32 ToC fields.

244	         Padding (padding): 0 or 4 bits
245	            This padding ensures that codec data frames start on an octet
246	            boundary. When the number of ToC entries (Count plus one) is odd,
247	            the sender MUST add 4 bits of padding following the last ToC entry.
248	            When the number of ToC entries is even, the sender MUST NOT add
249	            padding bits. If padding is present, the padding bits MUST be set
250	            to zero by the sender, and SHOULD be ignored by the receiver.

252	         The Table of Contents field (ToC) provides information on the codec
253	         data frame(s) in the packet. There is one ToC entry for each codec
254	         data frame. The detailed formats of the ToC field and codec data
255	         frames are specified in Section 5.

257	         Multiple data frames may be included within an Interleaved/Bundled
258	         packet using interleaving or bundling as described in Section 6 and
259	         Section 7.
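
	         As an illustration of the field layout described above, the
	         following Python sketch parses the payload header of an
	         Interleaved/Bundled packet. It is not part of the specification;
	         the names PayloadHeader and parse_payload_header are hypothetical,
	         and raising an error on an invalid NNN is only one way a receiver
	         might choose to ignore such a packet.

```python
from typing import List, NamedTuple


class PayloadHeader(NamedTuple):
    lll: int          # interleave length (LLL)
    nnn: int          # interleave index (NNN)
    mmm: int          # mode request (MMM)
    frame_count: int  # Count field value plus one
    toc: List[int]    # one 4-bit frame type per codec data frame
    data_offset: int  # octet offset of the first codec data frame


def parse_payload_header(payload: bytes) -> PayloadHeader:
    """Parse the Interleaved/Bundled payload header (illustrative only)."""
    if len(payload) < 3:
        raise ValueError("payload too short for header and one ToC entry")
    first, second = payload[0], payload[1]
    # Bit 0 is the most significant bit: R R L L L N N N.
    # The reserved bits are ignored, as the text recommends for receivers.
    lll = (first >> 3) & 0x07
    nnn = first & 0x07
    if nnn > lll:
        raise ValueError("NNN greater than LLL is invalid")
    # Second octet: M M M followed by the 5-bit Count field.
    mmm = (second >> 5) & 0x07
    frame_count = (second & 0x1F) + 1
    # ToC entries are 4 bits each; 4 bits of padding round an odd number
    # of entries up to a whole octet.
    toc_octets = (frame_count + 1) // 2
    if len(payload) < 2 + toc_octets:
        raise ValueError("payload too short for the ToC")
    toc = []
    for i in range(frame_count):
        octet = payload[2 + i // 2]
        toc.append(octet >> 4 if i % 2 == 0 else octet & 0x0F)
    return PayloadHeader(lll, nnn, mmm, frame_count, toc, 2 + toc_octets)
```

	         With three ToC entries, for example, the ToC plus padding occupies
	         two octets, so the codec data frames begin at octet offset 4 of
	         the payload.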

261	      4.2. Header-Free Packet Format

263	         The Header-Free Packet Format is designed for maximum bandwidth
264	         efficiency and low latency. Only one codec data frame can be sent in
265	         each Header-Free format packet. None of the payload header fields
266	         (LLL, NNN, MMM, Count) nor ToC entries are present. The codec rate
267	         for the data frame can be determined from the length of the codec
268	         data frame, since there is only one codec data frame in each Header-
269	         Free packet.

271	         Use of the RTP header fields for Header-Free RTP/Vocoder Packet
272	         Format is the same as described in Section 4.1 for
273	         Interleaved/Bundled RTP/Vocoder Packet Format. The detailed format of
274	         the codec data frame is specified in Section 5.

276	          0                   1                   2                   3
277	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
278	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
279	         |                      RTP Header [4]                           |
280	         +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
281	         |                                                               |
282	         +          ONLY one codec data frame            +-+-+-+-+-+-+-+-+
283	         |                                               |
284	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
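
	         Because a Header-Free packet carries exactly one codec data frame,
	         a receiver can infer the rate from the payload length alone. The
	         Python sketch below illustrates this, using the frame sizes listed
	         in Section 5.1. The function name header_free_rate and the codec
	         argument are hypothetical and not defined by this document.

```python
# Payload length in octets -> codec rate, per the table in Section 5.1.
HEADER_FREE_RATES = {2: "1/8", 5: "1/4", 10: "1/2", 22: "1"}


def header_free_rate(payload: bytes, codec: str = "SMV") -> str:
    """Infer the codec rate of a Header-Free payload from its length."""
    rate = HEADER_FREE_RATES.get(len(payload))
    # EVRC has no Rate 1/4 frames, so a 5-octet payload is invalid for it.
    if rate is None or (codec == "EVRC" and rate == "1/4"):
        raise ValueError(f"unexpected payload length {len(payload)} for {codec}")
    return rate
```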

286	      4.3. Determining the Format of Packets

288	         All receivers SHOULD be able to process both packet formats. The
289	         sender MAY choose to use one or both packet formats.

291	         A receiver MUST have prior knowledge of the packet format to
292	         correctly decode the RTP packets.
293	         When packets of both formats are used within the same session,
294	         different RTP payload type values MUST be used for each format to
295	         distinguish the packet formats. The association of payload type
296	         number with the packet format is done out-of-band, for example by SDP
297	         during the setup of a session.

299	      5. Packet Table of Contents Entries and Codec Data Frame Format

301	      5.1. Packet Table of Contents entries

303	         Each codec data frame in an Interleaved/Bundled packet has a
304	         corresponding Table of Contents (ToC) entry. The ToC entry indicates
305	         the rate of the codec frame. (Header-Free packets MUST NOT have a ToC
306	         field.)

308	         Each ToC entry occupies four bits. The format of the bits is
309	         indicated below:

311	             0 1 2 3
312	            +-+-+-+-+
313	            |fr type|
314	            +-+-+-+-+

316	         Frame Type: 4 bits
317	            The frame type indicates the type of the corresponding codec data
318	            frame in the RTP packet.

320	         For EVRC and SMV codecs, the frame type values and size of the
321	         associated codec data frame are described in the table below:

323	         Value   Rate      Total codec data frame size (in octets)
324	         ---------------------------------------------------------
325	           0     Blank      0    (0 bits)
326	           1     1/8        2    (16 bits)
327	           2     1/4        5    (40 bits; not valid for EVRC)
328	           3     1/2       10    (80 bits)
329	           4     1         22    (171 bits; padded at end with 5 zero bits)
330	           5     Erasure    0    (SHOULD NOT be transmitted by sender)

332	         All values not listed in the above table MUST be considered reserved.
333	         A ToC entry with a reserved Frame Type value SHOULD be considered
334	         invalid. Note that the EVRC codec does not have 1/4 rate frames, thus
335	         frame type value 2 MUST be considered a reserved value when the EVRC
336	         codec is in use.

338	         Other vocoders that use this packet format need to specify their own
339	         table of frame types and corresponding codec data frames.
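
	         The table above is enough to split the codec data that follows
	         the ToC into individual frames. The Python sketch below shows one
	         possible way to do so for EVRC and SMV; the names FRAME_SIZES and
	         split_frames are hypothetical, and treating a reserved frame type
	         as an error is an assumption about receiver policy.

```python
from typing import List, Sequence

# Frame type value -> codec data frame size in octets (EVRC and SMV).
FRAME_SIZES = {0: 0, 1: 2, 2: 5, 3: 10, 4: 22, 5: 0}


def split_frames(toc: Sequence[int], data: bytes, codec: str = "SMV") -> List[bytes]:
    """Split the codec data area into one frame per ToC entry."""
    frames = []
    offset = 0
    for frame_type in toc:
        # Frame type 2 (Rate 1/4) is reserved when EVRC is in use.
        if frame_type not in FRAME_SIZES or (codec == "EVRC" and frame_type == 2):
            raise ValueError(f"reserved frame type {frame_type}")
        size = FRAME_SIZES[frame_type]
        if offset + size > len(data):
            raise ValueError("codec data shorter than the ToC implies")
        frames.append(data[offset:offset + size])
        offset += size
    return frames
```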

341	      5.2. Codec Data Frames

343	         The output of the vocoder MUST be converted into codec data frames
344	         for inclusion in the RTP payload. The conversions for EVRC and SMV
345	         codecs are specified below. (Note: Because the EVRC codec does not
346	         have Rate 1/4 frames, the specification of 1/4 frames does not apply
347	         to EVRC codec data frames). Other vocoders that use this packet
348	         format need to specify how to convert vocoder output data into
349	         frames.

351	         The codec output data bits as numbered in EVRC and SMV are packed
352	         into octets. The lowest numbered bit (bit 1 for Rate 1, Rate 1/2,
353	         Rate 1/4 and Rate 1/8) is placed in the most significant bit
354	         (internet bit 0) of octet 1 of the codec data frame, the second
355	         lowest bit is placed in the second most significant bit of the first
356	         octet, the third lowest in the third most significant bit of the
357	         first octet, and so on. This continues until all of the bits have
358	         been placed in the codec data frame.

360	         The remaining unused bits of the last octet of the codec data frame
361	         MUST be set to zero. Note that in EVRC and SMV this is only
362	         applicable to Rate 1 frames (171 bits) as the Rate 1/2 (80 bits),
363	         Rate 1/4 (40 bits, SMV only) and Rate 1/8 frames (16 bits) fit
364	         exactly into a whole number of octets.

366	         Following is a detailed listing showing a Rate 1 EVRC/SMV codec
367	         output frame converted into a codec data frame:

369	         The codec data frame for an EVRC/SMV codec Rate 1 frame is 22 octets
370	         long. Bits 1 through 171 from the EVRC/SMV codec Rate 1 frame are
371	         placed as indicated, with bits marked with "Z" set to zero. EVRC/SMV
372	         codec Rate 1/8, Rate 1/4 and Rate 1/2 frames are converted similarly,
373	         but do not require zero padding because they align on octet
374	         boundaries.

376	                              Rate 1 codec data frame

378	          0                   1                   2                   3
379	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
380	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
381	         |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
382	         |0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3|
383	         |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|
384	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
385	         :                                                               :
386	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
387	         |1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1| | | | | |
388	         |4|4|4|4|4|5|5|5|5|5|5|5|5|5|5|6|6|6|6|6|6|6|6|6|6|7|7|Z|Z|Z|Z|Z|
389	         |5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1| | | | | |
390	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
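
	         The packing rule above (lowest numbered bit first, most
	         significant bit of each octet first, zero padding at the end) can
	         be sketched in Python as follows. The function name
	         pack_codec_bits and the representation of the codec output as a
	         list of 0/1 values are assumptions made for illustration.

```python
from typing import Sequence


def pack_codec_bits(bits: Sequence[int]) -> bytes:
    """Pack codec output bits into octets, first bit in the MSB of octet 1.

    bits[0] corresponds to codec bit 1. Unused bits of the last octet
    stay zero, which provides the required padding.
    """
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)
```

	         A 171-bit Rate 1 frame yields 22 octets, with the five least
	         significant bits of the last octet set to zero.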

392	      6. Interleaving Codec Data Frames

394	         As indicated in Section 4.1, more than one codec data frame MAY be
395	         included in a single Interleaved/Bundled packet by a sender. This is
396	         accomplished by interleaving or bundling.

398	         Bundling is used to spread the transmission overhead of the RTP and
399	         payload header over multiple vocoder frames. Interleaving
400	         additionally reduces the listener's perception of data loss by
401	         spreading such loss over non-consecutive vocoder frames. EVRC, SMV,
402	         and similar vocoders are able to compensate for an occasional lost
403	         frame, but speech quality degrades exponentially with consecutive
404	         frame loss.

406	         Bundling is signaled by setting the LLL field to zero and the Count
407	         field to greater than zero. Interleaving is indicated by setting the
408	         LLL field to a value greater than zero.

410	         The discussions on general interleaving apply to the bundling (which
411	         can be viewed as a reduced case of interleaving) with reduced
412	         complexity. The bundling case is discussed in detail in Section 7.

414	         Senders MAY support interleaving and/or bundling. All receivers that
415	         support the Interleaved/Bundled packet format MUST support both
416	         interleaving and bundling.

418	         Given a time-ordered sequence of output frames from the codec
419	         numbered 0..n, a bundling value B (the value in the Count field plus
420	         one), and an interleave length L where n = B * (L+1) - 1, the output
421	         frames are placed into RTP packets as follows (the values of the
422	         fields LLL and NNN are indicated for each RTP packet):

424	         First RTP Packet in Interleave group:
425	            LLL=L, NNN=0
426	            Frame 0, Frame L+1, Frame 2(L+1), Frame 3(L+1), ... for a total of
427	            B frames

429	         Second RTP Packet in Interleave group:
430	            LLL=L, NNN=1
431	            Frame 1, Frame 1+L+1, Frame 1+2(L+1), Frame 1+3(L+1), ... for a
432	            total of B frames

434	         This continues to the last RTP packet in the interleave group:

436	         (L+1)th RTP Packet in Interleave group:
437	            LLL=L, NNN=L
438	            Frame L, Frame L+L+1, Frame L+2(L+1), Frame L+3(L+1), ... for a
439	            total of B frames

441	         Within each interleave group, the RTP packets making up the
442	         interleave group MUST be transmitted in value-increasing order of the
443	         NNN field. While this does not guarantee reduced end-to-end delay on
444	         the receiving end, when packets are delivered in order by the
445	         underlying transport, delay will be reduced to the minimum possible.
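
	         The placement rule above can be sketched in Python as follows. The
	         functions interleave and deinterleave, and the dictionary used to
	         stand in for an RTP packet, are hypothetical helpers rather than
	         anything defined by this document.

```python
from typing import Any, Dict, List, Optional


def interleave(frames: List[Any], L: int, B: int) -> List[Dict[str, Any]]:
    """Distribute B*(L+1) time-ordered frames over L+1 packets."""
    if len(frames) != B * (L + 1):
        raise ValueError("an interleave group needs exactly B*(L+1) frames")
    step = L + 1
    # Packet n carries frames n, n+(L+1), n+2(L+1), ... (B frames in total).
    return [{"LLL": L, "NNN": n, "frames": frames[n::step]} for n in range(step)]


def deinterleave(packets: List[Dict[str, Any]]) -> List[Optional[Any]]:
    """Restore time order; frames of packets that never arrived stay None."""
    L = packets[0]["LLL"]
    B = len(packets[0]["frames"])
    out: List[Optional[Any]] = [None] * (B * (L + 1))
    for p in packets:
        out[p["NNN"]::L + 1] = p["frames"]
    return out
```

	         With L=2 and B=3, frames 0..8 are sent as (0, 3, 6), (1, 4, 7),
	         and (2, 5, 8), so the loss of one packet never removes two
	         consecutive frames.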

447	         Receivers MAY signal the maximum number of codec data frames (i.e.,
448	         the maximum acceptable bundling value B) they can handle in a single
449	         RTP packet using the OPTIONAL maxptime RTP mode parameter identified
450	         in Section 12.

452	         Receivers MAY signal the maximum interleave length (i.e., the maximum
453	         acceptable LLL value in the Interleaving Octet) they will accept
454	         using the OPTIONAL maxinterleave RTP mode parameter identified in
455	         Section 12.

457	         The parameters maxptime and maxinterleave are exchanged at the
458	         initial setup of the session. In one-to-one sessions, the sender MUST
459	         respect these values set by the receiver, and MUST NOT
460	         interleave/bundle more packets than what the receiver signals that it
461	         can handle. This ensures that the receiver can allocate a known
462	         amount of buffer space that will be sufficient for all
463	         interleaving/bundling used in that session. During the session, the
464	         sender may decrease the bundling value or interleaving length (so
465	         that less buffer space is required at the receiver), but never exceed
466	         the maximum value set by the receiver. This prevents the situation
467	         where a receiver needs to allocate more buffer space in the middle of
468	         a session but is unable to do so.

470	         Additionally, senders have the following restrictions:

472	         o  MUST NOT bundle more codec data frames in a single RTP packet than
473	            indicated by maxptime (see Section 12) if it is signaled.

475	         o  SHOULD NOT bundle more codec data frames in a single RTP packet
476	            than will fit in the MTU of the underlying network.

478	         o  Once beginning a session with a given maximum interleaving value
479	            set by maxinterleave in Section 12, MUST NOT increase the
480	            interleaving value (LLL) to exceed the maximum interleaving value
481	            that is signaled.

483	         o  MAY change the interleaving value, but MUST do so only between
484	            interleave groups.

486	         o  Silence suppression MUST only be used between interleave groups. A
487	            ToC with Frame Type 0 (Blank Frame, Section 5.1) MUST be used
488	            within interleaving groups if the codec outputs a blank frame.
489	            The M bit in the RTP header is not set for these blank frames, as
490	            the stream is continuous in time. Because there is only one time
491	            stamp for each RTP packet, silence suppression used within an
492	            interleave group would cause ambiguities when reconstructing the
493	            speech at the receiver side, and thus is prohibited.

495	         Given an RTP packet with sequence number S, interleave length (field
496	         LLL) L, interleave index value (field NNN) N, and bundling value B,
497	         the interleave group consists of this RTP packet and other RTP
498	         packets with sequence numbers from (S-N) mod 65536 to (S-N+L) mod 65536
499	         inclusive. In other words, the interleave group always consists of
500	         L+1 RTP packets with sequential sequence numbers. The bundling value
501	         for all RTP packets in an interleave group MUST be the same.
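
	         The sequence-number arithmetic above, including wrap-around at
	         65536, can be sketched in Python as follows. The function name
	         interleave_group_seqs is hypothetical.

```python
from typing import List


def interleave_group_seqs(S: int, N: int, L: int) -> List[int]:
    """Sequence numbers of all L+1 packets in the group containing packet S."""
    first = (S - N) % 65536
    return [(first + i) % 65536 for i in range(L + 1)]
```

	         For a packet with S=1, N=3, and L=4, the group spans sequence
	         numbers 65534, 65535, 0, 1, and 2.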

503	         The receiver determines the expected bundling value for all RTP
504	         packets in an interleave group by the number of codec data frames
505	         bundled in the first RTP packet of the interleave group received.
506	         Note that this may not be the first RTP packet of the interleave
507	         group if packets are delivered out of order by the underlying
508	         transport.

510	      7. Bundling Codec Data Frames

512	         As discussed in Section 6, the bundling of codec data frames is a
513	         special reduced case of interleaving with LLL value in the Interleave
514	         Octet set to 0.

516	         When codec data frames are bundled, multiple data frames are
517	         included consecutively in a packet, because the interleaving length
518	         (LLL) is 0. The interleaving group is thus reduced to a single RTP
519	         packet, and the reconstruction of the codec data frames from RTP
520	         packets becomes a much simpler process.

522	         Furthermore, the additional restrictions on senders are reduced to:

524	         o  MUST NOT bundle more codec data frames in a single RTP packet than
525	            indicated by maxptime (see Section 12) if it is signaled.

527	         o  SHOULD NOT bundle more codec data frames in a single RTP packet
528	            than will fit in the MTU of the underlying network.
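
A sender might derive its bundling limit from these two restrictions
roughly as shown below. This is a sketch under stated assumptions:
the function and parameter names are hypothetical, and the caller is
assumed to know how many octets remain for codec data frames after
all headers, as well as the largest frame size of the codec in use.

```python
FRAME_MS = 20  # duration of one codec data frame (see Section 12)

def max_bundled_frames(maxptime_ms=200, payload_budget=None,
                       max_frame_octets=None):
    """Upper bound on codec data frames per RTP packet.

    maxptime_ms      -- signaled maxptime, or the 200 ms default
    payload_budget   -- octets available for codec data frames within
                        the MTU (hypothetical, computed by the caller)
    max_frame_octets -- largest codec data frame size in octets
    """
    limit = maxptime_ms // FRAME_MS          # MUST NOT exceed maxptime
    if payload_budget is not None and max_frame_octets:
        # SHOULD NOT exceed what fits in the MTU.
        limit = min(limit, payload_budget // max_frame_octets)
    return limit
```

With the default maxptime of 200 ms the limit is 10 frames. The
frame size of 22 octets used in a call such as
max_bundled_frames(200, 100, 22) is only an illustrative figure.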

530	      8. Handling Missing Codec Data Frames

532	         The vocoders covered by this payload format support erasure frames as
533	         an indication when frames are not available. The erasure frames are
534	         normally used internally by a receiver to advance the state of the
535	         voice decoder by exactly one frame time for each missing frame. Using
536	         the information from packet sequence number, time stamp, and the M
537	         bit, the receiver can detect missing codec data frames from RTP
538	         packet loss and/or silence suppression, and generate corresponding
539	         erasure frames. Erasure frames MUST also be used in storage format to
540	         record missing frames.
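
One way a receiver might count the erasure frames to generate between
two consecutively received bundled packets (LLL = 0) is sketched
below. It looks at time stamps only; how the M bit is taken into
account is left to the implementation. The function name is
hypothetical, and the value of 160 time stamp units per frame is an
assumption (an 8000 Hz RTP clock with 20 ms frames).

```python
TS_PER_FRAME = 160  # assumption: 8000 Hz RTP clock, 20 ms per frame

def missing_frames(prev_ts, prev_frame_count, next_ts):
    """Frames missing between a packet with time stamp prev_ts that
    carried prev_frame_count frames and the next received packet."""
    elapsed = (next_ts - prev_ts) % 2**32   # RTP time stamps are 32 bits
    gap = elapsed // TS_PER_FRAME - prev_frame_count
    return max(gap, 0)
```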

542	      9. Implementation Issues

544	      9.1. Interleaving Length

546	         The vocoder interpolates the missing speech content when given an
547	         erasure frame. However, the best quality is perceived by the listener
548	         when erasure frames are not consecutive. This makes interleaving
549	         desirable as it increases speech quality when packet loss occurs.

551	         On the other hand, interleaving can greatly increase the end-to-end
552	         delay. Where an interactive session is desired, either
553	         Interleaved/Bundled packet format with interleaving length (field
554	         LLL) 0 or Header-Free packet format is RECOMMENDED.

556	         When end-to-end delay is not a primary concern, an interleaving
557	         length (field LLL) of 4 or 5 is RECOMMENDED as it offers a reasonable
558	         compromise between robustness and latency.
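
To give a rough sense of the buffering involved, the media time
covered by one interleave group can be computed from the group size
(L+1 packets, see Section 6), the bundling value, and the 20 ms frame
duration. This is only an indicative figure, not a statement of the
exact end-to-end delay; the function name is hypothetical.

```python
FRAME_MS = 20  # duration of one codec data frame

def interleave_group_span_ms(lll, bundling):
    """Media time, in milliseconds, covered by one interleave group of
    lll + 1 packets carrying `bundling` frames each."""
    return (lll + 1) * bundling * FRAME_MS
```

With LLL = 0 and one frame per packet the span is a single frame
time, while LLL = 5 with two frames per packet spans 240 ms.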

560	      9.2. Validation of Received Packets

562	         When receiving an RTP packet, the receiver SHOULD check the validity
563	         of the ToC fields and match the length of the packet with what is
564	         indicated by the ToC fields. If any invalidity or mismatch is
565	         detected, it is RECOMMENDED to discard the received packet to avoid
566	         potential severe degradation of the speech quality. The discarded
567	         packet is treated following the same procedure as a lost packet, and
568	         the discarded data will be replaced with erasure frames.

570	         On receipt of an RTP packet with an invalid value of the LLL or NNN
571	         fields, the RTP packet SHOULD be treated as lost by the receiver for
572	         the purpose of generating erasure frames as described in Section 8.

574	         On receipt of an RTP packet in an interleave group with other than
575	         the expected frame count value, the receiver MAY discard codec data
576	         frames off the end of the RTP packet or add erasure codec data frames
577	         to the end of the packet in order to manufacture a substitute packet
578	         with the expected bundling value.  The receiver MAY instead choose to
579	         discard the whole interleave group.
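
The checks and the substitute-packet option described in this section
might look like the sketch below. The names are hypothetical. Two
points are assumptions rather than statements from this section: that
an NNN value greater than LLL is invalid, and that an erasure frame
(frame type 5) carries no payload octets.

```python
ERASURE = 5  # erasure frame type, see Section 5.1

def interleave_octet_valid(lll, nnn, maxinterleave=5):
    """True if the LLL and NNN fields look acceptable for a session
    with the given maxinterleave (5 when not signaled)."""
    return 0 <= lll <= maxinterleave and 0 <= nnn <= lll

def normalize_bundle(frames, expected):
    """Return a substitute frame list with exactly `expected` entries.

    frames is a list of (frame_type, payload) tuples. Extra frames are
    dropped from the end; missing ones are padded with erasure frames.
    """
    if len(frames) > expected:
        return frames[:expected]
    return frames + [(ERASURE, b"")] * (expected - len(frames))
```

A receiver that prefers the other permitted behavior would instead
discard the whole interleave group.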

581	      9.3. Processing the Late Packets

583	         Assume that the receiver has begun playing frames from an interleave
584	         group. The time has come to play frame x from packet n of the
585	         interleave group. Further assume that packet n of the interleave
586	         group has not been received. As described in Section 8, an erasure
587	         frame will be sent to the receiving vocoder.

589	         Now, assume that packet n of the interleave group arrives before
590	         frame x+1 of that packet is needed. Receivers should use frame x+1 of
591	         the newly received packet n rather than substituting an erasure
592	         frame. In other words, just because packet n was not available the
593	         first time it was needed to reconstruct the interleaved speech, the
594	         receiver should not assume it is not available when it is
595	         subsequently needed for interleaved speech reconstruction.
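
The behavior described above amounts to looking up the packet again
each time one of its frames is due, rather than remembering that it
was once missing. A minimal sketch, with a hypothetical class name and
a placeholder erasure marker:

```python
ERASURE = "erasure"  # placeholder for an erasure frame

class InterleaveGroupBuffer:
    """Holds the packets of one interleave group as they arrive."""

    def __init__(self):
        self.packets = {}  # packet index n -> list of codec data frames

    def add(self, n, frames):
        # Called whenever packet n arrives, even if it is late.
        self.packets[n] = frames

    def frame(self, n, x):
        """Frame x of packet n, or an erasure if it is not available
        at the moment it is needed for playout."""
        frames = self.packets.get(n)
        if frames is None or x >= len(frames):
            return ERASURE
        return frames[x]
```

If packet n arrives after frame x was replaced by an erasure, a later
call for frame x+1 still returns the real frame.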

597	      10. Mode Request

599	         The Mode Request signal requests a particular encoding mode for the
600	         speech encoding in the reverse direction. All implementations are
601	         RECOMMENDED to honor the Mode Request signal. The Mode Request signal
602	         SHOULD only be used in one-to-one sessions. In multiparty sessions,
603	         any received Mode Request signals SHOULD be ignored.

605	         The Mode Request signal MAY also be sent through non-RTP means,
606	         which are outside the scope of this specification.

608	         The three-bit Mode Request field is used to signal the receiver to
609	         set its audio encoder to a particular encoding mode. If the Mode
610	         Request field is set to a non-zero value in RTP packets from node A
611	         to node B, it is a request for node B to switch its audio encoder to
612	         the requested encoding mode, and thereby change the bit rate of the
613	         RTP stream from node B to node A. Once a node sets this field to a
614	         non-zero value it SHOULD continue to set the field to the same value
615	         in subsequent packets until the requested mode has changed. This
616	         design helps to eliminate the scenario of getting the codec stuck in
617	         an unintended state if one of the packets that carries the Mode
618	         Request is lost. An otherwise silent node MAY send an RTP packet
619	         containing a blank frame in order to send a Mode Request.

621	         Each codec type using this format SHOULD define its own
622	         interpretation of the Mode Request field. Codecs SHOULD follow the
623	         convention that higher values of the three-bit field correspond to an
624	         equal or lower average output bit rate.

626	         For the EVRC codec, the Mode Request field MUST be interpreted
627	         according to Tables 2.2.1.2-1 and 2.2.1.2-2 of the EVRC codec
628	         specifications [1].  Values above '100' (4) are currently reserved.
629	         If an unknown value above '100' (4) is received, it MUST be handled
630	         as if '100' (4) were received, for interoperability with potential
631	         future revisions.

633	         For the SMV codec, the Mode Request field MUST be interpreted
634	         according to Table 2.2-2 of the SMV codec specifications [2]. Values
635	         above '101' (5) are currently reserved. If an unknown value above
636	         '101' (5) is received, it MUST be handled as if '101' (5) were
637	         received, also for interoperability with potential future revisions.
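
The handling of reserved values for both codecs can be sketched as a
simple clamp. The function and table names below are hypothetical;
the limits are the ones given in the two paragraphs above.

```python
# Highest Mode Request value currently defined for each codec.
MAX_MODE_REQUEST = {"EVRC": 4, "SMV": 5}

def effective_mode_request(codec, value):
    """Map a received three-bit Mode Request value to the value the
    receiver acts on; reserved values are handled as the highest
    defined value."""
    if not 0 <= value <= 7:
        raise ValueError("Mode Request is a three-bit field")
    return min(value, MAX_MODE_REQUEST[codec])
```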

639	      11. Storage Format

641	         The storage format is used for storing speech frames, e.g., as a file
642	         or e-mail attachment.

644	         The file begins with a magic number to identify the vocoder that is
645	         used. The magic number for EVRC corresponds to the ASCII character
646	         string "#!EVRC\n", i.e., "0x23 0x21 0x45 0x56 0x52 0x43 0x0A". The
647	         magic number for SMV corresponds to the ASCII character string
648	         "#!SMV\n", i.e., "0x23 0x21 0x53 0x4D 0x56 0x0A".

650	         The codec data frames are stored in consecutive order, with a single
651	         ToC entry field, extended to one octet, prefixing each codec data
652	         frame. The ToC field is extended to one octet by setting the four
653	         most significant bits of the octet to zero. For example, a ToC value
654	         of 4 (a full-rate frame) is stored as 0x04.

656	         Speech frames lost in transmission and non-received frames MUST be
657	         stored as erasure frames (frame type 5, see definition in Section
658	         5.1) to maintain synchronization with the original media.
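
A writer for this storage format might be sketched as follows. The
function names are hypothetical. The sketch assumes that the caller
supplies each frame as a ToC value plus its payload octets, and that
an erasure frame is stored with no payload octets.

```python
MAGIC = {"EVRC": b"#!EVRC\n", "SMV": b"#!SMV\n"}
ERASURE = 5  # frame type 5, see Section 5.1

def write_storage(codec, frames):
    """Serialize (toc, payload) tuples into the storage format."""
    out = bytearray(MAGIC[codec])        # file starts with magic number
    for toc, payload in frames:
        if not 0 <= toc <= 15:
            raise ValueError("ToC entry is a four-bit value")
        out.append(toc & 0x0F)           # four MSBs of the octet are zero
        out += payload                   # codec data frame follows its ToC
    return bytes(out)

def detect_codec(data):
    """Return the codec name matching the magic number, or None."""
    for name, magic in MAGIC.items():
        if data.startswith(magic):
            return name
    return None
```

For example, write_storage("EVRC", [(4, bytes(22)), (ERASURE, b"")])
stores one frame with ToC 0x04 followed by one erasure frame; the
22-octet payload is only a placeholder size.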

660	      12. IANA Considerations

662	         Four new MIME sub-types as described in this section are to be
663	         registered.

665	         The MIME names for the EVRC and SMV codecs are allocated from the IETF
666	         tree since all the vocoders covered are expected to be widely used
667	         for Voice-over-IP applications.

669	      12.1. Registration of Media Type EVRC

671	         Media Type Name:           audio

673	         Media Subtype Name:        EVRC

675	         Required Parameter:        none

677	         Optional parameters:
678	            The following parameters apply to RTP transfer only.

680	            ptime:    Defined as usual for RTP audio (see RFC 2327).

682	            maxptime: The maximum amount of media which can be encapsulated
683	               in each packet, expressed as time in milliseconds. The time
684	               SHALL be calculated as the sum of the time the media present
685	               in the packet represents. The time SHOULD be a multiple of the
686	               duration of a single codec data frame (20 msec). If not
687	               signaled, the default maxptime value SHALL be 200
688	               milliseconds.

690	            maxinterleave: Maximum number for interleaving length (field LLL
691	               in the Interleaving Octet). The interleaving lengths used in
692	               the entire session MUST NOT exceed this maximum value. If not
693	               signaled, the maxinterleave length SHALL be 5.

695	         Encoding considerations:
696	            This type is defined for transfer of EVRC-encoded data via RTP
697	            using the Interleaved/Bundled packet format specified in Sections
698	            4.1, 6, and 7 of RFC xxxx. It is also defined for other transfer
699	            methods using the storage format specified in Section 11 of RFC
700	            xxxx.

702	         Security considerations:
703	            See Section 14 "Security Considerations" of RFC xxxx.

705	         Public specification:
706	            The EVRC vocoder is specified in 3GPP2 C.S0014.
707	            Transfer methods are specified in RFC xxxx.

709	         Additional information:
710	            The following information applies to the storage format only.

712	            Magic number: #!EVRC\n (see Section 11 of RFC xxxx)
713	            File extensions: evc, EVC
714	            Macintosh file type code: none
715	            Object identifier or OID: none

717	         Intended usage:
718	            COMMON. It is expected that many VoIP applications (as well as
719	            mobile applications) will use this type.

721	         Person & email address to contact for further information:
722	            Adam Li
723	            adamli@icsl.ucla.edu

725	         Author/Change controller:
726	            Adam Li
727	            adamli@icsl.ucla.edu
728	            IETF Audio/Video Transport Working Group

730	      12.2. Registration of Media Type EVRC0

732	         Media Type Name:           audio

734	         Media Subtype Name:        EVRC0

736	         Required Parameters:       none

738	         Optional parameters:       none

740	         Encoding considerations:
741	            This type is only defined for transfer of EVRC-encoded data via
742	            RTP using the Header-Free packet format specified in Section 4.2
743	            of RFC xxxx.

745	         Security considerations:
746	            See Section 14 "Security Considerations" of RFC xxxx.

748	         Public specification:
749	            The EVRC vocoder is specified in 3GPP2 C.S0014.
750	            Transfer methods are specified in RFC xxxx.

752	         Additional information:    none

754	         Intended usage:
755	            COMMON. It is expected that many VoIP applications (as well as
756	            mobile applications) will use this type.

758	         Person & email address to contact for further information:
759	            Adam Li
760	            adamli@icsl.ucla.edu

762	         Author/Change controller:
763	            Adam Li
764	            adamli@icsl.ucla.edu
765	            IETF Audio/Video Transport Working Group

767	      12.3. Registration of Media Type SMV

769	         Media Type Name:           audio

771	         Media Subtype Name:        SMV

773	         Required Parameter:        none

775	         Optional parameters:
776	            The following parameters apply to RTP transfer only.

778	            ptime:    Defined as usual for RTP audio (see RFC 2327).

780	            maxptime: The maximum amount of media which can be encapsulated
781	               in each packet, expressed as time in milliseconds. The time
782	               SHALL be calculated as the sum of the time the media present
783	               in the packet represents. The time SHOULD be a multiple of the
784	               duration of a single codec data frame (20 msec). If not
785	               signaled, the default maxptime value SHALL be 200
786	               milliseconds.

788	            maxinterleave: Maximum number for interleaving length (field LLL
789	               in the Interleaving Octet). The interleaving lengths used in
790	               the entire session MUST NOT exceed this maximum value. If not
791	               signaled, the maxinterleave length SHALL be 5.

793	         Encoding considerations:
794	            This type is defined for transfer of SMV-encoded data via RTP
795	            using the Interleaved/Bundled packet format specified in Sections
796	            4.1, 6, and 7 of RFC xxxx. It is also defined for other transfer
797	            methods using the storage format specified in Section 11 of RFC
798	            xxxx.

800	         Security considerations:
801	            See Section 14 "Security Considerations" of RFC xxxx.

803	         Public specification:
804	            The SMV vocoder is specified in 3GPP2 C.S0030-0 v2.0.
805	            Transfer methods are specified in RFC xxxx.

807	         Additional information:
808	            The following information applies to storage format only.

810	            Magic number: #!SMV\n (see Section 11 of RFC xxxx)
811	            File extensions: smv, SMV
812	            Macintosh file type code: none
813	            Object identifier or OID: none

815	         Intended usage:
816	            COMMON. It is expected that many VoIP applications (as well as
817	            mobile applications) will use this type.

819	         Person & email address to contact for further information:
820	            Adam Li
821	            adamli@icsl.ucla.edu

823	         Author/Change controller:
824	            Adam Li
825	            adamli@icsl.ucla.edu
826	            IETF Audio/Video Transport Working Group

828	      12.4. Registration of Media Type SMV0

830	         Media Type Name:           audio

832	         Media Subtype Name:        SMV0

834	         Required Parameter:        none

836	         Optional parameters:       none

838	         Encoding considerations:
839	            This type is only defined for transfer of SMV-encoded data via
840	            RTP using the Header-Free packet format specified in Section 4.2
841	            of RFC xxxx.

843	         Security considerations:
844	            See Section 14 "Security Considerations" of RFC xxxx.

846	         Public specification:
847	            The SMV vocoder is specified in 3GPP2 C.S0030-0 v2.0.
848	            Transfer methods are specified in RFC xxxx.

850	         Additional information:    none

852	         Intended usage:
853	            COMMON. It is expected that many VoIP applications (as well as
854	            mobile applications) will use this type.

856	         Person & email address to contact for further information:
857	            Adam Li
858	            adamli@icsl.ucla.edu

860	         Author/Change controller:
861	            Adam Li
862	            adamli@icsl.ucla.edu
863	            IETF Audio/Video Transport Working Group

865	      13. Mapping to SDP Parameters

867	         Please note that this section applies to the RTP transfer only.

869	         The information carried in the MIME media type specification has a
870	         specific mapping to fields in the Session Description Protocol (SDP)
871	         [6], which is commonly used to describe RTP sessions. When SDP is
872	         used to specify sessions employing the EVRC or SMV codec, the mapping
873	         is as follows:

875	            o The MIME type ("audio") goes in SDP "m=" as the media name.

877	            o The MIME subtype (payload format name) goes in SDP "a=rtpmap"
878	              as the encoding name.

880	            o The parameters "ptime" and "maxptime" go in the SDP "a=ptime"
881	              and "a=maxptime" attributes, respectively.

883	            o The parameter "maxinterleave" goes in the SDP "a=fmtp"
884	              attribute by copying it directly from the MIME media type string
885	              as "maxinterleave=value".

887	         Some examples of SDP session descriptions for EVRC and SMV encodings
888	         follow below.

890	         Example of usage of EVRC:

892	           m=audio 49120 RTP/AVP 97
893	           a=rtpmap:97 EVRC/8000
894	           a=fmtp:97 maxinterleave=2
895	           a=maxptime:80

897	         Example of usage of SMV0:

899	           m=audio 49122 RTP/AVP 99
900	           a=rtpmap:99 SMV0/8000
901	           a=fmtp:99
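
The mapping rules of this section can be sketched as a small helper
that produces the media-level SDP lines. The function and parameter
names are hypothetical. The rtpmap line includes a clock rate, as the
rtpmap syntax of SDP [6] requires; the value 8000 is an assumption
about the RTP clock used for these codecs.

```python
def sdp_media_lines(port, pt, subtype, clock_rate=8000,
                    maxinterleave=None, ptime=None, maxptime=None):
    """Build the SDP lines for one audio stream using this format."""
    lines = [
        f"m=audio {port} RTP/AVP {pt}",            # MIME type -> "m="
        f"a=rtpmap:{pt} {subtype}/{clock_rate}",   # subtype -> rtpmap
    ]
    if maxinterleave is not None:                  # copied into a=fmtp
        lines.append(f"a=fmtp:{pt} maxinterleave={maxinterleave}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

Calling sdp_media_lines(49120, 97, "EVRC", maxinterleave=2,
maxptime=80) yields lines corresponding to the EVRC example above.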

903	         Note that the payload format (encoding) names are commonly shown in
904	         upper case. MIME subtypes are commonly shown in lower case. These
905	         names are case-insensitive in both places. Similarly, parameter names
906	         are case-insensitive both in MIME types and in the default mapping to
907	         the SDP a=fmtp attribute.

909	      14. Security Considerations

911	         RTP packets using the payload format defined in this specification
912	         are subject to the security considerations discussed in the RTP
913	         specification [4], and any appropriate profile (for example [5]).
914	         This implies that confidentiality of the media streams is achieved by
915	         encryption. Because the data compression used with this payload
916	         format is applied end-to-end, encryption may be performed after
917	         compression so there is no conflict between the two operations.

919	         A potential denial-of-service threat exists for data encoding using
920	         compression techniques that have non-uniform receiver-end
921	         computational load. The attacker can inject pathological datagrams
922	         into the stream which are complex to decode and cause the receiver to
923	         become overloaded. However, the encodings covered in this document do
924	         not exhibit any significant non-uniformity.

926	         As with any IP-based protocol, in some circumstances, a receiver may
927	         be overloaded simply by the receipt of too many packets, either
928	         desired or undesired. Network-layer authentication may be used to
929	         discard packets from undesired sources, but the processing cost of
930	         the authentication itself may be too high. In a multicast
931	         environment, pruning of specific sources may be implemented in
932	         future versions of IGMP [7] and in multicast routing protocols to
933	         allow a receiver to select which sources are allowed to reach it.

935	         Interleaving may affect encryption. Depending on the encryption
936	         scheme used, there may be restrictions on, for example, when keys
937	         can be changed. Specifically, the key change may need to occur at the
938	         boundary between interleave groups.

940	      15. Adding Support of Other Frame-Based Vocoders

942	         As described above, the RTP packet format defined in this document is
943	         very flexible and designed to be usable by other frame-based
944	         vocoders.

946	         Additional vocoders using this format MUST have properties as
947	         described in Section 3.3.

949	         For an eligible vocoder to use the payload format mechanisms defined
950	         in this document, a new RTP payload format document needs to be
951	         published as a standards track RFC. That document can simply refer to
952	         this document and then specify the following parameters:

954	          o Define the unit used for the RTP time stamp;
955	          o Define the meaning of the Mode Request bits;
956	          o Define corresponding codec data frame type values for ToC;
957	          o Define the conversion procedure for the vocoder's output data frames;
958	          o Define a magic number for storage format, and complete the
959	            corresponding MIME registration.

961	      16. Acknowledgements

963	         The following authors have made significant contributions to this
964	         document: Adam H. Li, John D. Villasenor, Dong-Seek Park, Jeong-Hoon
965	         Park, Keith Miller, S. Craig Greer, David Leon, Nikolai Leung,
966	         Marcello Lioy, Kyle J. McKay, Magdalena L. Espelien, Randall Gellens,
967	         Tom Hiller, Peter J. McCann, Stinson S. Mathai, Michael D. Turner,
968	         Ajay Rajkumar, Dan Gal, Magnus Westerlund, Lars-Erik Jonsson, Greg
969	         Sherwood, and Thomas Zeng.

971	      17. References

973	         [1]  3GPP2 C.S0014, "Enhanced Variable Rate Codec, Speech Service
974	              Option 3 for Wideband Spread Spectrum Digital Systems", January
975	              1997.

977	         [2]  3GPP2 C.S0030-0 v2.0, "Selectable Mode Vocoder, Service Option
978	              for Wideband Spread Spectrum Communication Systems", May 2002.

980	         [3]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
981	              Levels", BCP 14, RFC 2119, March 1997.

983	         [4]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
984	              "RTP:  A Transport Protocol for Real-Time Applications", RFC
985	              1889, January 1996.

987	         [5]  Schulzrinne, H., "RTP Profile for Audio and Video Conferences
988	              with Minimal Control", RFC 1890, January 1996.

990	         [6]  Handley, M. and V. Jacobson, "SDP: Session Description Protocol",
991	              RFC 2327, April 1998.

993	         [7]  Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC
994	              1112, August 1989.

996	      18. Authors' Address

998	         The editor will serve as the point of contact for technical issues.

1000	         Adam H. Li
1001	         Image Communication Lab
1002	         Electrical Engineering Department
1003	         University of California
1004	         Los Angeles, CA 90095
1005	         USA
1006	         Phone: +1 310 825 5178
1007	         Email: adamli@icsl.ucla.edu