idnits 2.17.1

draft-ietf-avt-rtp-ipmr-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == The page length should not exceed 58 lines per page, but there was 19
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 20 pages

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 25 instances of too long lines in the document, the longest one
     being 6 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'not RECOMMENDED' in this paragraph:

     Application MAY utilize bitstream redundancy to combat packet loss.
     But the gateway is free to chose any option to reduce transmission
     rate - coding layer or redundancy bits can be dropped. Due to this
     fact it is not RECOMMENDED application to increase total bitrate when
     adding redundancy in a response to packet loss.

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008.  The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust.  If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 20, 2010) is 4967 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '6' on line 824

  == Missing Reference: 'RFC3550' is mentioned on line 507, but not defined

  == Missing Reference: 'RFC-3711' is mentioned on line 513, but not defined

  -- Looks like a reference, but probably isn't: '4' on line 823

  -- Looks like a reference, but probably isn't: '16' on line 782

  -- Looks like a reference, but probably isn't: '2' on line 839

  -- Looks like a reference, but probably isn't: '14' on line 803

  -- Looks like a reference, but probably isn't: '0' on line 850

  -- Looks like a reference, but probably isn't: '1' on line 838

  -- Looks like a reference, but probably isn't: '3' on line 840

  -- Looks like a reference, but probably isn't: '5' on line 841

  -- Looks like a reference, but probably isn't: '7' on line 830

  == Unused Reference: 'RFC 3711' is defined on line 622, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC 5246' is defined on line 626, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC 4301' is defined on line 629, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  ** Obsolete normative reference: RFC 5246 (Obsoleted by RFC 8446)


     Summary: 3 errors (**), 0 flaws (~~), 9 warnings (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Audio/Video Transport Working Group                            S. Ikonin
3	Internet Draft                                                SPIRIT DSP
4	Intended status: Proposed Standard                    September 20, 2010

6	               RTP Payload Format for IP-MR Speech Codec
7	                     draft-ietf-avt-rtp-ipmr-13.txt

9	Abstract

11	   This document specifies the payload format for packetization of
12	   SPIRIT IP-MR encoded speech signals into the real-time transport
13	   protocol (RTP). The payload format supports transmission of multiple
14	   frames per packet and introduces redundancy for robustness against
15	   packet loss and bit errors.

17	Status of this Memo

19	   This Internet-Draft is submitted to IETF in full conformance with the
20	   provisions of BCP 78 and BCP 79.

22	   Internet-Drafts are working documents of the Internet Engineering
23	   Task Force (IETF), its areas, and its working groups. Note that other
24	   groups may also distribute working documents as Internet-Drafts.

26	   Internet-Drafts are draft documents valid for a maximum of six months
27	   and may be updated, replaced, or obsoleted by other documents at any
28	   time. It is inappropriate to use Internet-Drafts as reference
29	   material or to cite them other than as "work in progress."

31	   The list of current Internet-Drafts can be accessed at
32	   http://www.ietf.org/1id-abstracts.html

34	   The list of Internet-Draft Shadow Directories can be accessed at
35	   http://www.ietf.org/shadow.html

37	   This Internet-Draft will expire on December 18, 2010.

39	Copyright Notice

41	   Copyright (c) 2010 IETF Trust and the persons identified as the
42	   document authors. All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (http://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document.  Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.  Code Components extracted from this document must
50	   include Simplified BSD License text as described in Section 4.e of
51	   the Trust Legal Provisions and are provided without warranty as
52	   described in the Simplified BSD License.

54	   The source codes included in this document are provided under BSD
55	   license (http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf).

57	Table of Contents

59	   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 3
60	   2. IP-MR Codec Description  . . . . . . . . . . . . . . . . . . . . 3
61	   3. Payload Format . . . . . . . . . . . . . . . . . . . . . . . . . 4
62	      3.1. RTP Header Usage  . . . . . . . . . . . . . . . . . . . . . 4
63	      3.2. RTP Payload Structure . . . . . . . . . . . . . . . . . . . 5
64	      3.3. Speech Payload Header . . . . . . . . . . . . . . . . . . . 5
65	      3.4. Speech Payload Table of Contents  . . . . . . . . . . . . . 6
66	      3.5. Speech Payload Data . . . . . . . . . . . . . . . . . . . . 6
67	      3.6. Redundancy Payload Header . . . . . . . . . . . . . . . . . 7
68	      3.7. Redundancy Payload Table of Contents  . . . . . . . . . . . 8
69	      3.8. Redundancy Payload Data . . . . . . . . . . . . . . . . . . 8
70	   4. Payload Examples . . . . . . . . . . . . . . . . . . . . . . . . 9
71	      4.1. Payload Carrying a Single Frame . . . . . . . . . . . . . . 9
72	      4.2. Payload Carrying Multiple Frames with Redundancy  . . . .  10
73	   5. Congestion Control . . . . . . . . . . . . . . . . . . . . . .  11
74	   6. Security Considerations  . . . . . . . . . . . . . . . . . . .  12
75	   7. Payload Format Parameters  . . . . . . . . . . . . . . . . . .  12
76	      7.1. Media Type Registration . . . . . . . . . . . . . . . . .  12
77	      7.2. Mapping Media Type Parameters into SDP  . . . . . . . . .  13
78	   8. IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  14
79	   9. Normative References . . . . . . . . . . . . . . . . . . . . .  14
80	   10. Disclaimer  . . . . . . . . . . . . . . . . . . . . . . . . .  14
81	   11. Legal Terms . . . . . . . . . . . . . . . . . . . . . . . . .  15
82	   12. Authors' Addresses  . . . . . . . . . . . . . . . . . . . . .  16
83	   APPENDIX A. RETRIEVING FRAME INFORMATION  . . . . . . . . . . . .  17
84	      A.1. get_frame_info.c  . . . . . . . . . . . . . . . . . . . .  17

86	1. Introduction

88	   This document specifies the payload format for packetization of
89	   SPIRIT IP-MR encoded speech signals into the real-time transport
90	   protocol (RTP). The payload format supports transmission of multiple
91	   frames per packet and introduces redundancy for robustness against
92	   packet loss and bit errors.

94	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
95	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
96	   document are to be interpreted as described in RFC 2119 [RFC 2119].

98	2. IP-MR Codec Description

100	   IP-MR is a wideband speech codec designed by SPIRIT for conferencing
101	   services over packet-switched networks such as the Internet.

103	   IP-MR is a scalable codec. This means that not only can the source
104	   change the transmission rate on the fly, but a gateway can also
105	   decrease bandwidth at any time without performance overhead.
106	   There are 6 coding rates available, from 7.7 to 34.2 kbps.

108	   The codec operates on a frame-by-frame basis with a frame size of
109	   20 ms at a 16 kHz sampling rate and a total end-to-end delay of
110	   25 ms. Each compressed frame is represented as a sequence of layers.
111	   The first (base) layer is mandatory, while the other (enhancement)
112	   layers can be safely discarded. Information about the particular
113	   frame structure is available from the payload header. In order to
114	   adjust outgoing bandwidth, the gateway MUST read the frame structure
115	   from the payload header, decide which enhancement layers to discard,
116	   and compose a new RTP packet according to this specification.

118	   Not all bits within a frame are equally tolerant to distortion.
119	   IP-MR defines 6 classes ('A'-'F') of sensitivity to bit errors. Any
120	   damage to class 'A' bits causes significant reconstruction artifacts,
121	   while a loss in class 'F' may not even be perceived by the listener.
122	   Note that only the base layer in a bitstream is represented as a set
123	   of classes.

125	   The IP-MR payload format allows frames to be duplicated across
126	   packets to improve robustness against packet loss (Section 3.6). The
127	   base layer can be retransmitted completely or only in its most
128	   sensitive classes. Enhancement layers are not retransmittable.

130	   The fine-grained redundancy, in conjunction with bitrate scalability,
131	   allows an application to adjust the trade-off between overhead and
132	   robustness against packet loss. Note that this approach is supported
133	   natively within a packet and requires no out-of-band signals or
134	   session initialization procedures.

136	   The main IP-MR features are the following:

138	      o High quality wideband speech codec.

140	      o Bitrate scalable with 6 average rates from 7.7 to 34.2 kbps.

142	      o Built-in discontinuous transmission (DTX) and comfort noise
143	      generation (CNG) support.

145	      o Flexible in-band redundancy control scheme for packet loss
146	      protection.

148	3. Payload Format

150	   The payload format consists of the RTP header and the IP-MR payload.

152	3.1. RTP Header Usage

154	   The format of the RTP header is specified in [RFC 3550]. This payload
155	   format uses the fields of the header in a manner consistent with that
156	   specification.

158	   The RTP timestamp corresponds to the sampling instant of the first
159	   sample encoded for the first frame-block in the packet. The timestamp
160	   clock frequency SHALL be 16 kHz. The duration of one frame is 20 ms,
161	   which corresponds to 320 samples per frame. Thus the timestamp is
162	   increased by 320 for each consecutive frame. The timestamp is also
163	   used to recover the correct decoding order of the frame-blocks.
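
The timestamp arithmetic above can be sketched in C as follows. This
is only an illustration: the helper names are hypothetical, and the
next-packet calculation assumes continuous transmission with no gap
between packets.

```c
#include <stdint.h>

/* Values from Section 3.1: 16 kHz RTP clock, 20 ms frames. */
#define IPMR_CLOCK_RATE_HZ 16000u
#define IPMR_FRAME_MS      20u
#define IPMR_FRAME_SAMPLES (IPMR_CLOCK_RATE_HZ * IPMR_FRAME_MS / 1000u)

/* Timestamp of the frame at position frame_index (0-based) inside a
 * packet whose RTP timestamp is packet_ts. The uint32_t result wraps
 * modulo 2^32, like the RTP timestamp itself. (Hypothetical helper.) */
static uint32_t ipmr_frame_timestamp(uint32_t packet_ts,
                                     unsigned frame_index)
{
    return packet_ts + (uint32_t)frame_index * IPMR_FRAME_SAMPLES;
}

/* Timestamp of the following packet when the current one carries
 * frames_in_packet (GR + 1) frame slots. Assumes no transmission gap
 * between the two packets. (Hypothetical helper.) */
static uint32_t ipmr_next_packet_timestamp(uint32_t packet_ts,
                                           unsigned frames_in_packet)
{
    return packet_ts + (uint32_t)frames_in_packet * IPMR_FRAME_SAMPLES;
}
```

With these definitions IPMR_FRAME_SAMPLES evaluates to 320, so a packet
carrying four frames advances the timestamp by 1280.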

165	   The RTP header marker bit (M) SHALL be set to 1 whenever the first
166	   frame-block carried in the packet is the first frame-block in a
167	   talkspurt (see definition of the talkspurt in Section 4.1 [RFC
168	   3551]). For all other packets, the marker bit SHALL be set to zero
169	   (M=0).

171	   The assignment of an RTP payload type for the format defined in this
172	   memo is outside the scope of this document. The RTP profiles in use
173	   currently mandate binding the payload type dynamically for this
174	   payload format. This is basically necessary because the payload type
175	   expresses the configuration of the payload itself, i.e. basic or
176	   interleaved mode, and the number of channels carried.

178	   The remaining RTP header fields are used as specified in [RFC 3550].

180	3.2. RTP Payload Structure

182	   The IP-MR payload is composed of two payloads, one for the current
183	   speech and one for redundancy. Both payloads have the form: Header,
184	   Table of contents (TOC) and Data. The redundancy payload carries
185	   data for the preceding and pre-preceding packets.

187	     +--------+-----+----------------------+- - - - +- -  +- - - - - +
188	     | Header | TOC | Data                 | Header | TOC | Data     |
189	     +--------+-----+----------------------+- - - - +- -  +- - - - - +
190	     |<- Speech -------------------------->|<- Redundancy (opt) ---->|

192	3.3. Speech Payload Header

194	   This header carries parameters which are common for all frames in the
195	   packet:

197	                           0                   1
198	                           0 1 2 3 4 5 6 7 8 9 0 1
199	                          +-+-+-+-+-+-+-+-+-+-+-+-+
200	                          |T| CR  | BR  |D|A|GR |R|
201	                          +-+-+-+-+-+-+-+-+-+-+-+-+

203	      o T (1 bit): Reserved. MUST always be set to 0. Receiver SHOULD
204	      discard the packet if the 'T' bit is not equal to 0.

206	      o CR (3 bits): Coding rate index - top enhancement layer
207	      available. The CR value 7 (NO_DATA) indicates that there is no
208	      speech data (and speech TOC accordingly) in the payload. This MAY
209	      be used to transmit redundancy data only.

211	      o BR (3 bits): Base rate index - base layer bitrate. Speech
212	      payload can be scaled to any rate index between BR and CR. Packets
213	      with BR = 6 or BR > CR MUST be discarded. Redundancy data is also
214	      considered as having a base rate of BR.

216	      o D (1 bit): Reserved. MUST always be set to 1. Receiver MAY
217	      discard the packet if the 'D' bit is zero.

219	      o A (1 bit): Byte-alignment. The value of 1 specifies that padding
220	      bits were added so that each compressed frame (3.5) starts on a
221	      byte (8 bit) boundary. The value of 0 specifies unaligned
222	      frames. Note that the speech payload is always padded to a byte
223	      boundary regardless of the 'A' bit value.

225	      o GR (2 bits): Number of frames in packet (grouping size). Actual
226	      grouping size is GR + 1, thus maximum grouping supported is 4.

228	      o R (1 bit): Redundancy presence. Value of 1 indicates redundancy
229	      payload presence.

231	   Note that the values of the 'T' and 'D' bits are fixed; any other
232	   values are not allowed by this specification. The value of padding
233	   bits is not specified.

235	   The following table defines mapping between rate index and rate
236	   value:

238	                    +------------+--------------+
239	                    | rate index | avg. bitrate |
240	                    +------------+--------------+
241	                    |      0     |   7.7 kbps   |
242	                    |      1     |   9.8 kbps   |
243	                    |      2     |  14.3 kbps   |
244	                    |      3     |  20.8 kbps   |
245	                    |      4     |  27.9 kbps   |
246	                    |      5     |  34.2 kbps   |
247	                    |      6     |  (reserved)  |
248	                    |      7     |   NO_DATA    |
249	                    +------------+--------------+

251	   The value of 6 is reserved. A packet carrying this value MUST be
252	   discarded.
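
As an illustration of the header layout and discard rules above, the
following C sketch unpacks the 12 header bits from the first two
payload bytes. It assumes that bit 0 of the diagram is the most
significant bit of the first byte, which is how the diagrams in
Section 4 are drawn. The type and function names are hypothetical.
The sketch chooses to discard on a nonzero 'T' bit (a SHOULD) and to
accept a zero 'D' bit (discarding there is only a MAY).

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the speech payload header (hypothetical type name). */
typedef struct {
    unsigned t, cr, br, d, a, gr, r;
} ipmr_speech_header;

/* Unpacks the 12-bit speech payload header from the first two bytes.
 * Returns 0 on success, -1 if the packet is to be discarded. */
static int ipmr_parse_speech_header(const uint8_t *p, size_t len,
                                    ipmr_speech_header *h)
{
    if (p == NULL || h == NULL || len < 2)
        return -1;

    h->t  = (p[0] >> 7) & 0x1u;   /* bit 0       */
    h->cr = (p[0] >> 4) & 0x7u;   /* bits 1-3    */
    h->br = (p[0] >> 1) & 0x7u;   /* bits 4-6    */
    h->d  =  p[0]       & 0x1u;   /* bit 7       */
    h->a  = (p[1] >> 7) & 0x1u;   /* bit 8       */
    h->gr = (p[1] >> 5) & 0x3u;   /* bits 9-10   */
    h->r  = (p[1] >> 4) & 0x1u;   /* bit 11      */

    if (h->t != 0)                /* reserved bit must be 0 */
        return -1;
    if (h->cr == 6)               /* rate index 6 is reserved */
        return -1;
    if (h->br == 6 || h->br > h->cr)
        return -1;
    return 0;
}
```

For example, the bytes 0x01 0xDA (the start of the payload shown in
Section 4.2, taking the padding bit as 0) parse to CR=0, BR=0, D=1,
A=1, GR=2 and R=1.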

254	3.4. Speech Payload Table of Contents

256	   The speech TOC is a bit mask indicating the presence of each frame in
257	   the packet. TOC is only available if 'CR' value is not equal to 7
258	   (NO_DATA).

260	                               0 1 2 3
261	                              +-+-+-+-+
262	                              |E|E|E|E|
263	                              +-+-+-+-+
264	                              |<----->| <-- #(GR+1)

266	      o E (1 bit): Frame existence indicator. The value of 0 indicates
267	      that speech data is not present for the corresponding frame. The
268	      IP-MR encoder sets the E flag to 0 for periods of silence in DTX
269	      mode. An application MUST set this bit to 0 if the frame is known
270	      to be damaged.
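
A minimal C sketch of reading the speech TOC is given below. It assumes
the TOC immediately follows the 12 header bits and that bits are read
most significant bit first, as in the diagrams of Section 4. The
function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads nbits (at most 8 in this sketch) starting at bit offset
 * bitpos, most significant bit first. */
static unsigned ipmr_read_bits(const uint8_t *p, size_t bitpos,
                               unsigned nbits)
{
    unsigned v = 0;
    for (unsigned i = 0; i < nbits; i++, bitpos++)
        v = (v << 1) | ((p[bitpos >> 3] >> (7 - (bitpos & 7))) & 1u);
    return v;
}

/* Fills e[0..GR] with the speech TOC flags. Returns the number of
 * frame slots (GR + 1), 0 if CR = 7 (NO_DATA, no speech TOC), or -1
 * if fewer than two bytes are available. */
static int ipmr_speech_toc(const uint8_t *p, size_t len, int e[4])
{
    if (p == NULL || e == NULL || len < 2)
        return -1;

    unsigned cr = (p[0] >> 4) & 0x7u;
    unsigned gr = (p[1] >> 5) & 0x3u;

    if (cr == 7)
        return 0;

    /* The TOC occupies bits 12 .. 12+GR, so it never extends beyond
     * bit 15 and two bytes are always enough. */
    for (unsigned i = 0; i <= gr; i++)
        e[i] = (int)ipmr_read_bits(p, 12u + i, 1u);
    return (int)gr + 1;
}
```

Applied to the bytes 0x01 0xDA from the example in Section 4.2, the
function reports three frame slots with flags 1, 0 and 1.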

272	3.5. Speech Payload Data

274	   Speech data contains (GR+1) compressed IP-MR frames (20ms of data).
275	   A compressed frame has zero length if its TOC flag is zero.

277	   The beginning of each compressed frame is aligned if 'A' bit is
278	   nonzero, while the end of speech payload is always aligned to a byte
279	   (8 bit) boundary:

281	   +- - -+------------+------------+------------+------------+
282	   | TOC | Frame1     | Frame2     | Frame3     | Frame4     |
283	   +- - -+------------+------------+------------+------------+   ALWAYS
284	         |<- aligned  |<- aligned  |<- aligned  |<- aligned  |<- ALIGNED

286	   Marked regions MUST be aligned (padded) only if 'A' bit is set to '1'.
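
The alignment rules can be illustrated with a short C sketch that
computes the size of the speech payload. It assumes that the speech TOC
is present (CR is not 7) and that the caller already knows each frame's
length in bits. The function names are hypothetical.

```c
#include <stddef.h>

/* Rounds a bit position up to the next multiple of 8. */
static size_t ipmr_align8(size_t bits)
{
    return (bits + 7u) & ~(size_t)7u;
}

/* Size in bytes of the speech payload (header, TOC and frames).
 * frame_bits[i] is the length of frame i in bits, or 0 if its 'E'
 * flag is 0. nframes is GR + 1. */
static size_t ipmr_speech_payload_bytes(int a_bit, unsigned nframes,
                                        const size_t *frame_bits)
{
    size_t pos = 12u + nframes;      /* 12 header bits + TOC bits */

    for (unsigned i = 0; i < nframes; i++) {
        if (a_bit)
            pos = ipmr_align8(pos);  /* frame starts on a byte */
        pos += frame_bits[i];
    }
    return ipmr_align8(pos) / 8u;    /* end is always aligned */
}
```

For the examples in Section 4 this gives 26 bytes for the single
unaligned 194-bit frame (A=0) and 36 bytes for the aligned frames of
93, 0 and 172 bits (A=1).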

288	   The compressed frame structure is the following:

290	   |<---- sensitive classes ------>|<----- enhancement layers --------->|
291	   +-------------------------------+----+-----+------+- - - - - +-------+
292	   | L1 (Base Layer)               | L2 | L3  | L4   |          | LN    |
293	   +-------------------------------+----+-----+------+- - - - - +-------+
294	   |<- A --->|<- B ->| ... |<- F ->|                                    |
295	   |<- BR rate ------------------->|                                    |
296	   |<- CR rate -------------------------------------------------------->|

298	   Appendix A of this document provides a helper routine written in "C"
299	   which MUST be used to extract the bounds of the sensitivity classes
300	   and enhancement layers from the compressed frame data.

302	3.6. Redundancy Payload Header

304	   The presence of the redundancy payload is signaled by the R bit of
305	   the speech payload header. The redundancy header is composed of two
306	   fields of 3 bits each:

308	                               0 1 2 3 4 5
309	                              +-+-+-+-+-+-+
310	                              | CL1 | CL2 |
311	                              +-+-+-+-+-+-+

313	   The 'CL1' and 'CL2' fields specify the sensitivity classes available
314	   for the preceding and pre-preceding packets, respectively.

316	                    +-------+--------------------+
317	                    |  CL   | Redundancy classes |
318	                    |       |      available     |
319	                    +-------+--------------------+
320	                    |   0   |       NONE         |
321	                    |   1   |        A           |
322	                    |   2   |        A-B         |
323	                    |   3   |        A-C         |
324	                    |   4   |        A-D         |
325	                    |   5   |        A-E         |
326	                    |   6   |        A-F         |
327	                    |   7   |    (reserved)      |
328	                    +-------+--------------------+

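330	   Receiver can reconstruct base layer of preceding packets completely
331	   (CL=6) or partially (0 < CL < 6) [...]

	3.7. Redundancy Payload Table of Contents

	   [...]

344	                 0 1 2 3 4 5 6 7
345	                +-+-+-+-+-+-+-+-+
346	                |E|E|E|E|E|E|E|E|
347	                +-+-+-+-+-+-+-+-+
348	                        |<----->| pre-preceding payload #(GR+1)
349	                |<----->| preceding payload #(GR+1)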

351	   o E (1 bit): Redundancy frame existence indicator. The value of 0
352	   indicates that no redundancy data is present for that frame.

354	3.8. Redundancy Payload Data

356	   IP-MR defines 6 classes ('A'-'F') of sensitivity to bit errors. Any
357	   damage to class 'A' bits causes significant reconstruction artifacts,
358	   while a loss in class 'F' may not even be perceived by the listener.
359	   Note that only the base layer in a bitstream is represented as a set
360	   of classes. Together, the sensitivity classes and redundancy allow
361	   IP-MR to duplicate frames across packets to improve robustness
362	   against packet loss.

364	   Redundancy data carries a number of sensitivity classes for the
365	   preceding and pre-preceding packets, as indicated by the 'CL1' and
366	   'CL2' fields of the redundancy header. The sensitivity class data is
367	   available individually for each frame only if the corresponding 'E'
368	   bit of the redundancy TOC is nonzero:

370	   +---+---+----+----|-----+-----+-----+-----+-----+-----+-----+
371	   |A-C|A-B|1000|1001|cl_A1|cl_B1|cl_C1|cl_A1|cl_B1|cl_A4|cl_B4|
372	   +---+---+----+----|-----+-----+-----+-----+-----+-----+-----+
373	   |<- CL >|<- TOC ->|<- preceding --->|<- pre-preceding ----->|

375	   Redundancy data is only available if the base (BR) and coding (CR)
376	   rates of the preceding and pre-preceding packets are the same as for
377	   the current packet.

379	   A receiver MAY use redundancy data to compensate for packet loss; in
380	   this case the 'CL' field MUST also be passed to the decoder. The
381	   helper routine provided in Appendix A MUST be used to extract the
382	   length of the sensitivity classes for each frame. The following
383	   pseudo code describes the sequence of operations:

385	      int sensitivityBits[numOfRedundancyFrames][6];
386	      int redundancyBits [numOfRedundancyFrames];
387	      for(i = 0 ; i < numOfRedundancyFrames; i++) {
388	          GetFrameInfo(CR, BR, pRedundancyPayloadData, dummy,
389	                       sensitivityBits[i], dummy);
390	          redundancyBits[i] = 0;
391	          for(j = 0; j < CL[i]; j++ ) {
392	               redundancyBits[i] += sensitivityBits[i][j];
393	          }
394	          flushBits(pRedundancyPayloadData, redundancyBits[i]);
395	      }
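
The inner loop of the pseudo code can be isolated as the following
self-contained C sketch. It is only an illustration: the per-class
sizes, which the document requires to be obtained from the Appendix A
helper, are simply passed in by the caller here, and the function name
is hypothetical.

```c
#include <stddef.h>

#define IPMR_NUM_CLASSES 6   /* sensitivity classes 'A'..'F' */

/* Number of redundancy bits carried for one frame: the sum of the
 * sizes of the first cl sensitivity classes. class_bits[] holds the
 * sizes of classes A..F in bits, as reported by the Appendix A helper
 * (passed in by the caller in this sketch). Returns 0 for CL = 0
 * (NONE) and -1 for the reserved value CL = 7. */
static long ipmr_redundancy_bits(const int class_bits[IPMR_NUM_CLASSES],
                                 unsigned cl)
{
    long total = 0;

    if (cl > IPMR_NUM_CLASSES)
        return -1;
    for (unsigned j = 0; j < cl; j++)
        total += class_bits[j];
    return total;
}
```

A receiver would call this once per frame whose redundancy 'E' bit is
set, using CL1 for frames of the preceding packet and CL2 for frames of
the pre-preceding packet, and then advance its read position by the
returned number of bits.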

397	4. Payload Examples

399	   This section provides detailed examples of IP-MR payload format.

401	4.1. Payload Carrying a Single Frame

403	   The following diagram shows a typical IP-MR payload carrying one
404	   (GR=0) non-aligned (A=0) speech frame without redundancy (R=0). The
405	   base layer is coded at 7.7 kbps (BR=0) while the coding rate is 9.8
406	   kbps (CR=1). The 'E' bit value of 1 signals that compressed frame
407	   bits s(0) - s(193) are present. There is a padding bit 'P' to
408	   maintain speech payload size alignment.

410	       0                   1                   2                   3
411	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
412	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
413	      |0|CR=1 |BR=0 |1|0|0 0|0|1|s(0)                                 |
414	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
415	      |                                                               |
416	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
417	      |                                                               |
418	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
419	      |                                                               |
420	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
421	      |                                                               |
422	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
423	      |                                                               |
424	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
425	      |                       s(193)|P|
426	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

428	4.2. Payload Carrying Multiple Frames with Redundancy

430	   The following diagram shows a payload carrying 3 (GR=2) aligned (A=1)
431	   speech frames with redundancy (R=1). The TOC value of '101' indicates
432	   that speech data is present for the first (bits sp1(0)-sp1(92)) and
433	   third (bits sp3(0)-sp3(171)) frames. There are no enhancement layers
434	   because the base and coding rates are equal (BR=CR=0). Padding bits
435	   'P' are inserted to maintain the necessary alignment.

437	   The redundancy payload is present for both preceding and
438	   pre-preceding payloads (CL1 = A-B, CL2 = A), but redundancy data is
439	   only available for 5 (TOC='111011') of the 6 (2*(GR+1)) frames: 20,
440	   39 and 35 bits for the three frames of the preceding packet and 15
441	   and 19 bits for two frames of the pre-preceding packet.

443	       0                   1                   2                   3
444	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
445	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
446	      |0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0)                         |
447	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
448	      |                                                               |
449	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
450	      |                                                               |
451	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
452	      |                  sp1(92)|P|P|P|sp3(0)                         |
453	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
454	      |                                                               |
455	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
456	      |                                                               |
457	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
458	      |                                                               |
459	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
460	      |                                                               |
461	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
462	      |                                               sp3(171)|P|P|P|P|
463	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
464	      |CL1=2|CL2=1|1 1 1|0 1 1|red1_1_AB(0)              red1_1_AB(19)|
465	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
466	      |red1_2_AB(0)                                                   |
467	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
468	      |red1_2_AB(38)|red1_3_AB(0)                                     |
469	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
470	      |      red1_3_AB(34)|red2_2_A(0)      red2_2_A(14)|red2_3_A(0)  |
471	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
472	      |           red2_3_A(18)|P|P|P|P|
473	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
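
The size of the redundancy payload in this example can be checked with
a small C sketch. It assumes, as the diagram suggests, that redundancy
frames are packed back to back regardless of the 'A' bit and that the
payload is padded to a byte boundary at the end. The function name is
hypothetical.

```c
#include <stddef.h>

/* Size in bytes of the redundancy payload: 6 header bits (CL1, CL2),
 * 2*(GR+1) TOC bits, then the data bits of every frame that is
 * present, padded to a byte boundary at the end. frame_bits[] lists
 * the lengths in bits of the frames whose 'E' bit is 1. */
static size_t ipmr_redundancy_payload_bytes(unsigned gr,
                                            const unsigned *frame_bits,
                                            size_t nframes_present)
{
    size_t bits = 6u + 2u * (gr + 1u);

    for (size_t i = 0; i < nframes_present; i++)
        bits += frame_bits[i];
    return (bits + 7u) / 8u;
}
```

With GR=2 and frame lengths of 20, 39, 35, 15 and 19 bits the result is
18 bytes (140 bits plus 4 padding bits), which matches the four and a
half 32-bit rows of redundancy data in the diagram.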

475	5. Congestion Control

476	   The general congestion control considerations for transporting RTP
477	   data are applicable to IP-MR speech over RTP (see RTP [RFC 3550] and
478	   any applicable RTP profile like AVP [RFC 3551]). However, the
479	   multi-rate capability of IP-MR speech coding provides a mechanism
480	   that may help to control congestion, since the bandwidth demand can
481	   be adjusted by selecting a different encoding mode.

482	   The number of frames encapsulated in each RTP payload highly
483	   influences the overall bandwidth of the RTP stream due to header
484	   overhead constraints. Packetizing more frames in each RTP payload can
485	   reduce the number of packets sent and hence the overhead from
486	   IP/UDP/RTP headers, at the expense of increased delay.
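
The trade-off between header overhead and delay can be estimated with
the following C sketch. The figure of 40 header bytes per packet is an
assumption of this sketch, not a value from this document: it
corresponds to an IPv4 header without options (20 bytes), a UDP header
(8 bytes) and an RTP header without CSRC entries or extensions
(12 bytes). The function names are hypothetical.

```c
/* Assumed per-packet header size: IPv4 (20) + UDP (8) + RTP (12). */
#define IPMR_HDR_BYTES 40u
#define IPMR_FRAME_MS  20u

/* Header overhead in bits per second when each packet carries
 * frames_per_packet frames of 20 ms. Returns 0 for an invalid count. */
static unsigned ipmr_header_overhead_bps(unsigned frames_per_packet)
{
    if (frames_per_packet == 0)
        return 0;
    /* bits per packet * packets per second */
    return IPMR_HDR_BYTES * 8u * 1000u /
           (IPMR_FRAME_MS * frames_per_packet);
}

/* Packetization delay in milliseconds for the same grouping. */
static unsigned ipmr_packetization_delay_ms(unsigned frames_per_packet)
{
    return IPMR_FRAME_MS * frames_per_packet;
}
```

Under these assumptions one frame per packet costs 16000 bit/s of
header overhead at 20 ms of packetization delay, while four frames per
packet cost 4000 bit/s at 80 ms.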

488	   Due to the scalable nature of the IP-MR codec, the transmission rate
489	   can be reduced at any transport stage to fit the channel bandwidth.
490	   The minimal rate is specified by the BR field of the payload header
491	   and can be as low as 7.7 kbps. It is up to the application to balance
492	   coding quality (high BR) against bitstream scalability (small BR).
493	   Because coding quality depends on the coding rate (CR) rather than
494	   the base rate (BR), it is not recommended to use high BR values for
495	   real-time communications.

497	   An application MAY utilize bitstream redundancy to combat packet
498	   loss. However, the gateway is free to choose any option to reduce the
499	   transmission rate: coding layers or redundancy bits can be dropped.
500	   For this reason, an application SHOULD NOT increase the total bitrate
501	   when adding redundancy in response to packet loss.

503	6. Security Considerations

505	   RTP packets using the payload format defined in this specification
506	   are subject to the security considerations discussed in the RTP
507	   specification [RFC 3550] and in any applicable RTP profile. As this
508	   format transports encoded audio, the main security issues include
509	   confidentiality, integrity protection, and data origin authentication
510	   of the audio itself.

512	   The payload format itself does not have any built-in security
513	   mechanisms.  Any suitable external mechanisms, such as SRTP [RFC
514	   3711], MAY be used.

516	   This payload format does not exhibit any significant non-uniformity
517	   in the receiver side computational complexity for packet processing
518	   and thus is unlikely to pose a denial-of-service threat due to the
519	   receipt of pathological data.

521	7. Payload Format Parameters

523	   This section describes the media types and names associated with this
524	   payload format.  Note that the IP-MR bitstream has been frozen since
525	   internal release version 2.5. Currently the terms 'IP-MR' and 'IP-MR
526	   v2.5' are synonyms.

528	7.1. Media Type Registration

530	   Media Type name:     audio

532	   Media Subtype name:  ip-mr_v2.5

534	   Required parameters: none

536	   Optional parameters:
537	      These parameters apply to RTP transfer only.

539	      ptime: The media packet length in milliseconds. Allowed values
540	      are: 20, 40, 60 and 80.

542	   Encoding considerations:
543	      This media type is framed binary data (see RFC4288, Section 4.8).

545	   Security considerations:
546	      See section 6 of RFC XXXX (RFC editor please replace with this RFC
547	      number).

549	   Interoperability considerations:
550	      none

552	   Published specification:
553	      RFC XXXX (RFC editor please replace with this RFC number)

555	   Applications that use this media type:
556	      Real-time audio applications like voice over IP and
557	      teleconference, and multi-media streaming.

559	   Additional information:
560	      none

562	   Person & email address to contact for further information:
563	      Dmitry Yudin 

565	   Intended usage:
566	      COMMON

568	   Restrictions on usage:
569	      This media type depends on RTP framing, and hence is only defined
570	      for transfer via RTP [RFC 3550].

572	   Authors:
573	      Sergey Ikonin  Dmitry Yudin
574	      

576	   Change controller:
577	      IETF Audio/Video Transport working group delegated from the IESG.

579	7.2. Mapping Media Type Parameters into SDP

581	   The information carried in the media type specification has a
582	   specific mapping to fields in the Session Description Protocol (SDP)
583	   [RFC 4566], which is commonly used to describe RTP sessions. When SDP
584	   is used to specify sessions employing the IP-MR codec, the mapping is
585	   as follows:
586	      o The media type ("audio") goes in SDP "m=" as the media name.

588	      o The media subtype (payload format name) goes in SDP "a=rtpmap"
589	      as the encoding name. The RTP clock rate in "a=rtpmap" MUST be 16000.

591	      o The parameter "ptime" goes in the SDP "a=ptime" attribute.
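
   As a hedged illustration of the clock rate and "ptime" rules in the
   list above: with a 16000 Hz RTP clock, a packet that carries ptime
   milliseconds of audio advances the RTP timestamp by 16 * ptime ticks.
   The names ipmr_ptime_is_valid() and ipmr_timestamp_increment() are
   hypothetical helpers, not part of this specification.

```c
#include <stdint.h>

/* RTP clock rate that "a=rtpmap" must carry for this payload format. */
#define IPMR_RTP_CLOCK_HZ 16000u

/* Returns 1 if ptime_ms is one of the allowed values (20, 40, 60, 80),
 * otherwise 0. Hypothetical helper for illustration. */
int ipmr_ptime_is_valid(int ptime_ms)
{
    return ptime_ms == 20 || ptime_ms == 40 ||
           ptime_ms == 60 || ptime_ms == 80;
}

/* RTP timestamp increment per packet for the given ptime.
 * Returns 0 when the ptime is not an allowed value. */
uint32_t ipmr_timestamp_increment(int ptime_ms)
{
    if (!ipmr_ptime_is_valid(ptime_ms)) {
        return 0;
    }
    /* 16000 ticks per second is 16 ticks per millisecond. */
    return (IPMR_RTP_CLOCK_HZ / 1000u) * (uint32_t)ptime_ms;
}
```

   Returning 0 for a disallowed ptime is a design choice of this sketch;
   an application could equally reject the session description earlier.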

593	   Any remaining parameters go in the SDP "a=fmtp" attribute by copying
594	   them directly from the media type parameter string as a semicolon-
595	   separated list of parameter=value pairs.

597	   Note that the payload format (encoding) names are commonly shown in
598	   upper case. Media subtypes are commonly shown in lower case. These
599	   names are case-insensitive in both places.
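
   The mapping described in this section can be sketched as a small
   formatting routine. This is an illustration under stated assumptions,
   not a normative example: the function name ipmr_format_sdp(), the
   RTP/AVP profile, and the port and payload type chosen by the caller
   are all assumptions. The payload type is restricted to the dynamic
   range 96..127 defined in [RFC 3551].

```c
#include <stdio.h>

/* Writes the "m=", "a=rtpmap" and "a=ptime" lines for an IP-MR stream
 * into buf. Returns the number of characters written (excluding the
 * terminating NUL), or -1 on invalid arguments or truncation.
 * Hypothetical helper; port and payload_type come from the caller. */
int ipmr_format_sdp(char *buf, size_t size,
                    int port, int payload_type, int ptime_ms)
{
    int n;

    if (buf == NULL || size == 0) {
        return -1;
    }
    if (port < 1 || port > 65535) {
        return -1;
    }
    /* Dynamic payload type range (assumed; no static type is used). */
    if (payload_type < 96 || payload_type > 127) {
        return -1;
    }
    /* Allowed ptime values for this media type. */
    if (ptime_ms != 20 && ptime_ms != 40 &&
        ptime_ms != 60 && ptime_ms != 80) {
        return -1;
    }

    n = snprintf(buf, size,
                 "m=audio %d RTP/AVP %d\r\n"
                 "a=rtpmap:%d ip-mr_v2.5/16000\r\n"
                 "a=ptime:%d\r\n",
                 port, payload_type, payload_type, ptime_ms);
    if (n < 0 || (size_t)n >= size) {
        return -1; /* encoding error or output truncated */
    }
    return n;
}
```

   No "a=fmtp" line is produced because the media type defines no
   parameters other than "ptime".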

601	8. IANA Considerations

603	   One media type has been defined and needs registration in the media
604	   types registry.

606	9. Normative References

608	   [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
609	              Requirement Levels", BCP 14, RFC 2119, March 1997.

611	   [RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
612	              Jacobson, "RTP: A Transport Protocol for Real-Time
613	              Applications", STD 64, RFC 3550, July 2003.

615	   [RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
616	              Video Conferences with Minimal Control", STD 65, RFC 3551,
617	              July 2003.

619	   [RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
620	              Description Protocol", RFC 4566, July 2006.

622	   [RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
623	              Norrman, "The Secure Real-Time Transport Protocol
624	              (SRTP)", RFC 3711, March 2004.

626	   [RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer Security
627	              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

629	   [RFC 4301] Kent, S. and K. Seo, "Security Architecture for the
630	              Internet Protocol", RFC 4301, December 2005.

632	10. Disclaimer

634	   This document may contain material from IETF Documents or IETF
635	   Contributions published or made publicly available before November
636	   10, 2008. The person(s) controlling the copyright in some of this
637	   material may not have granted the IETF Trust the right to allow
638	   modifications of such material outside the IETF Standards Process.
639	   Without obtaining an adequate license from the person(s) controlling
640	   the copyright in such materials, this document may not be modified
641	   outside the IETF Standards Process, and derivative works of it may
642	   not be created outside the IETF Standards Process, except to format
643	   it for publication as an RFC or to translate it into languages other
644	   than English.

646	11. Legal Terms

648	   All IETF Documents and the information contained therein are provided
649	   on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
650	   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
651	   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
652	   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
653	   WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
654	   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
655	   FOR A PARTICULAR PURPOSE.

657	   The IETF Trust takes no position regarding the validity or scope of
658	   any Intellectual Property Rights or other rights that might be
659	   claimed to pertain to the implementation or use of the technology
660	   described in any IETF Document or the extent to which any license
661	   under such rights might or might not be available; nor does it
662	   represent that it has made any independent effort to identify any
663	   such rights.

665	   Copies of Intellectual Property disclosures made to the IETF
666	   Secretariat and any assurances of licenses to be made available, or
667	   the result of an attempt made to obtain a general license or
668	   permission for the use of such proprietary rights by implementers or
669	   users of this specification can be obtained from the IETF on-line IPR
670	   repository at http://www.ietf.org/ipr.

672	   The IETF invites any interested party to bring to its attention any
673	   copyrights, patents or patent applications, or other proprietary
674	   rights that may cover technology that may be required to implement
675	   any standard or specification contained in an IETF Document. Please
676	   address the information to the IETF at ietf-ipr@ietf.org.

678	   The definitive version of an IETF Document is that published by, or
679	   under the auspices of, the IETF. Versions of IETF Documents that are
680	   published by third parties, including those that are translated into
681	   other languages, should not be considered to be definitive versions
682	   of IETF Documents. The definitive version of these Legal Provisions
683	   is that published by, or under the auspices of, the IETF. Versions of
684	   these Legal Provisions that are published by third parties, including
685	   those that are translated into other languages, should not be
686	   considered to be definitive versions of these Legal Provisions.

688	   For the avoidance of doubt, each Contributor to the IETF Standards
689	   Process licenses each Contribution that he or she makes as part of
690	   the IETF Standards Process to the IETF Trust pursuant to the
691	   provisions of RFC 5378. No language to the contrary, or terms,
692	   conditions or rights that differ from or are inconsistent with the
693	   rights and licenses granted under RFC 5378, shall have any effect and
694	   shall be null and void, whether published or posted by such
695	   Contributor, or included with or in such Contribution.

697	12. Authors' Addresses

699	   SPIRIT DSP
700	   Building 27, A. Solzhenitsyna street
701	   109004, Moscow, RUSSIA

703	   Tel: +7 495 661-2178
704	   Fax: +7 495 912-6786
705	   EMail: info@spiritdsp.com

707	APPENDIX A. RETRIEVING FRAME INFORMATION

709	   This appendix contains C code implementing the frame parsing
710	   function. The function extracts information about a coded frame,
711	   including the frame size, the number of layers, the size of each
712	   layer, and the size of each perceptual sensitivity class.

714	A.1. get_frame_info.c

   /*
     Copyright (c) 2010
     IETF Trust and the persons identified as authors of the code.
     All rights reserved.

     Redistribution and use in source and binary forms, with or without
     modification, are permitted provided that the following conditions
     are met:
     - Redistributions of source code must retain the above copyright notice,
       this list of conditions and the following disclaimer.
     - Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.
     - Neither the name of Internet Society, IETF or IETF Trust, nor the names
       of specific contributors, may be used to endorse or promote products
       derived from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
   ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
   CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
   SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
   CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
   ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
   POSSIBILITY OF SUCH DAMAGE.

   */

   /******************************************************************

     get_frame_info.c

     Retrieving frame information for IP-MR Speech Codec

   ******************************************************************/
   #define RATES_NUM       6   // number of codec rates
   #define SENSE_CLASSES   6   // number of sensitivity classes (A..F)

   // frame types
   #define FT_SPEECH       0   // active speech
   #define FT_DTX_SID      1   // silence insertion descriptor

   // get specified bit from coded data
   // (bit 0 is the least significant bit of the first octet)
   int GetBit(const unsigned char *data, int curBit)
   {
     return ((data[curBit >> 3] >> (curBit % 8)) & 1);
   }

   // retrieve frame information
   int GetFrameInfo(               // o: frame size in bits
     short rate,                   // i: encoding rate (0..5)
     short base_rate,              // i: base (core) layer rate,
                                   //    if base_rate > rate, then assumed
                                   //    that base_rate = rate.
     const unsigned char *pCoded,  // i: coded bit frame
     short pLayerBits[RATES_NUM],  // o: number of bits in layers
     short pSenseBits[SENSE_CLASSES], // o: number of bits in
                                   //    sensitivity classes
     short *nLayers                // o: number of layers
   )
   {
     static const short Bits_1[4]    = {0, 9, 9, 15};
     static const short Bits_2[16]   = { 43,50,36,31,46,48,40,44,47,43,44,
                                         45,43,44,47,36};
     static const short Bits_3[2][6] = {{13, 11, 23, 33, 36, 31},
                                        {25,  0, 23, 32, 36, 31}};

     int FrType;
     int i, nBits;

     if (rate < 0 || rate > 5) {
       return 0; // incorrect stream
     }

     for (i = 0; i < SENSE_CLASSES; i++) {
       pSenseBits[i] = 0;
     }

     nBits = 0;
     // extract frame type bit (1 = speech, 0 = DTX SID)
     FrType = GetBit(pCoded, nBits++) ? FT_SPEECH : FT_DTX_SID;
     {
       int cw_0;
       int b[14];

       // extract the 14 bits that follow the frame type bit
       for (i = 0; i < 14; i++) {
         b[i] = GetBit(pCoded, nBits++);
       }

       // parse
       if (FrType == FT_DTX_SID) {
         cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3);
         rate = 0;
         pSenseBits[0] = 10 + Bits_2[cw_0];
       } else {
         int idx;
         int nFlag_1, nFlag_2;
         // codewords are built by shifting, so they must start at zero
         int cw_1 = 0, cw_2 = 0;

         nFlag_1 = b[0] + b[2] + b[4] + b[6];
         cw_1 = (cw_1 << 1) | b[0];
         cw_1 = (cw_1 << 1) | b[2];
         cw_1 = (cw_1 << 1) | b[4];
         cw_1 = (cw_1 << 1) | b[6];

         // cw_2 is assembled here but does not enter the size
         // computation below
         nFlag_2 = b[1] + b[3] + b[5] + b[7];
         cw_2 = (cw_2 << 1) | b[1];
         cw_2 = (cw_2 << 1) | b[3];
         cw_2 = (cw_2 << 1) | b[5];
         cw_2 = (cw_2 << 1) | b[7];

         cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3);
         if (base_rate < 0)    base_rate = 0;
         if (base_rate > rate) base_rate = rate;
         idx = base_rate == 0 ? 0 : 1;

         pSenseBits[0] = 15+Bits_2[cw_0];
         pSenseBits[1] = Bits_1[(cw_1 >> 0)&0x3] + Bits_1[(cw_1>>2)&0x3];
         pSenseBits[2] = nFlag_1*5;
         pSenseBits[3] = nFlag_2*30;
         pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]);

         // enhancement layers 1..rate
         for (i = 1; i < rate+1; i++) {
           pLayerBits[i] = 4*(Bits_3[idx][i]);
         }
       }

       // layer 0 holds the sum of all sensitivity classes
       pLayerBits[0] = 0;
       for (i = 0; i < SENSE_CLASSES; i++) {
         pLayerBits[0] += pSenseBits[i];
       }

       *nLayers = rate+1;
     }

     {
       // count total frame size
       int payloadBitCount = 0;
       for (i = 0; i < *nLayers; i++) {
         payloadBitCount += pLayerBits[i];
       }
       return payloadBitCount;
     }
   }
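
   To illustrate the bit ordering that GetBit() implies, the following
   is a minimal sketch. Bit index 0 is the least significant bit of the
   first octet, and fields such as cw_0 are assembled with the first bit
   read as the least significant bit. The names get_bit(),
   read_field_lsb_first() and bits_to_octets() are hypothetical, and
   rounding a frame size up to whole octets is an assumption made for
   illustration, not a rule stated in this appendix.

```c
/* Same bit addressing as GetBit() in A.1:
 * bit 0 is the least significant bit of data[0]. */
int get_bit(const unsigned char *data, int bit_index)
{
    return (data[bit_index >> 3] >> (bit_index % 8)) & 1;
}

/* Reads `count` bits (count <= 16) starting at bit `first_bit`.
 * The first bit read becomes bit 0 of the result, which is how
 * cw_0 is assembled in GetFrameInfo(). Hypothetical helper. */
unsigned read_field_lsb_first(const unsigned char *data,
                              int first_bit, int count)
{
    unsigned value = 0;
    int i;

    for (i = 0; i < count; i++) {
        value |= (unsigned)get_bit(data, first_bit + i) << i;
    }
    return value;
}

/* Number of octets needed to hold `bits` bits, rounding up.
 * Assumption: callers want whole octets for buffer sizing. */
int bits_to_octets(int bits)
{
    return (bits + 7) / 8;
}
```

   For instance, a DTX frame whose first two octets are 0x00 yields
   cw_0 = 0, so the code in A.1 reports 10 + 43 = 53 bits, which
   bits_to_octets() rounds up to 7 octets.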