idnits 2.17.1 

draft-ietf-avt-rtp-ipmr-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008.  The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust.  If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 28, 2009) is 5317 days in the past.  Is
     this intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '4' on line 797

  -- Looks like a reference, but probably isn't: '16' on line 751

  -- Looks like a reference, but probably isn't: '2' on line 813

  -- Looks like a reference, but probably isn't: '6' on line 798

  -- Looks like a reference, but probably isn't: '14' on line 777

  -- Looks like a reference, but probably isn't: '0' on line 824

  -- Looks like a reference, but probably isn't: '1' on line 812

  -- Looks like a reference, but probably isn't: '3' on line 814

  -- Looks like a reference, but probably isn't: '5' on line 815

  -- Looks like a reference, but probably isn't: '7' on line 804

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  ** Obsolete normative reference: RFC 5246 (Obsoleted by RFC 8446)


     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Audio/Video Transport Working Group                            S. Ikonin
2	Internet Draft                                                SPIRIT DSP
3	Intended status: Informational                        September 28, 2009

5	RTP Payload Format for IP-MR Speech Codec draft-ietf-avt-rtp-ipmr-09.txt

7	Status of this Memo

9	This Internet-Draft is submitted to IETF in full conformance with the
10	provisions of BCP 78 and BCP 79.

12	Copyright (c) 2009 IETF Trust and the persons identified as the document
13	authors. All rights reserved.

15	This document is subject to BCP 78 and the IETF Trust's Legal Provisions
16	Relating to IETF Documents in effect on the date of publication of this
17	document (http://trustee.ietf.org/license-info). Please review these
18	documents carefully, as they describe your rights and restrictions with
19	respect to this document.

21	The source codes included in this document are provided under BSD
22	license (http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf).

24	Internet-Drafts are working documents of the Internet Engineering Task
25	Force (IETF), its areas, and its working groups. Note that other groups
26	may also distribute working documents as Internet-Drafts.

28	Internet-Drafts are draft documents valid for a maximum of six months
29	and may be updated, replaced, or obsoleted by other documents at any
30	time. It is inappropriate to use Internet-Drafts as reference material
31	or to cite them other than as "work in progress."

33	The list of current Internet-Drafts can be accessed at
34	http://www.ietf.org/1id-abstracts.html

36	The list of Internet-Draft Shadow Directories can be accessed at
37	http://www.ietf.org/shadow.html

39	This Internet-Draft will expire on March 28, 2010.

41	Abstract

43	This document specifies the payload format for packetization of SPIRIT
44	IP-MR encoded speech signals into the Real-time Transport Protocol
45	(RTP). The payload format supports transmission of multiple frames per
46	payload and introduces redundancy for robustness against packet loss.

48	Table of Contents

50	 1. Introduction
51	 2. IP-MR Codec Description
52	 3. Payload Format
53	    3.1. RTP Header Usage
54	    3.2. Payload Format Structure
55	    3.3. Payload Header
56	    3.4. Speech Table of Contents
57	    3.5. Speech Data
58	    3.6. Redundancy Header
59	    3.7. Redundancy Table of Contents
60	    3.8. Redundancy Data
61	 4. Payload Examples
62	    4.1. Payload Carrying a Single Frame
63	    4.2. Payload Carrying Multiple Frames with Redundancy
64	 5. Media Type Registration
65	    5.1. Registration of media subtype audio/ip-mr_v2.5
66	    5.2. Mapping Media Type Parameters into SDP
67	 6. Security Considerations
68	 7. Congestion Control
69	 8. IANA Considerations
70	 9. Normative References
71	 10. Author(s) Information
72	 11. Disclaimer
73	 12. Legal Terms
74	 APPENDIX A. RETRIEVING FRAME INFORMATION
75	 A.1. get_frame_info.c
76	 Authors' Addresses

78	1. Introduction

80	This document specifies the payload format for packetization of SPIRIT
81	IP-MR encoded speech signals into the Real-time Transport Protocol
82	(RTP). The payload format supports transmission of multiple frames per
83	payload and introduces redundancy for robustness against packet loss.

85	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
86	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
87	document are to be interpreted as described in RFC 2119 [RFC 2119].

89	2. IP-MR Codec Description

91	The IP-MR codec is a scalable, adaptive multi-rate wideband speech codec
92	designed by SPIRIT for use in IP-based networks. This codec is suitable
93	for real-time communications such as telephony and videoconferencing.

95	The codec operates on 20 ms frames at a 16 kHz sampling rate and has an
96	algorithmic delay of 25 ms.

98	IP-MR supports six wideband speech coding modes with respective bit
99	rates ranging from about 7.7 to about 34.2 kbps. The coding mode can be
100	changed at any 20 ms frame boundary, making it possible to dynamically
101	adjust the speech encoding rate during a session to adapt to varying
102	transmission conditions.

104	The coded frame consists of multiple coding layers: a base (or core)
105	layer and several enhancement layers, which are coded independently.
106	Only the core layer is required to decode intelligible speech; the
107	upper layers provide quality enhancement. The enhancement layers
108	may be omitted, and the remaining base layer can still be decoded
109	without artifacts. This makes the bit stream scalable and allows the
110	bit rate to be reduced during transmission without re-encoding.

112	This memo specifies an optional form of redundancy coding within RTP
113	for protection against packet loss. It is based on the commonly known
114	scheme in which previously transmitted frames are aggregated together
115	with new ones. Each frame is retransmitted once in the following
116	RTP payload packet. Here f(n-2)...f(n+4) denotes a sequence of speech
117	frames, and p(n-1)...p(n+4) is a sequence of payload packets:

119	   --+--------+--------+--------+--------+--------+--------+--------+--
120	     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
121	   --+--------+--------+--------+--------+--------+--------+--------+--

123	      <---- p(n-1) ---->
124	               <----- p(n) ----->
125	                        <---- p(n+1) ---->
126	                                 <---- p(n+2) ---->
127	                                          <---- p(n+3) ---->
128	                                                   <---- p(n+4) ---->

130	Because of the scalable nature of the IP-MR codec, there is no need to
131	duplicate the whole previous frame; only the core layer needs to be
132	retransmitted. This reduces redundancy overhead while keeping
133	efficiency. Moreover, the speech bits encoded in the core layer are
134	divided into six classes (from A to F) of perceptual sensitivity to
135	errors. Choosing which classes to retransmit makes it possible to adjust
136	the trade-off between overhead and robustness against packet loss.

138	The mechanism described does not require signaling at session
139	setup. The sender is responsible for selecting an appropriate amount of
140	redundancy based on feedback about the channel conditions.

142	The main codec characteristics can be summarized as follows:

144	    o Wideband, 16 kHz, speech codec

146	    o Adaptive multi-rate with six modes from about 7.7 to 34.2 kbps

148	    o Bit rate scalable

150	    o Variable bit rate changing in accordance with actual speech
151	      content

153	    o Discontinuous Transmission (DTX), silence suppression and
154	      comfort noise generation

156	    o In-band redundancy scheme for protection against packet loss

158	3. Payload Format

160	The main purpose of the payload design for IP-MR is to maximize the
161	potential of the codec with as little overhead as possible. The payload
162	format allows parameters of the codec (such as bit rate, level of
163	scalability, DTX, and redundancy mode) to be changed at any packet
164	boundary without re-negotiation. This makes it possible to dynamically
165	adjust streaming parameters in accordance with changing network
166	conditions. The payload format also supports aggregation of multiple
167	consecutive frames (up to 4) in a payload, which allows control of the
168	trade-off between delay and header overhead.

170	3.1. RTP Header Usage

172	The RTP timestamp corresponds to the sampling instant of the first
173	sample encoded for the first frame-block in the packet. The timestamp
174	clock frequency SHALL be 16 kHz. The duration of one frame is 20 ms,
175	corresponding to 320 samples at 16 kHz. Thus the timestamp is increased
176	by 320 for each consecutive frame. The timestamp is also used to recover
177	the correct decoding order of the frame-blocks.
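
As a non-normative illustration of the timestamp rule above, the
sketch below advances the RTP timestamp by 320 ticks for every frame
carried in a packet. The macro and function names are hypothetical and
are not defined by this memo.

```c
#include <stdint.h>

/* Hypothetical name: samples per 20 ms frame at the 16 kHz RTP clock. */
#define IPMR_SAMPLES_PER_FRAME 320u

/*
 * Timestamp of the packet that follows a packet carrying `frames`
 * consecutive frames and stamped with `ts`.  RTP timestamps are 32-bit
 * values, so the result wraps modulo 2^32.
 */
static uint32_t ipmr_next_timestamp(uint32_t ts, unsigned frames)
{
    return (uint32_t)(ts + IPMR_SAMPLES_PER_FRAME * frames);
}
```

For example, a packet with three frames advances the timestamp by 960.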

179	The RTP header marker bit (M) SHALL be set to 1 whenever the first
180	frame-block carried in the packet is the first frame-block in a
181	talkspurt (see definition of the talkspurt in Section 4.1 [RFC 3551]).
182	For all other packets, the marker bit SHALL be set to zero (M=0).

184	The assignment of an RTP payload type for the format defined in this
185	memo is outside the scope of this document. The RTP profiles in use
186	currently mandate binding the payload type dynamically for this payload
187	format. This is basically necessary because the payload type expresses
188	the configuration of the payload itself, i.e. basic or interleaved mode,
189	and the number of channels carried.

191	The remaining RTP header fields are used as specified in [RFC 3550].

193	3.2. Payload Format Structure

195	The IP-MR payload format consists of a payload header with general
196	information about the packet, a speech table of contents (TOC), and
197	speech data. An optional redundancy section follows the speech data. The
198	redundancy section consists of a redundancy header, a redundancy TOC,
199	and redundancy data.

201	The following diagram shows the standard payload format layout:

203	  +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +
204	  | payload | speech | speech | redundancy | redundancy | redundancy |
205	  | header  | TOC    | data   | header     | TOC        | data       |
206	  +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +

208	3.3. Payload Header

210	The payload header has the following format:

212	                           0                   1
213	                           0 1 2 3 4 5 6 7 8 9 0 1
214	                          +-+-+-+-+-+-+-+-+-+-+-+-+
215	                          |T| CR  | BR  |D|A|GR |R|
216	                          +-+-+-+-+-+-+-+-+-+-+-+-+

218	    o T (1 bit): reserved for compatibility with future extensions.
219	      SHOULD be set to 0.

221	    o CR (3 bits): coding rate of frame(s) in this packet, as per the
222	       following table:

224	                          +-------+--------------+
225	                          |  CR   | avg. bitrate |
226	                          +-------+--------------+
227	                          |   0   |   7.7 kbps   |
228	                          |   1   |   9.8 kbps   |
229	                          |   2   |  14.3 kbps   |
230	                          |   3   |  20.8 kbps   |
231	                          |   4   |  27.9 kbps   |
232	                          |   5   |  34.2 kbps   |
233	                          |   6   |  (reserved)  |
234	                          |   7   |   NO_DATA    |
235	                          +-------+--------------+

237	The CR value 7 (NO_DATA) indicates that there is no speech data (and
238	accordingly no speech TOC) in the payload. This MAY be used to transmit
239	redundancy data only. The value 6 is reserved. A packet received with
240	this value SHOULD be discarded.

242	    o BR (3 bits): base rate for the core layer of frame(s) in this
243	      packet, using the table for CR. Values in the range 0-5 indicate
244	      core layer bitrates, same as for CR; a packet with any other BR
245	      value SHOULD be discarded. The base rate is the lowest rate for
246	      scalability, so the speech payload cannot be scaled down below the
247	      BR value. If a received packet has BR > CR, decoding assumes BR = CR.

249	    o D (1 bit): indicates if the DTX mode is active or not. This
250	      parameter is retained for backward interoperability with previous
251	      codec releases and required for payload parsing. The
252	      decoder implementation MUST always include DTX mode
253	      support and update internal states properly. The decoder cannot
254	      assume that DTX will be constantly inactive during a session.

256	    o A (1 bit): reserved. Must always be set to 1.

258	    o GR (2 bits): number of frames in packet (grouping size). Actual
259	      grouping size is GR + 1, thus maximum grouping supported is 4.

261	    o R (1 bit): redundancy presence bit. If R=1 then the packet
262	      contains redundancy information for recovery of lost packets.
263	      In this case the redundancy section follows the speech data.
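
As a non-normative illustration, the 12-bit payload header could be
unpacked as sketched below. The type and function names are
hypothetical. The sketch assumes that bit 0 of the diagram is the most
significant bit of the first payload octet, and it reads the BR
description as allowing only values 0-5; both points are assumptions
of this sketch rather than statements of the format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the payload header fields. */
struct ipmr_hdr {
    unsigned t;   /* reserved bit                        */
    unsigned cr;  /* coding rate index, 7 = NO_DATA      */
    unsigned br;  /* base rate index for the core layer  */
    unsigned d;   /* DTX flag                            */
    unsigned a;   /* reserved bit                        */
    unsigned gr;  /* grouping size minus one             */
    unsigned r;   /* redundancy presence flag            */
};

/*
 * Unpack the header from the first two payload octets.
 * Assumption: bits are packed MSB first.
 * Returns 0 on success, -1 if the packet should be discarded.
 */
static int ipmr_parse_hdr(const uint8_t *buf, size_t len,
                          struct ipmr_hdr *h)
{
    unsigned v;

    if (len < 2)
        return -1;
    v = ((unsigned)buf[0] << 4) | ((unsigned)buf[1] >> 4); /* 12 bits */
    h->t  = (v >> 11) & 0x1;
    h->cr = (v >> 8) & 0x7;
    h->br = (v >> 5) & 0x7;
    h->d  = (v >> 4) & 0x1;
    h->a  = (v >> 3) & 0x1;
    h->gr = (v >> 1) & 0x3;
    h->r  = v & 0x1;

    if (h->cr == 6)      /* reserved coding rate */
        return -1;
    if (h->br > 5)       /* assumption: only 0-5 are valid for BR */
        return -1;
    if (h->br > h->cr)   /* BR > CR is decoded as BR = CR */
        h->br = h->cr;
    return 0;
}
```

The number of frames in the packet is then h.gr + 1.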

265	3.4. Speech Table of Contents

267	The speech TOC contains one entry for each frame in the packet (grouping
268	size entries in total). Each entry contains a single field:

270	                                   0
271	                                  +-+
272	                                  |E|
273	                                  +-+

275	    o E (1 bit): frame existence indicator. If set to 0, this indicates
276	      the corresponding frame is absent and the receiver should set the
277	      special LOST_FRAME flag for the decoder. This can be followed by
278	      the lost frame itself or by empty frames generated by the encoder
279	      during silence intervals in DTX mode.

281	Note that if the CR field of the payload header is 7 (NO_DATA), the
282	speech TOC is empty.
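
A minimal, non-normative sketch of reading the speech TOC follows. The
helper names are hypothetical, and the sketch assumes MSB-first bit
packing with the TOC starting directly after the 12-bit payload header.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one bit at absolute bit position `pos` (assumption: MSB first). */
static unsigned ipmr_bit(const uint8_t *buf, size_t pos)
{
    return (buf[pos / 8] >> (7 - pos % 8)) & 1u;
}

/*
 * Read the speech TOC that follows the 12-bit payload header.
 * `cr` and `gr` are the CR and GR header fields; e[i] receives the
 * E bit of frame i.  Returns the number of TOC entries, which is 0
 * when CR is 7 (NO_DATA).  The caller must make sure the buffer
 * holds at least 12 + gr + 1 bits.
 */
static unsigned ipmr_read_speech_toc(const uint8_t *buf, unsigned cr,
                                     unsigned gr, unsigned e[4])
{
    unsigned n = (cr == 7) ? 0 : gr + 1;
    unsigned i;

    for (i = 0; i < n; i++)
        e[i] = ipmr_bit(buf, 12 + i);
    return n;
}
```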

284	3.5. Speech Data

286	Speech data of a payload contains one or more speech frames or comfort
287	noise frames, as specified in the speech TOC of the payload.

289	Each speech frame represents 20 ms of speech encoded with the rate
290	indicated in the CR and base rate indicated in BR field of the payload
291	header.

293	The size of a coded speech frame is variable due to the nature of the
294	codec. The encoder decides the size of each frame and returns it after
295	encoding. In order to save bandwidth, the size is not carried in the
296	payload. Instead, the frame size can be determined from the frame's
297	content using a special service function specified in Appendix A.
298	This function provides complete information about a coded frame,
299	including its size, number of layers, size of each layer, and size of
300	each perceptual sensitivity class.

302	3.6. Redundancy Header

304	If a packet contains redundancy (R field of payload header is 1), the
305	speech data is followed by a redundancy header:

307	                             0 1 2 3 4 5
308	                            +-+-+-+-+-+-+
309	                            | CL1 | CL2 |
310	                            +-+-+-+-+-+-+

312	The redundancy header consists of two fields. Each field contains a
313	class specifier for the amount of redundancy taken from the preceding
314	packet (CL1) and the pre-preceding packet (CL2), i.e., the packets 1 and
315	2 packets before the current one, respectively. The values are listed
316	in the table below:

318	                     +-------+-------------------+
319	                     |  CL   | amount redundancy |
320	                     +-------+-------------------+
321	                     |   0   |       NONE        |
322	                     |   1   |      CLASS A      |
323	                     |   2   |      CLASS B      |
324	                     |   3   |      CLASS C      |
325	                     |   4   |      CLASS D      |
326	                     |   5   |      CLASS E      |
327	                     |   6   |      CLASS F      |
328	                     |   7   |     (reserved)    |
329	                     +-------+-------------------+

331	Each specifier takes 3 bits, thus the total redundancy header size is 6
332	bits.

334	These classes indicate the subjective importance of bits from the core
335	layer. Class A contains the bits most sensitive to errors; loss of these
336	bits results in a corrupted speech frame, which should not be decoded
337	without applying a packet loss concealment (PLC) procedure. Class B is
338	less sensitive than class A, and so on to F. Together, bit classes A
339	through F make up the core layer.

341	Putting some part (classes of bits) of a previous frame into the current
342	packet makes it possible to partially decode the previous frame if it
343	is lost. The more information is delivered, the less the speech quality
344	degrades. The CL1 and CL2 fields specify how many classes from previous
345	frames the current packet contains. For example, CL1=3 (class C) means
346	that the packet contains bits from classes A, B and C of the previous
347	frame. If CL1=6 (class F), the whole core layer is included.
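
The class specifiers could be unpacked as in the following
non-normative sketch. The names are hypothetical. The sketch assumes
MSB-first bit packing and a redundancy section that starts on an octet
boundary, and it chooses to reject the reserved value 7; none of these
points is stated by the format itself.

```c
#include <stdint.h>

/* Hypothetical result type: the two redundancy class specifiers. */
struct ipmr_red_hdr {
    unsigned cl1;  /* classes taken from the preceding packet     */
    unsigned cl2;  /* classes taken from the pre-preceding packet */
};

/*
 * Unpack CL1 and CL2 from the first octet of the redundancy section.
 * Assumptions: octet-aligned section start, MSB-first packing.
 * Returns 0 on success, -1 if either specifier is the reserved 7.
 */
static int ipmr_parse_red_hdr(uint8_t first, struct ipmr_red_hdr *rh)
{
    rh->cl1 = (first >> 5) & 0x7;
    rh->cl2 = (first >> 2) & 0x7;
    return (rh->cl1 == 7 || rh->cl2 == 7) ? -1 : 0;
}

/*
 * Letter of the highest class carried for a specifier, e.g. 3 -> 'C'.
 * Classes 'A' through the returned letter are all present.
 * Returns 0 when no redundancy is carried or the value is reserved.
 */
static char ipmr_top_class(unsigned cl)
{
    return (cl >= 1 && cl <= 6) ? (char)('A' + cl - 1) : 0;
}
```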

349	3.7. Redundancy Table of Contents

351	                    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
352	                    | Pkt1 Entries| Pkt2 Entries|
353	                    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

355	The redundancy TOC contains entries for redundancy frames from preceding
356	and pre-preceding packets. Each entry takes 1 bit, like a speech TOC
357	entry (3.4):

359	                                   0
360	                                  +-+
361	                                  |E|
362	                                  +-+

364	    o E (1 bit): frame existence indicator. If set to 0, this indicates
365	      the corresponding frame is absent.

367	    o For each preceding and pre-preceding packet the number of entries
368	      is equal to the grouping size of the current packet. Thus the
369	      maximum number of entries is 4*2 = 8.

371	    o If the class specifier in the redundancy header is CL=0 (NONE),
372	      then there are no entries for the corresponding packet.
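
The entry count described above can be sketched as follows
(non-normative; the function name is hypothetical):

```c
/*
 * Number of redundancy TOC bits for a packet whose GR header field is
 * `gr` and whose class specifiers are `cl1` and `cl2`.  A specifier
 * of 0 contributes no entries; otherwise each delayed packet
 * contributes one entry per frame of the current packet (gr + 1).
 */
static unsigned ipmr_red_toc_entries(unsigned gr, unsigned cl1,
                                     unsigned cl2)
{
    unsigned per_packet = gr + 1;

    return (cl1 ? per_packet : 0) + (cl2 ? per_packet : 0);
}
```

With the maximum grouping size (GR field of 3) and both specifiers
nonzero this gives 8 entries.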

374	3.8. Redundancy Data

376	Redundancy data of a payload contains redundancy information for one or
377	more speech frames or comfort noise frames that may be lost during
378	transmission, as specified in the redundancy TOC of the payload. The
379	redundancy is the most important part of preceding frames representing
380	20 ms of speech. This data MAY be used for partial reconstruction of
381	lost frames. The amount of available redundancy is specified by the CL
382	field in the redundancy header (3.6). This field SHOULD be passed to the
383	decoder. The size of a redundancy frame is variable and can be obtained
384	using the service function specified in Appendix A.

386	4. Payload Examples

388	A few examples to highlight the payload format follow.

390	4.1. Payload Carrying a Single Frame

392	The following diagram shows a standard IP-MR payload carrying a single
393	speech frame without redundancy:

395	   0                   1                   2                   3
396	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
397	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
398	  |0|CR=1 |BR=0 |0|0|0 0|0|1|sp(0)                                |
399	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
400	  |                                                               |
401	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
402	  |                                                               |
403	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
404	  |                                                               |
405	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
406	  |                                                               |
407	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
408	  |                                                               |
409	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
410	  |                      sp(193)|P|
411	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

413	In the payload the speech frame is not damaged at the IP origin (E=1),
414	the coding rate is 9.8 kbps (CR=1), the base rate is 7.7 kbps (BR=0), and
415	the DTX mode is off. There is no byte alignment (A=0) and no redundancy
416	(R=0). The encoded speech bits - sp(0) to sp(193) - are placed
417	immediately after the TOC. Finally, one zero bit is added at the end as
418	padding to make the payload byte aligned.
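
The size of this example payload can be checked with a short
calculation: 12 header bits, 1 TOC bit and 194 speech bits give 207
bits, so one padding bit rounds the payload up to 208 bits, or 26
octets. A non-normative sketch of that arithmetic, with a hypothetical
function name, for a payload without redundancy and without per-frame
alignment:

```c
/*
 * Octets needed for a payload made of the 12-bit header, `toc_bits`
 * TOC bits and `speech_bits` speech bits.  The number of trailing
 * zero padding bits is stored in *pad_bits.
 */
static unsigned ipmr_payload_octets(unsigned toc_bits,
                                    unsigned speech_bits,
                                    unsigned *pad_bits)
{
    unsigned bits = 12 + toc_bits + speech_bits;
    unsigned octets = (bits + 7) / 8;   /* round up to whole octets */

    *pad_bits = octets * 8 - bits;
    return octets;
}
```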

420	4.2. Payload Carrying Multiple Frames with Redundancy

422	The following diagram shows a payload that contains three frames, one of
423	them with no speech data. The coding rate is 7.7 kbps (CR=0), the base
424	rate is 7.7 kbps (BR=0), and the DTX mode is on. The speech frames are
425	byte aligned (A=1), so 1 zero bit is added at the end of the header.
426	Besides the speech frames, the payload contains six redundancy frames
427	(three for each delayed packet).

429	The first speech frame consists of bits sp1(0) to sp1(92). After that 3
430	bits are added for byte alignment. The second frame does not contain any
431	speech information, which is indicated in the payload by its TOC entry.
432	The third frame consists of bits sp3(0) to sp3(171).

434	The redundancy header follows the speech data. The one-packet-delayed
435	redundancy contains class A+B bits (CL1=2), and the two-packet-delayed
436	redundancy contains class A bits (CL2=1). The one-packet-delayed
437	redundancy contains three frames with 20, 39 and 35 bits respectively.

439	The first frame of the two-packet-delayed redundancy is absent, as
440	indicated by its TOC entry, and the two other frames have sizes 15 and
441	19 bits.

443	Note that all speech frames are padded with zero bits for byte
444	alignment.

446	   0                   1                   2                   3
447	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
448	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
449	  |0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0)                         |
450	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
451	  |                                                               |
452	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
453	  |                                                               |
454	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
455	  |                  sp1(92)|P|P|P|sp3(0)                         |
456	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
457	  |                                                               |
458	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
459	  |                                                               |
460	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
461	  |                                                               |
462	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
463	  |                                                               |
464	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
465	  |                                               sp3(171)|P|P|P|P|
466	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
467	  |CL1=2|CL2=1|1 1 1|0 1 1|red1_1(0)                    red1_1(19)|
468	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
469	  |red1_2(0)                                                      |
470	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
471	  |   red1_2(38)|red1_3(0)                                        |
472	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
473	  |         red1_3(34)|red2_2(0)          red2_2(14)|red2_3(0)    |
474	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
475	  |             red2_3(18)|P|P|P|P|
476	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

478	5. Media Type Registration

480	This section describes the media types and names associated with this
481	payload format.

483	5.1. Registration of media subtype audio/ip-mr_v2.5

485	Type name: audio

487	Subtype name: ip-mr_v2.5

489	Required parameters: none

491	Optional parameters:
492	* ptime: Gives the length of time in milliseconds represented by the
493	media in a packet. Allowed values are: 20, 40, 60 and 80.

495	Encoding considerations: This media type is framed binary data (see RFC
496	4288, Section 4.8).

498	Security considerations: See RFC 3550 [RFC 3550]

500	Interoperability considerations: none

502	Published specification: RFC XXXX

504	Applications that use this media type: Real-time audio applications like
505	voice over IP and teleconference, and multi-media streaming.

507	Additional information: none

509	Person & email address to contact for further information:
510	Yury Morzeev
511	morzeev@spiritdsp.com

513	Intended usage: COMMON

515	Restrictions on usage: This media type depends on RTP framing, and hence
516	is only defined for transfer via RTP [RFC 3550].

518	Authors:
519	Sergey Ikonin <info@spiritdsp.com>

521	Change controller: IETF Audio/Video Transport working group delegated
522	from the IESG.

524	5.2. Mapping Media Type Parameters into SDP

526	The information carried in the media type specification has a specific
527	mapping to fields in the Session Description Protocol (SDP) [RFC 4566],
528	which is commonly used to describe RTP sessions. When SDP is used to
529	specify sessions employing the IP-MR codec, the mapping is as follows:

531	    o The media type ("audio") goes in SDP "m=" as the media name.

533	    o The media subtype (payload format name) goes in SDP "a=rtpmap"
534	    as the encoding name. The RTP clock rate in "a=rtpmap" MUST be 16000.

536	    o The parameter "ptime" goes in the SDP "a=ptime" attribute.

538	Any remaining parameters go in the SDP "a=fmtp" attribute by copying
539	them directly from the media type parameter string as a semicolon-
540	separated list of parameter=value pairs.

542	Note that the payload format (encoding) names are commonly shown in
543	upper case. Media subtypes are commonly shown in lower case. These
544	names are case-insensitive in both places.
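
As a non-normative illustration of this mapping, the sketch below
formats the SDP lines for an IP-MR stream. The function name is
hypothetical, and the port and dynamic payload type are chosen by the
application; the values in the usage note are examples only.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write an SDP media description for IP-MR into `out`.
 * `port` and `pt` (a dynamic payload type) are application choices;
 * `ptime_ms` should be 20, 40, 60 or 80.
 * Returns the snprintf result, i.e. the number of characters needed.
 */
static int ipmr_format_sdp(char *out, size_t cap, unsigned port,
                           unsigned pt, unsigned ptime_ms)
{
    return snprintf(out, cap,
                    "m=audio %u RTP/AVP %u\r\n"
                    "a=rtpmap:%u ip-mr_v2.5/16000\r\n"
                    "a=ptime:%u\r\n",
                    port, pt, pt, ptime_ms);
}
```

Called with port 49120, payload type 97 and a ptime of 40, this
produces the lines "m=audio 49120 RTP/AVP 97",
"a=rtpmap:97 ip-mr_v2.5/16000" and "a=ptime:40".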

546	6. Security Considerations

548	RTP packets using the payload format defined in this specification
549	are subject to the security considerations discussed in the RTP
550	specification [RFC 3550] and in any applicable RTP profile. The main
551	security considerations for the RTP packet carrying the RTP payload
552	format defined within this memo are confidentiality, integrity, and
553	source authenticity. Confidentiality is achieved by encryption of the
554	RTP payload. Integrity of the RTP packets is achieved through a suitable
555	cryptographic integrity protection mechanism. Such a cryptographic
556	system may also allow the authentication of the source of the payload.

558	A suitable security mechanism for this RTP payload format should
559	provide confidentiality, integrity protection, and at least source
560	authentication capable of determining if an RTP packet is from a
561	member of the RTP session.

563	Note that the appropriate mechanism to provide security to RTP and
564	payloads following this memo may vary. It is dependent on the
565	application, the transport, and the signaling protocol employed.
566	Therefore, a single mechanism is not sufficient, although if suitable,
567	usage of the Secure Real-time Transport Protocol (SRTP) [RFC 3711] is
568	recommended.  Other mechanisms that may be used are IPsec [RFC 4301]
569	and Transport Layer Security (TLS) [RFC 5246] (RTP over TCP); other
570	alternatives may exist.

572	This payload format does not exhibit any significant non-uniformity in
573	the receiver side computational complexity for packet processing, and
574	thus is unlikely to pose a denial-of-service threat due to the receipt
575	of pathological data.

577	7. Congestion Control

579	The general congestion control considerations for transporting RTP data
580	apply; see RTP [RFC 3550] and any applicable RTP profile like AVP
581	[RFC 3551]. However, the multi-rate capability of IP-MR speech coding
582	provides a mechanism that may help to control congestion, since the
583	bandwidth demand can be adjusted by selecting a different encoding mode.

585	The number of frames encapsulated in each RTP payload strongly
586	influences the overall bandwidth of the RTP stream because of the
587	per-packet header overhead. Packetizing more frames in each RTP payload
588	reduces the number of packets sent and hence the overhead from
589	IP/UDP/RTP headers, at the expense of increased delay.
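
The trade-off can be made concrete with a small calculation. The sketch
below assumes IPv4, UDP, and RTP headers without options, CSRC entries,
or header extensions (20 + 8 + 12 = 40 bytes per packet). The frame
duration is passed as a parameter because it is not stated in this
section, and the function names are hypothetical.

```c
/* Header sizes in bytes, assuming no IP options, no CSRC list and
 * no RTP header extension. */
enum { IPV4_HDR = 20, UDP_HDR = 8, RTP_HDR = 12 };

/* Header overhead in bits per second for a given packetization.
 * frame_ms:          codec frame duration in milliseconds (assumption)
 * frames_per_packet: number of frames carried in each RTP payload
 * Returns -1 for non-positive arguments. */
long header_overhead_bps(int frame_ms, int frames_per_packet)
{
    long header_bits = 8L * (IPV4_HDR + UDP_HDR + RTP_HDR);
    long interval_ms;

    if (frame_ms <= 0 || frames_per_packet <= 0) {
        return -1;
    }
    interval_ms = (long)frame_ms * frames_per_packet;
    return header_bits * 1000 / interval_ms;
}

/* Time needed to collect the frames of one packet, in milliseconds. */
int packetization_delay_ms(int frame_ms, int frames_per_packet)
{
    return frame_ms * frames_per_packet;
}
```

For example, with an assumed 20 ms frame, moving from one to four
frames per packet cuts the header overhead to a quarter while the
packetization delay grows from 20 ms to 80 ms.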

591	If an in-band redundancy scheme is used to protect against packet loss,
592	the amount of introduced redundancy will need to be regulated so that
593	the use of redundancy itself does not cause a congestion problem. In
594	other words, a sender SHALL NOT increase the total bitrate when adding
595	redundancy in response to packet loss, and needs instead to adjust it
596	down in accordance with the congestion control algorithm being run. Thus,
597	when adding redundancy, the media bitrate will need to be reduced to
598	provide room for the redundancy.
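
One way to picture this budget is the sketch below. It assumes that
redundancy is expressed as a percentage of the media bitrate, which is
a hypothetical parameterization chosen for illustration and not
something this memo defines. Given the total bitrate permitted by
congestion control, it returns the media bitrate that leaves room for
the redundancy.

```c
/* Media bitrate that fits together with its redundancy inside
 * total_bps.
 * total_bps:          total bitrate allowed by congestion control
 * redundancy_percent: redundancy as a percentage of the media bitrate
 *                     (hypothetical parameterization)
 * Solves media + media * r / 100 <= total for media.
 * Returns -1 for negative arguments. */
long media_bitrate_with_redundancy(long total_bps, int redundancy_percent)
{
    if (total_bps < 0 || redundancy_percent < 0) {
        return -1;
    }
    return total_bps * 100 / (100 + redundancy_percent);
}
```

With a total of 30000 bit/s and 100% redundancy (every frame sent
twice), the media bitrate has to drop to 15000 bit/s so that the total
stays unchanged.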

600	8. IANA Considerations

602	One media type has been defined and needs registration in the media
603	types registry.

605	9. Normative References

607	  [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
608	             Requirement Levels", BCP 14, RFC 2119, March 1997.

610	  [RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and
611	             V. Jacobson, "RTP: A Transport Protocol for Real-Time
612	             Applications", STD 64, RFC 3550, July 2003.

614	  [RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio
615	             and Video Conferences with Minimal Control", STD 65,
616	             RFC 3551, July 2003.

618	  [RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
619	             Description Protocol", RFC 4566, July 2006.

621	  [RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
622	             K. Norrman, "The Secure Real-time Transport Protocol
623	             (SRTP)", RFC 3711, March 2004.

625	  [RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer
626	             Security (TLS) Protocol Version 1.2", RFC 5246,
627	             August 2008.

629	  [RFC 4301] Kent, S. and K. Seo, "Security Architecture for the
630	             Internet Protocol", RFC 4301, December 2005.

632	10. Author(s) Information:

634	Sergey Ikonin
635	email: info@spiritdsp.com

637	Building 27, A. Solzhenitsyna street
638	109004, Moscow, Russia
639	Tel: +7 495 661-2178
640	Fax: +7 495 912-6786

642	11. Disclaimer

644	This document may contain material from IETF Documents or IETF
645	Contributions published or made publicly available before November 10,
646	2008. The person(s) controlling the copyright in some of this material
647	may not have granted the IETF Trust the right to allow modifications of
648	such material outside the IETF Standards Process. Without obtaining an
649	adequate license from the person(s) controlling the copyright in such
650	materials, this document may not be modified outside the IETF Standards
651	Process, and derivative works of it may not be created outside the IETF
652	Standards Process, except to format it for publication as an RFC or to
653	translate it into languages other than English.

655	12. Legal Terms

657	All IETF Documents and the information contained therein are provided on
658	an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
659	OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
660	THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
661	IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
662	INFORMATION THEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
663	WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

665	The IETF Trust takes no position regarding the validity or scope of any
666	Intellectual Property Rights or other rights that might be claimed to
667	pertain to the implementation or use of the technology described in any
668	IETF Document or the extent to which any license under such rights might
669	or might not be available; nor does it represent that it has made any
670	independent effort to identify any such rights.

672	Copies of Intellectual Property disclosures made to the IETF Secretariat
673	and any assurances of licenses to be made available, or the result of an
674	attempt made to obtain a general license or permission for the use of
675	such proprietary rights by implementers or users of this specification
676	can be obtained from the IETF on-line IPR repository at
677	http://www.ietf.org/ipr.

679	The IETF invites any interested party to bring to its attention any
680	copyrights, patents or patent applications, or other proprietary rights
681	that may cover technology that may be required to implement any standard
682	or specification contained in an IETF Document. Please address the
683	information to the IETF at ietf-ipr@ietf.org.

685	The definitive version of an IETF Document is that published by, or
686	under the auspices of, the IETF. Versions of IETF Documents that are
687	published by third parties, including those that are translated into
688	other languages, should not be considered to be definitive versions of
689	IETF Documents. The definitive version of these Legal Provisions is that
690	published by, or under the auspices of, the IETF. Versions of these
691	Legal Provisions that are published by third parties, including those
692	that are translated into other languages, should not be considered to be
693	definitive versions of these Legal Provisions.

695	For the avoidance of doubt, each Contributor to the IETF Standards
696	Process licenses each Contribution that he or she makes as part of the
697	IETF Standards Process to the IETF Trust pursuant to the provisions of
698	RFC 5378. No language to the contrary, or terms, conditions or rights
699	that differ from or are inconsistent with the rights and licenses
700	granted under RFC 5378, shall have any effect and shall be null and
701	void, whether published or posted by such Contributor, or included with
702	or in such Contribution.

704	APPENDIX A. RETRIEVING FRAME INFORMATION

706	This appendix contains C code implementing the frame parsing
707	function. The function extracts information about a coded frame,
708	including the frame size, the number of layers, the size of each layer,
709	and the size of each perceptual sensitivity class.

711	A.1. get_frame_info.c

713	/******************************************************************

715	  get_frame_info.c

717	  Retrieving frame information for IP-MR Speech Codec

719	******************************************************************/

721	#define RATES_NUM       6   // number of codec rates
722	#define SENSE_CLASSES   6   // number of sensitivity classes (A..F)

724	// frame types
725	#define FT_DTX_SPEECH   0   // active speech in DTX mode
726	#define FT_DTX_SID      1   // silence insertion descriptor
727	#define FT_NO_DTX       2   // no DTX frame

729	// get specified bit from coded data
730	int GetBit(unsigned char *data, int curBit)
731	{
732	  return ((data[curBit >> 3] >> (curBit % 8)) & 1);
733	}

735	// retrieve frame information
736	int GetFrameInfo(           // o: frame size in bits
737	  short rate,               // i: encoding rate (0..5)
738	  short base_rate,          // i: base (core) layer rate,
739	                            //    if base_rate > rate, base_rate is
740	                            //    clamped to rate.
741	  short allow_DTX,          // i: flag of DTX mode
742	  unsigned char *pCoded,    // i: coded bit frame
743	  short pLayerBits          // o: number of bits in layers
744	      [RATES_NUM],
745	  short pSenseBits          // o: number of bits in sensitivity classes
746	      [SENSE_CLASSES],
747	  short *nLayers            // o: number of layers
748	)
749	{
750	  static const short Bits_1[4]    = {0, 9, 9, 15};
751	  static const short Bits_2[16]   = { 43,50,36,31,46,48,40,44,47,43,44,
752	                                      45,43,44,47,36};

754	  static const short Bits_3[2][6] = {{13, 11, 23, 33, 36, 31},
755	                                     {25,  0, 23, 32, 36, 31},};

757	  int FrType;
758	  int i,nBits;

760	  if (rate < 0 || rate > 5) {
761	    return 0; // incorrect stream
762	  }

764	  for(i = 0; i < SENSE_CLASSES; i++) {
765	    pSenseBits[i] = 0;
766	  }

768	  nBits = 0;
769	  // extract frame type bit if required
770	  if (allow_DTX) {
771	    FrType = GetBit(pCoded, nBits++) ? FT_DTX_SPEECH : FT_DTX_SID;
772	  } else {
773	    FrType = FT_NO_DTX;
774	  }
775	  {
776	    int cw_0;
777	    int b[14];

779	    // extract the 14 bits needed for parsing
780	    for(i = 0 ; i < 14; i++) {
781	        b[i] = GetBit(pCoded, nBits++);
782	    }

784	    // parse
785	    if(FrType == FT_DTX_SID) {
786	      cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3);
787	      rate = 0;
788	      pSenseBits[0] = 10 + Bits_2[cw_0];
789	    } else {

791	      int i, idx;
792	      int nFlag_1, nFlag_2, cw_1 = 0, cw_2 = 0; // start from zero

794	      nFlag_1 = b[0] + b[2] + b[4] + b[6];
795	      cw_1 = (cw_1 << 1) | b[0];
796	      cw_1 = (cw_1 << 1) | b[2];
797	      cw_1 = (cw_1 << 1) | b[4];
798	      cw_1 = (cw_1 << 1) | b[6];

800	      nFlag_2 = b[1] + b[3] + b[5] + b[7];
801	      cw_2 = (cw_2 << 1) | b[1];
802	      cw_2 = (cw_2 << 1) | b[3];
803	      cw_2 = (cw_2 << 1) | b[5];
804	      cw_2 = (cw_2 << 1) | b[7];

806	      cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3);
807	      if (base_rate < 0)    base_rate = 0;
808	      if (base_rate > rate) base_rate = rate;
809	      idx = base_rate == 0 ? 0 : 1;

811	      pSenseBits[0] = (FrType == FT_DTX_SPEECH ? 1:0)+14+Bits_2[cw_0];
812	      pSenseBits[1] = Bits_1[(cw_1 >> 0)&0x3] + Bits_1[(cw_1>>2)&0x3];
813	      pSenseBits[2] = nFlag_1*5;
814	      pSenseBits[3] = nFlag_2*30;
815	      pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]);

817	      for (i = 1; i < rate+1; i++) {
818	        pLayerBits[i] = 4*(Bits_3[idx][i]);
819	      }
820	    }

822	    pLayerBits[0] = 0;
823	    for (i = 0; i < SENSE_CLASSES; i++) {
824	        pLayerBits[0] += pSenseBits[i];
825	    }

827	    *nLayers = rate+1;
828	  }

830	  {
831	    // count total frame size
832	    int payloadBitCount = 0;
833	    for (i = 0; i < *nLayers; i++) {
834	      payloadBitCount += pLayerBits[i];
835	    }
836	    return payloadBitCount;
837	  }
838	}
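
The bit ordering used by the parser can be illustrated with a short
sketch. GetBit() below repeats the accessor from get_frame_info.c (with
a const-qualified pointer). ReadBitsLsbFirst() is a hypothetical helper,
not part of the original code, that assembles a codeword the same way
the parser builds cw_0, with the first bit read becoming the least
significant bit.

```c
/* Same accessor as in get_frame_info.c: bit 0 is the least
 * significant bit of byte 0, bit 8 the least significant bit of
 * byte 1, and so on. */
int GetBit(const unsigned char *data, int curBit)
{
    return (data[curBit >> 3] >> (curBit % 8)) & 1;
}

/* Hypothetical helper: read count bits starting at bit position
 * start. The first bit read becomes bit 0 of the result, mirroring
 * cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3). */
int ReadBitsLsbFirst(const unsigned char *data, int start, int count)
{
    int i;
    int value = 0;

    for (i = 0; i < count; i++) {
        value |= GetBit(data, start + i) << i;
    }
    return value;
}
```

For the two bytes 0xB5 0x01, reading four bits from position 0 gives 5,
and reading four bits from position 6 crosses the byte boundary and
gives 6.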

840	Authors' Addresses

842	   SPIRIT DSP
843	   Building 27, A. Solzhenitsyna street
844	   109004, Moscow, RUSSIA

846	   Tel: +7 495 661-2178
847	   Fax: +7 495 912-6786
848	   EMail: info@spiritdsp.com