idnits 2.17.1 

draft-ietf-avt-rtp-ipmr-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 17) being 87 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 2 instances of too long lines in the document, the longest one
     being 1 character in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008.  The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust.  If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 02, 2010) is 5198 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '4' on line 826

  -- Looks like a reference, but probably isn't: '16' on line 783

  -- Looks like a reference, but probably isn't: '2' on line 842

  -- Looks like a reference, but probably isn't: '6' on line 827

  -- Looks like a reference, but probably isn't: '14' on line 806

  -- Looks like a reference, but probably isn't: '0' on line 853

  -- Looks like a reference, but probably isn't: '1' on line 841

  -- Looks like a reference, but probably isn't: '3' on line 843

  -- Looks like a reference, but probably isn't: '5' on line 844

  -- Looks like a reference, but probably isn't: '7' on line 833

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  ** Obsolete normative reference: RFC 5246 (Obsoleted by RFC 8446)


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Audio/Video Transport Working Group                            S. Ikonin
2	Internet Draft                                                SPIRIT DSP
3	Intended status: Informational                          February 02, 2010

5	RTP Payload Format for IP-MR Speech Codec draft-ietf-avt-rtp-ipmr-11.txt

7	Status of this Memo

9	This Internet-Draft is submitted to IETF in full conformance with the
10	provisions of BCP 78 and BCP 79.

12	Copyright (c) 2010 IETF Trust and the persons identified as the document
13	authors. All rights reserved.

15	This document is subject to BCP 78 and the IETF Trust's Legal Provisions
16	Relating to IETF Documents (http://trustee.ietf.org/license-info)
17	in effect on the date of publication of this document.  Please
18	review these documents carefully, as they describe your rights and
19	restrictions with respect to this document.  Code Components
20	extracted from this document must include Simplified BSD License
21	text as described in Section 4.e of the Trust Legal Provisions and
22	are provided without warranty as described in the Simplified BSD
23	License.

25	The source codes included in this document are provided under BSD
26	license (http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf).

28	Internet-Drafts are working documents of the Internet Engineering Task
29	Force (IETF), its areas, and its working groups. Note that other groups
30	may also distribute working documents as Internet-Drafts.

32	Internet-Drafts are draft documents valid for a maximum of six months
33	and may be updated, replaced, or obsoleted by other documents at any
34	time. It is inappropriate to use Internet-Drafts as reference material
35	or to cite them other than as "work in progress."

37	The list of current Internet-Drafts can be accessed at
38	http://www.ietf.org/1id-abstracts.html

40	The list of Internet-Draft Shadow Directories can be accessed at
41	http://www.ietf.org/shadow.html

43	This Internet-Draft will expire on June 02, 2010.

45	Abstract

47	This document specifies the payload format for packetization of SPIRIT
48	IP-MR encoded speech signals into the Real-time Transport Protocol
49	(RTP). The payload format supports transmission of multiple frames per
50	payload and introduced redundancy for robustness against packet loss.

52	Table of Contents

54	 1. Introduction......................................................3
55	 2. IP-MR Codec Description...........................................3
56	 3. Payload Format....................................................4
57	    3.1. RTP Header Usage.............................................4
58	    3.2. Payload Format Structure.....................................5
59	    3.3. Payload Header...............................................5
60	    3.4. Speech Table of Contents.....................................6
61	    3.5. Speech Data..................................................7
62	    3.6. Redundancy Header............................................7
63	    3.7. Redundancy Table of Contents.................................8
64	    3.8. Redundancy Data..............................................9
65	 4. Payload Examples..................................................9
66	    4.1. Payload Carrying a Single Frame..............................9
67	    4.2. Payload Carrying Multiple Frames with Redundancy............10
68	 5. Media Type Registration..........................................11
69	    5.1. Registration of media subtype audio/ip-mr_v2.5..............11
70	    5.2. Mapping Media Type Parameters into SDP......................12
71	 6. Security Considerations..........................................13
72	 7. Congestion Control...............................................13
73	 8. IANA Considerations..............................................14
74	 9. Normative References.............................................14
75	 10. Author(s) Information...........................................15
76	 11. Disclaimer......................................................15
77	 12. Legal Terms.....................................................15
78	 APPENDIX A. RETRIEVING FRAME INFORMATION............................17
79	 A.1. get_frame_info.c...............................................17
80	 Authors' Addresses..................................................19

82	1. Introduction

84	This document specifies the payload format for packetization of SPIRIT
85	IP-MR encoded speech signals into the Real-time Transport Protocol
86	(RTP). The payload format supports transmission of multiple frames per
87	payload and introduced redundancy for robustness against packet loss.

89	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
90	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
91	document are to be interpreted as described in RFC 2119 [RFC 2119].

93	2. IP-MR Codec Description

95	The IP-MR codec is a scalable adaptive multi-rate wideband speech codec
96	designed by SPIRIT for use in IP-based networks. This codec is suitable
97	for real-time communications such as telephony and videoconferencing.

99	The codec operates on 20 ms frames at a 16 kHz sampling rate and has an
100	algorithmic delay of 25 ms.

102	The IP-MR codec supports six wideband speech coding modes with bit
103	rates ranging from about 7.7 to about 34.2 kbps. The coding mode can be
104	changed at any 20 ms frame boundary, making it possible to dynamically
105	adjust the speech encoding rate during a session to adapt to varying
106	transmission conditions.

108	The coded frame consists of multiple coding layers: a base (or core)
109	layer and several enhancement layers, which are coded independently.
110	Only the core layer is required to decode understandable speech; the
111	upper layers provide quality enhancement. The enhancement layers
112	may be omitted, and the remaining base layer can be meaningfully decoded
113	without artifacts. This makes the bit stream scalable and allows
114	the bit rate to be reduced during transmission without re-encoding.

116	This memo specifies an optional form of redundancy coding within RTP
117	for protection against packet loss. It is based on the commonly known
118	scheme in which previously transmitted frames are aggregated together
119	with new ones. Each frame is retransmitted once in the following
120	RTP payload packet. Here f(n-2)...f(n+4) denotes a sequence of speech
121	frames, and p(n-1)...p(n+4) denotes a sequence of payload packets:

123	   --+--------+--------+--------+--------+--------+--------+--------+--
124	     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
125	   --+--------+--------+--------+--------+--------+--------+--------+--

127	      <---- p(n-1) ---->
128	               <----- p(n) ----->
129	                        <---- p(n+1) ---->
130	                                 <---- p(n+2) ---->
131	                                          <---- p(n+3) ---->
132	                                                   <---- p(n+4) ---->

134	However, because of the scalable nature of the IP-MR codec, there is no
135	need to duplicate the whole previous frame: only the core layer may be
136	retransmitted. This reduces redundancy overhead while keeping
137	efficiency. Moreover, the speech bits encoded in the core layer are
138	divided into six classes (from A to F) of perceptual sensitivity to
139	errors. Selecting how many of these classes are sent as redundancy makes
140	it possible to trade overhead against robustness to packet loss.

142	The mechanism described does not require signaling at session setup.
143	The sender is responsible for selecting an appropriate amount of
144	redundancy based on feedback about the channel conditions.

146	The main codec characteristics can be summarized as follows:

148	    o Wideband, 16 kHz, speech codec

150	    o Adaptive multi rate with six modes from about 7.7 to 34.2 kbps

152	    o Bit rate scalable

154	    o Variable bit rate changing in accordance with actual speech
155	      content

157	    o Discontinuous Transmission (DTX), silence suppression and
158	      comfort noise generation

160	    o In-band redundancy scheme for protection against packet loss

162	3. Payload Format

164	The main purpose of the payload design for IP-MR is to maximize the
165	potential of the codec with as little overhead as possible. The payload
166	format allows changing parameters of the codec (such as bit rate,
167	level of scalability, DTX and redundancy mode) without re-negotiation
168	at any packet boundary. This makes it possible to dynamically adjust
169	streaming parameters in response to changing network conditions. The
170	payload format also supports aggregation of multiple consecutive frames
171	(up to 4) in a payload, which allows controlling the trade-off between
172	delay and header overhead.

174	3.1. RTP Header Usage

176	The RTP timestamp corresponds to the sampling instant of the first
177	sample encoded for the first frame-block in the packet. The timestamp
178	clock frequency SHALL be 16 kHz. The duration of one frame is 20 ms,
179	corresponding to 320 samples at 16 kHz. Thus the timestamp is increased
180	by 320 for each consecutive frame. The timestamp is also used to recover
181	the correct decoding order of the frame-blocks.
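
As an illustration of the timestamp rule above, the following sketch
advances the RTP timestamp by 320 ticks per frame carried. It is not
part of the payload format. The function name is hypothetical, and the
sketch assumes that packets are sent back to back with no gap caused by
DTX.

```c
#include <stdint.h>

/* Samples per 20 ms frame at the 16 kHz RTP clock (Section 3.1). */
#define IPMR_SAMPLES_PER_FRAME 320u

/* Hypothetical helper: RTP timestamp of the packet that directly
 * follows a packet carrying `frames_in_packet` consecutive frames
 * (1..4).  The result wraps modulo 2^32, as RTP timestamps do. */
static uint32_t ipmr_next_timestamp(uint32_t ts, unsigned frames_in_packet)
{
    return (uint32_t)(ts + IPMR_SAMPLES_PER_FRAME * frames_in_packet);
}
```

For example, a packet with timestamp 1000 that carries one frame is
followed by a packet with timestamp 1320.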

183	The RTP header marker bit (M) SHALL be set to 1 whenever the first
184	frame-block carried in the packet is the first frame-block in a
185	talkspurt (see definition of the talkspurt in Section 4.1 [RFC 3551]).
186	For all other packets, the marker bit SHALL be set to zero (M=0).

188	The assignment of an RTP payload type for the format defined in this
189	memo is outside the scope of this document. The RTP profiles in use
190	currently mandate binding the payload type dynamically for this payload
191	format. This is basically necessary because the payload type expresses
192	the configuration of the payload itself, i.e. basic or interleaved mode,
193	and the number of channels carried.

195	The remaining RTP header fields are used as specified in [RFC 3550].

197	3.2. Payload Format Structure

199	The IP-MR payload format consists of a payload header with general
200	information about the packet, a speech table of contents (TOC), and
201	speech data. An optional redundancy section follows the speech data. The
202	redundancy section consists of a redundancy header, a redundancy TOC and
203	redundancy data.

205	The following diagram shows the standard payload format layout:

207	  +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +
208	  | payload | speech | speech | redundancy | redundancy | redundancy |
209	  | header  | TOC    | data   | header     | TOC        | data       |
210	  +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +

212	3.3. Payload Header

214	The payload header has the following format:

216	                           0                   1
217	                           0 1 2 3 4 5 6 7 8 9 0 1
218	                          +-+-+-+-+-+-+-+-+-+-+-+-+
219	                          |T| CR  | BR  |D|A|GR |R|
220	                          +-+-+-+-+-+-+-+-+-+-+-+-+

222	    o T (1 bit): reserved for compatibility with future extensions.
223	      MUST be set to 0.

225	    o CR (3 bits): coding rate of frame(s) in this packet, as per the
226	       following table:

228	                          +-------+--------------+
229	                          |  CR   | avg. bitrate |
230	                          +-------+--------------+
231	                          |   0   |   7.7 kbps   |
232	                          |   1   |   9.8 kbps   |
233	                          |   2   |  14.3 kbps   |
234	                          |   3   |  20.8 kbps   |
235	                          |   4   |  27.9 kbps   |
236	                          |   5   |  34.2 kbps   |
237	                          |   6   |  (reserved)  |
238	                          |   7   |   NO_DATA    |
239	                          +-------+--------------+

241	The CR value 7 (NO_DATA) indicates that there is no speech data (and
242	therefore no speech TOC) in the payload. This MAY be used to transmit
243	redundancy data only. The value 6 is reserved; a packet received with
244	this value MUST be discarded.

246	    o BR (3 bits): base rate for core layer of frame(s) in this packet
247	      using the table for CR. The base rate is the lowest rate for
248	      scalability, so the speech payload cannot be scaled down below
249	      the BR value. Packets with BR = 6 or BR > CR MUST be discarded.

251	    o D (1 bit): reserved. Must always be set to 1.
252	      Previously, this bit indicated DTX mode availability, but the
253	      payload itself duplicates this information.

255	    o A (1 bit): reserved. Must always be set to 1.
256	      Previously, this bit indicated aligned mode, but this mode has
257	      never been used and was always set to 1.

259	    o GR (2 bits): number of frames in packet (grouping size). Actual
260	      grouping size is GR + 1, thus maximum grouping supported is 4.

262	    o R (1 bit): redundancy presence bit. If R=1 then the packet
263	      contains redundancy information for recovery of lost packets.
264	      In this case the redundancy section follows the speech data.
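
The field layout above can be decoded as in the following sketch. It is
an illustration, not a normative parser: the structure and function
names are hypothetical, and it assumes that bit 0 of the diagram is the
most significant bit of the first payload octet.

```c
#include <stdint.h>
#include <stddef.h>

/* Fields of the 12-bit IP-MR payload header (Section 3.3). */
struct ipmr_header {
    unsigned t, cr, br, d, a, gr, r;
};

/* Hypothetical parser.  Returns 0 on success and -1 if the buffer is
 * too short or the packet has to be discarded (CR = 6, BR = 6 or
 * BR > CR). */
static int ipmr_parse_header(const uint8_t *p, size_t len,
                             struct ipmr_header *h)
{
    unsigned v;

    if (len < 2)
        return -1;
    v = ((unsigned)p[0] << 4) | (p[1] >> 4);   /* first 12 bits */

    h->t  = (v >> 11) & 0x1;
    h->cr = (v >> 8)  & 0x7;
    h->br = (v >> 5)  & 0x7;
    h->d  = (v >> 4)  & 0x1;
    h->a  = (v >> 3)  & 0x1;
    h->gr = (v >> 1)  & 0x3;   /* grouping size is gr + 1 */
    h->r  =  v        & 0x1;

    if (h->cr == 6)                    /* reserved coding rate */
        return -1;
    if (h->br == 6 || h->br > h->cr)   /* invalid base rate */
        return -1;
    return 0;
}
```

With the octets 0x01 0xDA, which correspond to the header drawn in
Section 4.2, the sketch yields CR=0, BR=0, D=1, A=1, GR=2 and R=1.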

266	3.4. Speech Table of Contents

268	The speech TOC contains one entry for each frame in the packet, so the
269	number of entries equals the grouping size. Each entry has one field:

271	                                   0
272	                                  +-+
273	                                  |E|
274	                                  +-+

276	    o E (1 bit): frame existence indicator. If set to 0, this indicates
277	      the corresponding frame is absent and the receiver should set
278	      the special LOST_FRAME flag for the decoder. This can be caused
279	      by the frame itself being lost or by empty frames generated by
280	      the encoder during silence intervals in DTX mode.

282	Note that if the CR field of the payload header is 7 (NO_DATA) then the
283	speech TOC is empty.
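
A receiver could read the speech TOC as sketched below. The helper
names are hypothetical. The sketch assumes MSB-first bit numbering and
that the TOC starts at bit 12, directly after the payload header, as
drawn in the examples of Section 4.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical MSB-first bit accessor: returns bit `pos` of `buf`,
 * where bit 0 is the most significant bit of buf[0]. */
static unsigned ipmr_get_bit(const uint8_t *buf, size_t pos)
{
    return (buf[pos / 8] >> (7 - pos % 8)) & 1u;
}

/* Reads the speech TOC that follows the 12-bit payload header.
 * `cr` and `gr` are the header fields; e[] receives one E flag per
 * frame.  Returns the number of TOC entries (0 when CR is NO_DATA).
 * The caller must ensure that the buffer holds at least 2 octets. */
static unsigned ipmr_read_speech_toc(const uint8_t *payload, unsigned cr,
                                     unsigned gr, unsigned e[4])
{
    unsigned i, n = (cr == 7) ? 0 : gr + 1;

    for (i = 0; i < n; i++)
        e[i] = ipmr_get_bit(payload, 12 + i);
    return n;
}
```

Since the header takes 12 bits and the TOC at most 4 bits, both always
fit into the first two octets of the payload.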

285	3.5. Speech Data

287	Speech data of a payload contains one or more speech frames or comfort
288	noise frames, as specified in the speech TOC of the payload.

290	Each speech frame represents 20 ms of speech encoded with the rate
291	indicated in the CR and base rate indicated in BR field of the payload
292	header.

294	The size of a coded speech frame is variable due to the nature of the
295	codec. The encoder's algorithm decides the size of each frame and
296	returns it after encoding. In order to save bandwidth, the size is not
297	placed into the payload explicitly. The frame size can be determined
298	from the frame's content using a special service function specified in
299	Appendix A. This function provides complete information about a coded
300	frame, including its size, the number of layers, the size of each layer
301	and the sizes of the perceptual sensitivity classes.

303	3.6. Redundancy Header

305	If a packet contains redundancy (R field of payload header is 1), the
306	speech data is followed by the redundancy header:

308	                             0 1 2 3 4 5
309	                            +-+-+-+-+-+-+
310	                            | CL1 | CL2 |
311	                            +-+-+-+-+-+-+

313	The redundancy header consists of two fields. Each field contains a
314	class specifier for the amount of redundancy taken from the preceding
315	packet (CL1) and the pre-preceding packet (CL2), i.e. the packets
316	distant from the current packet by 1 and 2 packets respectively. The
317	values are listed in the table below:

319	                     +-------+-------------------+
320	                     |  CL   | amount redundancy |
321	                     +-------+-------------------+
322	                     |   0   |       NONE        |
323	                     |   1   |      CLASS A      |
324	                     |   2   |      CLASS B      |
325	                     |   3   |      CLASS C      |
326	                     |   4   |      CLASS D      |
327	                     |   5   |      CLASS E      |
328	                     |   6   |      CLASS F      |
329	                     |   7   |     (reserved)    |
330	                     +-------+-------------------+

332	Each specifier takes 3 bits, thus the total redundancy header size is 6
333	bits.

335	These classes indicate the subjective importance of bits from the core
336	layer. Class A contains the bits most sensitive to errors; loss of these
337	bits results in a corrupted speech frame which should not be decoded
338	without applying a packet loss concealment (PLC) procedure. Class B is
339	less sensitive than class A, and so on to F. Together, the bit classes
340	from A to F make up the core layer.

342	Putting some part (classes of bits) of a previous frame into the current
343	packet makes it possible to partially decode the previous frame if it
344	is lost. The more information is delivered, the smaller the degradation
345	of speech quality. Fields CL1 and CL2 specify how many classes from
346	previous frames the current packet contains. For example, CL1=3 (class
347	C) means that the packet contains bits from classes A, B and C of the
348	previous frame. If CL1=6 (class F) the whole core layer is included.
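
The cumulative meaning of a class specifier can be sketched as follows.
Both helpers are hypothetical. Returning an error for the reserved
value 7 is an assumption of this sketch, since this memo does not
define receiver behaviour for it, and so is the placement of CL1 in the
upper three bits of the 6-bit value.

```c
/* Hypothetical helper: writes the letters of the core-layer classes
 * carried for a class specifier CL (0..6) into out[] as a string,
 * e.g. CL=3 gives "ABC".  Returns the number of classes, or -1 for
 * the reserved value 7.  out[] must have room for 7 chars. */
static int ipmr_cl_classes(unsigned cl, char out[7])
{
    unsigned i;

    if (cl > 6)
        return -1;
    for (i = 0; i < cl; i++)
        out[i] = (char)('A' + i);
    out[cl] = '\0';
    return (int)cl;
}

/* Splits a 6-bit redundancy header value into CL1 (upper 3 bits) and
 * CL2 (lower 3 bits). */
static void ipmr_split_red_header(unsigned hdr6, unsigned *cl1,
                                  unsigned *cl2)
{
    *cl1 = (hdr6 >> 3) & 0x7;
    *cl2 = hdr6 & 0x7;
}
```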

350	3.7. Redundancy Table of Contents

352	                    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
353	                    | Pkt1 Entries| Pkt2 Entries|
354	                    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

356	The redundancy TOC contains entries for redundancy frames from preceding
357	and pre-preceding packets. Each entry takes 1 bit like a speech TOC
358	entry (Section 3.4):

360	                                   0
361	                                  +-+
362	                                  |E|
363	                                  +-+

365	    o E (1 bit): frame existence indicator. If set to 0, this indicates
366	      the corresponding frame is absent.

368	    o For each preceding and pre-preceding packet the number of entries
369	      is equal to the grouping size of the current packet. Thus the
370	      maximum number of entries is 4*2 = 8.

372	    o If the class specifier in the redundancy header is CL=0 (NONE)
373	      then there are no entries for the corresponding packet.
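
The rules above determine the length of the redundancy TOC. A minimal
sketch, with a hypothetical function name:

```c
/* Hypothetical helper: number of 1-bit entries in the redundancy TOC
 * for grouping field GR (0..3) and class specifiers CL1 and CL2.  A
 * packet whose specifier is 0 (NONE) contributes no entries. */
static unsigned ipmr_red_toc_entries(unsigned gr, unsigned cl1,
                                     unsigned cl2)
{
    unsigned per_packet = gr + 1;   /* grouping size */

    return (cl1 ? per_packet : 0) + (cl2 ? per_packet : 0);
}
```

For GR=2, CL1=2 and CL2=1 this gives 6 entries, and for GR=3 with both
specifiers non-zero it gives the maximum of 8.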

375	3.8. Redundancy Data

377	Redundancy data of a payload contains redundancy information for one or
378	more speech frames or comfort noise frames that may be lost during
379	transmission, as specified in the redundancy TOC of the payload. The
380	redundancy is the most important part of preceding frames representing
381	20 ms of speech. This data MAY be used for partial reconstruction of
382	lost frames. The amount of available redundancy is specified by the CL
383	field in the redundancy header (Section 3.6). This field SHOULD be
384	passed to the decoder. The size of a redundancy frame is variable and
385	can be obtained using the service function specified in Appendix A.

387	4. Payload Examples

389	A few examples to highlight the payload format follow.

391	4.1. Payload Carrying a Single Frame

393	The following diagram shows a standard IP-MR payload carrying a single
394	speech frame without redundancy:

396	   0                   1                   2                   3
397	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
398	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
399	  |0|CR=1 |BR=0 |0|0|0 0|0|1|sp(0)                                |
400	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
401	  |                                                               |
402	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
403	  |                                                               |
404	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
405	  |                                                               |
406	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
407	  |                                                               |
408	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
409	  |                                                               |
410	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
411	  |                      sp(193)|P|
412	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

414	In the payload the speech frame is not damaged at the IP origin (E=1),
415	the coding rate is 9.8 kbps (CR=1), the base rate is 7.7 kbps (BR=0),
416	and the DTX mode is off. There is no byte alignment (A=0) and no
417	redundancy (R=0). The encoded speech bits - sp(0) to sp(193) - are
418	placed immediately after the TOC. Finally, one zero bit is added at the
419	end as padding to make the payload byte aligned.
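
The header bits of such a payload can be produced as sketched below.
The function name is hypothetical, and the sketch assumes MSB-first bit
order, with bit 0 of the diagram being the most significant bit of the
first octet.

```c
#include <stdint.h>

/* Hypothetical packer: writes the 12-bit payload header into the
 * first 12 bits of out[0..1], MSB first, with T = 0.  The low 4 bits
 * of out[1] are cleared so that TOC bits can be OR-ed in afterwards. */
static void ipmr_pack_header(uint8_t out[2], unsigned cr, unsigned br,
                             unsigned d, unsigned a, unsigned gr,
                             unsigned r)
{
    unsigned v = ((cr & 7u) << 8) | ((br & 7u) << 5) | ((d & 1u) << 4) |
                 ((a & 1u) << 3) | ((gr & 3u) << 1) | (r & 1u);

    out[0] = (uint8_t)(v >> 4);
    out[1] = (uint8_t)((v & 0xFu) << 4);
}
```

For the header values drawn above (CR=1, all other fields 0) the first
octet is 0x10. Setting the TOC entry E=1 at bit 12 then adds 0x08 to
the second octet. The whole payload occupies 12 + 1 + 194 + 1 = 208
bits, that is 26 octets.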

421	4.2. Payload Carrying Multiple Frames with Redundancy

423	The following diagram shows a payload that contains three frames, one of
424	them with no speech data. The coding rate is 7.7 kbps (CR=0), the base
425	rate is 7.7 kbps (BR=0), and the DTX mode is on. The speech frames are
426	byte aligned (A=1), so 1 zero bit is added at the end of the header.
427	Besides the speech frames the payload contains six redundancy frames
428	(three for each delayed packet).

430	The first speech frame consists of bits sp1(0) to sp1(92). After that 3
431	bits are added for byte alignment. The second frame does not contain any
432	speech information, which is indicated in the payload by its TOC entry.
433	The third frame consists of bits sp3(0) to sp3(171).

435	The redundancy header follows the speech data. The one-packet-delayed
436	redundancy contains class A+B bits (CL1=2), and the two-packet-delayed
437	redundancy contains class A bits (CL2=1). The one-packet-delayed
438	redundancy contains three frames with 20, 39 and 35 bits respectively.

440	The first frame of the two-packet-delayed redundancy is absent, as
441	indicated by its TOC entry, and the two other frames have sizes of 15
442	and 19 bits.

444	Note that all speech frames are padded with zero bits for byte
445	alignment.

447	   0                   1                   2                   3
448	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
449	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
450	  |0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0)                         |
451	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
452	  |                                                               |
453	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
454	  |                                                               |
455	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
456	  |                  sp1(92)|P|P|P|sp3(0)                         |
457	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
458	  |                                                               |
459	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
460	  |                                                               |
461	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
462	  |                                                               |
463	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
464	  |                                                               |
465	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
466	  |                                               sp3(171)|P|P|P|P|
467	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
468	  |CL1=2|CL2=1|1 1 1|0 1 1|red1_1(0)                    red1_1(19)|
469	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
470	  |red1_2(0)                                                      |
471	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
472	  |   red1_2(38)|red1_3(0)                                        |
473	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
474	  |         red1_3(34)|red2_2(0)          red2_2(14)|red2_3(0)    |
475	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
476	  |             red2_3(18)|P|P|P|P|
477	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

479	5. Media Type Registration

481	This section describes the media types and names associated with this
482	payload format.

484	5.1. Registration of media subtype audio/ip-mr_v2.5

486	Type name: audio

488	Subtype name: ip-mr_v2.5

490	Required parameters: none

492	Optional parameters:
493	* ptime: Gives the length of time in milliseconds represented by the
494	media in a packet. Allowed values are: 20, 40, 60 and 80.

496	Encoding considerations: This media type is framed binary data (see RFC
497	4288, Section 4.8).

499	Security considerations: See RFC 3550 [RFC 3550]

501	Interoperability considerations: none

503	Published specification: RFC XXXX

505	Applications that use this media type: Real-time audio applications like
506	voice over IP and teleconference, and multi-media streaming.

508	Additional information: none

510	Person & email address to contact for further information:
511	Yury Morzeev
512	morzeev@spiritdsp.com

514	Intended usage: COMMON

516	Restrictions on usage: This media type depends on RTP framing, and hence
517	is only defined for transfer via RTP [RFC 3550].

519	Authors:
520	Sergey Ikonin <info@spiritdsp.com>

522	Change controller: IETF Audio/Video Transport working group delegated
523	from the IESG.

525	5.2. Mapping Media Type Parameters into SDP

527	The information carried in the media type specification has a specific
528	mapping to fields in the Session Description Protocol (SDP) [RFC 4566],
529	which is commonly used to describe RTP sessions. When SDP is used to
530	specify sessions employing the IP-MR codec, the mapping is as follows:

532	    o The media type ("audio") goes in SDP "m=" as the media name.

534	    o The media subtype (payload format name) goes in SDP "a=rtpmap"
535	    as the encoding name. The clock rate in "a=rtpmap" MUST be 16000.

537	    o The parameter "ptime" goes in the SDP "a=ptime" attribute.

539	Any remaining parameters go in the SDP "a=fmtp" attribute by copying
540	them directly from the media type parameter string as a semicolon-
541	separated list of parameter=value pairs.
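
As an illustration of this mapping, the following sketch formats the
SDP lines for an IP-MR stream. The function name is hypothetical, and
the port and dynamic payload type are example values chosen by the
application, not values assigned by this memo.

```c
#include <stdio.h>

/* Hypothetical helper: formats the SDP media description for an
 * IP-MR stream.  `port` and `pt` (a dynamic payload type) are chosen
 * by the application; `frames` is the number of 20 ms frames per
 * packet (1..4), so ptime is 20, 40, 60 or 80.  Returns the
 * snprintf() result. */
static int ipmr_format_sdp(char *buf, size_t size, unsigned port,
                           unsigned pt, unsigned frames)
{
    return snprintf(buf, size,
                    "m=audio %u RTP/AVP %u\r\n"
                    "a=rtpmap:%u ip-mr_v2.5/16000\r\n"
                    "a=ptime:%u\r\n",
                    port, pt, pt, 20u * frames);
}
```

Called with port 49170, payload type 97 and two frames per packet, the
sketch produces the lines "m=audio 49170 RTP/AVP 97",
"a=rtpmap:97 ip-mr_v2.5/16000" and "a=ptime:40".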

543	Note that the payload format (encoding) names are commonly shown in
544	upper case. Media subtypes are commonly shown in lower case. These
545	names are case-insensitive in both places.

547	6. Security Considerations

549	RTP packets using the payload format defined in this specification
550	are subject to the security considerations discussed in the RTP
551	specification [RFC 3550] and in any applicable RTP profile. The main
552	security considerations for the RTP packet carrying the RTP payload
553	format defined within this memo are confidentiality, integrity, and
554	source authenticity. Confidentiality is achieved by encryption of the
555	RTP payload. Integrity of the RTP packets is achieved through a suitable
556	cryptographic integrity protection mechanism. Such a cryptographic
557	system may also allow the authentication of the source of the payload.

559	A suitable security mechanism for this RTP payload format should
560	provide confidentiality, integrity protection, and at least source
561	authentication capable of determining if an RTP packet is from a
562	member of the RTP session.

564	Note that the appropriate mechanism to provide security to RTP and
565	payloads following this memo may vary. It is dependent on the
566	application, the transport, and the signaling protocol employed.
567	Therefore, a single mechanism is not sufficient, although if suitable,
568	usage of the Secure Real-time Transport Protocol (SRTP) [RFC 3711] is
569	recommended.  Other mechanisms that may be used are IPsec [RFC 4301]
570	and Transport Layer Security (TLS) [RFC 5246] (RTP over TCP); other
571	alternatives may exist.

573	This payload format does not exhibit any significant non-uniformity in
574	the receiver side computational complexity for packet processing, and
575	thus is unlikely to pose a denial-of-service threat due to the receipt
576	of pathological data.

578	7. Congestion Control

580	The general congestion control considerations for transporting RTP data
581	apply; see RTP [RFC 3550] and any applicable RTP profile like AVP
582	[RFC 3551]. However, the multi-rate capability of IP-MR speech coding
583	provides a mechanism that may help to control congestion, since the
584	bandwidth demand can be adjusted by selecting a different encoding mode.

586	The number of frames encapsulated in each RTP payload highly
587	influences the overall bandwidth of the RTP stream due to header
588	overhead constraints. Packetizing more frames in each RTP payload
589	can reduce the number of packets sent and hence the overhead from
590	IP/UDP/RTP headers, at the expense of increased delay.
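
The effect of grouping on header overhead can be estimated as sketched
below. The figure of 40 octets assumes IPv4, UDP and RTP headers
without options, header extensions or CSRC entries. The function name
is hypothetical, and continuous transmission (no DTX gaps) is assumed.

```c
/* IPv4 (20) + UDP (8) + RTP (12) header octets. */
#define IPMR_HDR_OCTETS 40u

/* Hypothetical estimate of the IP/UDP/RTP header overhead in bit/s
 * when `frames` 20 ms frames (1..4) are sent per packet. */
static unsigned ipmr_header_overhead_bps(unsigned frames)
{
    /* One packet is sent every 20 * frames milliseconds. */
    return IPMR_HDR_OCTETS * 8u * 1000u / (20u * frames);
}
```

Under these assumptions the overhead is 16000 bit/s with one frame per
packet and 4000 bit/s with four. At the lowest coding rate of about
7.7 kbps, single-frame packets therefore carry more header bits than
speech bits.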

592	If an in-band redundancy scheme is used to protect against packet
593	loss, the amount of introduced redundancy will need to be regulated so
594	that the use of redundancy itself does not cause a congestion problem.
595	In other words, a sender SHALL NOT increase the total bitrate when
596	adding redundancy in response to packet loss, and needs instead to
597	adjust it down in accordance with the congestion control algorithm
598	being run. Thus, when adding redundancy, the media bitrate will need
599	to be reduced to provide room for the redundancy.
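
As an illustration of making room for redundancy, the sketch below
derives the media bitrate that fits within a fixed total budget. It is
only one possible model. It assumes that the congestion control
algorithm supplies the budget and that redundancy is expressed as a
percentage of the media bitrate. The function and parameter names are
hypothetical.

```c
#include <assert.h>

/*
 * Media bitrate that leaves room for redundancy inside a fixed budget.
 * budget_bps:     total bitrate allowed by congestion control
 * redundancy_pct: redundant bits sent per 100 media bits
 *                 (assumed model: 100 means every frame is sent twice)
 * The returned media bitrate plus its redundancy never exceeds
 * budget_bps.
 */
long media_bitrate_with_redundancy(long budget_bps, long redundancy_pct)
{
  assert(budget_bps >= 0 && redundancy_pct >= 0);

  return budget_bps * 100 / (100 + redundancy_pct);
}
```

For a budget of 24000 bit/s, 50 percent redundancy leaves 16000 bit/s
for media, and full duplication leaves 12000 bit/s. The sender would
then select an encoding mode at or below that rate.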

601	8. IANA Considerations

603	One media type has been defined and needs registration in the media
604	types registry.

606	9. Normative References

608	  [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
609	             Requirement Levels", BCP 14, RFC 2119, March 1997.

611	  [RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and
612	             V. Jacobson, "RTP: A Transport Protocol for Real-Time
613	             Applications", STD 64, RFC 3550, July 2003.

615	  [RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio
616	             and Video Conferences with Minimal Control", STD 65,
617	             RFC 3551, July 2003.

619	  [RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
620	             Description Protocol", RFC 4566, July 2006.

622	  [RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
623	             K. Norrman, "The Secure Real-time Transport Protocol
624	             (SRTP)", RFC 3711, March 2004.

626	  [RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer
627	             Security (TLS) Protocol Version 1.2", RFC 5246,
628	             August 2008.

630	  [RFC 4301] Kent, S. and K. Seo, "Security Architecture for the
631	             Internet Protocol", RFC 4301, December 2005.

633	10. Author(s) Information

635	Sergey Ikonin
636	email: info@spiritdsp.com

638	Building 27, A. Solzhenitsyna street
639	109004, Moscow, Russia
640	Tel: +7 495 661-2178
641	Fax: +7 495 912-6786

643	11. Disclaimer

645	This document may contain material from IETF Documents or IETF
646	Contributions published or made publicly available before November 10,
647	2008. The person(s) controlling the copyright in some of this material
648	may not have granted the IETF Trust the right to allow modifications of
649	such material outside the IETF Standards Process. Without obtaining an
650	adequate license from the person(s) controlling the copyright in such
651	materials, this document may not be modified outside the IETF Standards
652	Process, and derivative works of it may not be created outside the IETF
653	Standards Process, except to format it for publication as an RFC or to
654	translate it into languages other than English.

656	12. Legal Terms

658	All IETF Documents and the information contained therein are provided on
659	an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
660	OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
661	THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
662	IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
663	INFORMATION THEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
664	WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

666	The IETF Trust takes no position regarding the validity or scope of any
667	Intellectual Property Rights or other rights that might be claimed to
668	pertain to the implementation or use of the technology described in any
669	IETF Document or the extent to which any license under such rights might
670	or might not be available; nor does it represent that it has made any
671	independent effort to identify any such rights.

673	Copies of Intellectual Property disclosures made to the IETF Secretariat
674	and any assurances of licenses to be made available, or the result of an
675	attempt made to obtain a general license or permission for the use of
676	such proprietary rights by implementers or users of this specification
677	can be obtained from the IETF on-line IPR repository at
678	http://www.ietf.org/ipr.

680	The IETF invites any interested party to bring to its attention any
681	copyrights, patents or patent applications, or other proprietary rights
682	that may cover technology that may be required to implement any standard
683	or specification contained in an IETF Document. Please address the
684	information to the IETF at ietf-ipr@ietf.org.

686	The definitive version of an IETF Document is that published by, or
687	under the auspices of, the IETF. Versions of IETF Documents that are
688	published by third parties, including those that are translated into
689	other languages, should not be considered to be definitive versions of
690	IETF Documents. The definitive version of these Legal Provisions is that
691	published by, or under the auspices of, the IETF. Versions of these
692	Legal Provisions that are published by third parties, including those
693	that are translated into other languages, should not be considered to be
694	definitive versions of these Legal Provisions.

696	For the avoidance of doubt, each Contributor to the IETF Standards
697	Process licenses each Contribution that he or she makes as part of the
698	IETF Standards Process to the IETF Trust pursuant to the provisions of
699	RFC 5378. No language to the contrary, or terms, conditions or rights
700	that differ from or are inconsistent with the rights and licenses
701	granted under RFC 5378, shall have any effect and shall be null and
702	void, whether published or posted by such Contributor, or included with
703	or in such Contribution.

705	APPENDIX A. RETRIEVING FRAME INFORMATION

707	This appendix contains C code implementing the frame parsing
708	function. The function extracts information about a coded frame,
709	including the frame size, the number of layers, the size of each
710	layer, and the size of each perceptual sensitivity class.

712	A.1. get_frame_info.c

714	/*
715	  Copyright (c) 2010 IETF Trust and the persons identified as authors
716	  of the code.  All rights reserved.

718	  Redistribution and use in source and binary forms, with or without
719	  modification, are permitted provided that the following conditions
720	  are met:

722	  - Redistributions of source code must retain the above copyright
723	    notice, this list of conditions and the following disclaimer.

725	  - Redistributions in binary form must reproduce the above copyright
726	    notice, this list of conditions and the following disclaimer in the
727	    documentation and/or other materials provided with the distribution.

729	  - Neither the name of the Xiph.org Foundation nor the names of its
730	    contributors may be used to endorse or promote products derived from
731	    this software without specific prior written permission.

733	  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
734	  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
735	  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
736	  A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
737	  FOUNDATION OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
738	  INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
739	  BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
740	  OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
741	  AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
742	  OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
743	  THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
744	  DAMAGE.
745	*/

747	/******************************************************************

749	  get_frame_info.c

751	  Retrieving frame information for IP-MR Speech Codec

753	******************************************************************/

755	#define RATES_NUM       6   // number of codec rates
756	#define SENSE_CLASSES   6   // number of sensitivity classes (A..F)

758	// frame types
759	#define FT_SPEECH       0   // active speech
760	#define FT_DTX_SID      1   // silence insertion descriptor

762	// get specified bit from coded data
763	int GetBit(unsigned char *data, int curBit)
764	{
765	  return ((data[curBit >> 3] >> (curBit % 8)) & 1);
766	}

768	// retrieve frame information
769	int GetFrameInfo(           // o: frame size in bits
770	  short rate,               // i: encoding rate (0..5)
771	  short base_rate,          // i: base (core) layer rate,
772	                            //    if base_rate > rate, then assumed
773	                            //    that base_rate = rate.
774	  unsigned char *pCoded,    // i: coded bit frame
775	  short pLayerBits          // o: number of bits in layers
776	      [RATES_NUM],
777	  short pSenseBits          // o: number of bits in sensitivity classes
778	      [SENSE_CLASSES],
779	  short *nLayers            // o: number of layers
780	)
781	{
782	  static const short Bits_1[4]    = {0, 9, 9, 15};
783	  static const short Bits_2[16]   = { 43,50,36,31,46,48,40,44,47,43,44,
784	                                      45,43,44,47,36};

786	  static const short Bits_3[2][6] = {{13, 11, 23, 33, 36, 31},
787	                                     {25,  0, 23, 32, 36, 31},};

789	  int FrType;
790	  int i,nBits;

792	  if (rate < 0 || rate > 5) {
793	    return 0; // incorrect stream
794	  }

796	  for(i = 0; i < SENSE_CLASSES; i++) {
797	    pSenseBits[i] = 0;
798	  }

800	  nBits = 0;
801	  // extract frame type bit if required
802	  FrType = GetBit(pCoded, nBits++) ? FT_SPEECH : FT_DTX_SID;

804	  {
805	    int cw_0;
806	    int b[14];

808	    // extract the 14 bits that follow the frame type bit
809	    for(i = 0 ; i < 14; i++) {
810	        b[i] = GetBit(pCoded, nBits++);
811	    }

813	    // parse
814	    if(FrType == FT_DTX_SID) {
815	      cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3);
816	      rate = 0;
817	      pSenseBits[0] = 10 + Bits_2[cw_0];
818	    } else {

820	      int idx;
821	      int nFlag_1, nFlag_2, cw_1 = 0, cw_2 = 0; // accumulators start at 0

823	      nFlag_1 = b[0] + b[2] + b[4] + b[6];
824	      cw_1 = (cw_1 << 1) | b[0];
825	      cw_1 = (cw_1 << 1) | b[2];
826	      cw_1 = (cw_1 << 1) | b[4];
827	      cw_1 = (cw_1 << 1) | b[6];

829	      nFlag_2 = b[1] + b[3] + b[5] + b[7];
830	      cw_2 = (cw_2 << 1) | b[1];
831	      cw_2 = (cw_2 << 1) | b[3];
832	      cw_2 = (cw_2 << 1) | b[5];
833	      cw_2 = (cw_2 << 1) | b[7];

835	      cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3);
836	      if (base_rate < 0)    base_rate = 0;
837	      if (base_rate > rate) base_rate = rate;
838	      idx = base_rate == 0 ? 0 : 1;

840	      pSenseBits[0] = 15+Bits_2[cw_0];
841	      pSenseBits[1] = Bits_1[(cw_1 >> 0)&0x3] + Bits_1[(cw_1>>2)&0x3];
842	      pSenseBits[2] = nFlag_1*5;
843	      pSenseBits[3] = nFlag_2*30;
844	      pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]);

846	      for (i = 1; i < rate+1; i++) {
847	        pLayerBits[i] = 4*(Bits_3[idx][i]);
848	      }
849	    }

851	    pLayerBits[0] = 0;
852	    for (i = 0; i < SENSE_CLASSES; i++) {
853	        pLayerBits[0] += pSenseBits[i];
854	    }

856	    *nLayers = rate+1;
857	  }

859	  {
860	    // count total frame size
861	    int payloadBitCount = 0;
862	    for (i = 0; i < *nLayers; i++) {
863	      payloadBitCount += pLayerBits[i];
864	    }
865	    return payloadBitCount;
866	  }
867	}
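
The bit ordering used by GetBit() in A.1 can be checked in isolation.
The sketch below repeats the accessor and adds a hypothetical helper,
ReadBitsLsbFirst(), which is not part of A.1. The helper assembles a
value the same way A.1 builds cw_0: the first bit read becomes the
least significant bit of the result.

```c
#include <assert.h>

/* Same bit accessor as in A.1: bit 0 is the least significant bit
   of data[0], bit 8 is the least significant bit of data[1]. */
static int GetBit(const unsigned char *data, int curBit)
{
  return (data[curBit >> 3] >> (curBit % 8)) & 1;
}

/* Hypothetical helper, not part of A.1: reads count bits starting at
   firstBit and places the first bit read in the least significant
   position of the result. */
unsigned ReadBitsLsbFirst(const unsigned char *data, int firstBit,
                          int count)
{
  unsigned value = 0;
  int i;

  assert(firstBit >= 0 && count >= 0 && count <= 16);

  for (i = 0; i < count; i++) {
    value |= (unsigned)GetBit(data, firstBit + i) << i;
  }
  return value;
}
```

For a SID frame, the 4-bit code word that A.1 calls cw_0 occupies
stream bits 1 to 4, so under this sketch it would be read as
ReadBitsLsbFirst(pCoded, 1, 4).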

869	Authors' Addresses

871	   SPIRIT DSP
872	   Building 27, A. Solzhenitsyna street
873	   109004, Moscow, RUSSIA

875	   Tel: +7 495 661-2178
876	   Fax: +7 495 912-6786
877	   EMail: info@spiritdsp.com