Audio/Video Transport Working Group                            S. Ikonin
Internet Draft                                                SPIRIT DSP
Intended status: Informational                           August 18, 2009

               RTP Payload Format for IP-MR Speech Codec
                     draft-ietf-avt-rtp-ipmr-05.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on February 18, 2010.

Abstract

   This document specifies the payload format for packetization of
   SPIRIT IP-MR encoded speech signals into the Real-time Transport
   Protocol (RTP). The payload format supports transmission of multiple
   frames per payload and the introduction of redundancy for robustness
   against packet loss.

Table of Contents

   1. Introduction
   2. IP-MR Codec Description
   3. Payload Format
      3.1. RTP Header Usage
      3.2. Payload Format Structure
      3.3. Payload Header
      3.4. Speech Table of Contents
      3.5. Speech Data
      3.6. Redundancy Header
      3.7. Redundancy Table of Contents
      3.8. Redundancy Data
   4. Payload Examples
      4.1. Payload Carrying a Single Frame
      4.2. Payload Carrying Multiple Frames with Redundancy
   5. Media Type Registration
      5.1. Registration of media subtype audio/ip-mr_v2.5
      5.2. Mapping Media Type Parameters into SDP
   6. Security Considerations
   7. Congestion Control
   8. IANA Considerations
   9. Normative References
   10. Author(s) Information
   11. Disclaimer
   12. Legal Terms
   APPENDIX A. RETRIEVING FRAME INFORMATION
      A.1. get_frame_info.c
   Authors' Addresses

1. Introduction

   This document specifies the payload format for packetization of
   SPIRIT IP-MR encoded speech signals into the Real-time Transport
   Protocol (RTP). The payload format supports transmission of multiple
   frames per payload and the introduction of redundancy for robustness
   against packet loss.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC 2119].

2. IP-MR Codec Description

   The IP-MR codec is a scalable adaptive multi-rate wideband speech
   codec designed by SPIRIT for use in IP-based networks. This codec is
   suitable for real-time communications such as telephony and
   videoconferencing.

   The codec operates on 20 ms frames at a 16 kHz sampling rate and has
   an algorithmic delay of 25 ms.

   The IP-MR codec supports six wideband speech coding modes with
   respective bit rates ranging from about 7.7 to about 34.2 kbps. The
   coding mode can be changed at any 20 ms frame boundary, making it
   possible to dynamically adjust the speech encoding rate during a
   session to adapt to varying transmission conditions.

   The coded frame consists of multiple coding layers: a base (or core)
   layer and several enhancement layers, which are coded independently.
   Only the core layer is mandatory to decode understandable speech;
   the upper layers provide quality enhancement. The enhancement layers
   may be omitted, and the remaining base layer can be meaningfully
   decoded without artifacts. This makes the bit stream scalable and
   allows the bit rate to be reduced during transmission without
   re-encoding.

   This memo specifies an optional form of redundancy coding within RTP
   for protection against packet loss. It is based on the commonly known
   scheme in which previously transmitted frames are aggregated together
   with new ones. Each frame is retransmitted once in the following
   RTP payload packet. In the diagram below, f(n-2)...f(n+4) denotes a
   sequence of speech frames, and p(n-1)...p(n+4) is a sequence of
   payload packets:

   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--

     <---- p(n-1) ---->
              <----- p(n) ----->
                       <---- p(n+1) ---->
                                <---- p(n+2) ---->
                                         <---- p(n+3) ---->
                                                  <---- p(n+4) ---->

   Because of the scalable nature of the IP-MR codec, there is no need
   to duplicate the whole previous frame; it is sufficient to retransmit
   only the core layer. This reduces redundancy overhead while keeping
   efficiency. Moreover, the speech bits encoded in the core layer are
   divided into six classes (from A to F) of perceptual sensitivity to
   errors.
   Using these classes to select the amount of introduced redundancy
   makes it possible to adjust the trade-off between overhead and
   robustness against packet loss.

   The mechanism described does not require signaling at session setup.
   The sender is responsible for selecting an appropriate amount of
   redundancy based on feedback about the channel conditions.

   The main codec characteristics can be summarized as follows:

   o  Wideband, 16 kHz, speech codec

   o  Adaptive multi-rate with six modes from about 7.7 to 34.2 kbps

   o  Bit rate scalable

   o  Variable bit rate changing in accordance with actual speech
      content

   o  Discontinuous Transmission (DTX), silence suppression and
      comfort noise generation

   o  In-band redundancy scheme for protection against packet loss

3. Payload Format

   The main purpose of the payload design for IP-MR is to maximize the
   potential of the codec with as little overhead as possible. The
   payload format allows changing parameters of the codec (such as bit
   rate, level of scalability, DTX and redundancy mode) at any packet
   boundary without re-negotiation. This makes it possible to adjust
   streaming parameters dynamically in accordance with changing network
   conditions. The payload format also supports aggregation of multiple
   consecutive frames (up to 4) in a payload, which allows controlling
   the trade-off between delay and header overhead.

3.1. RTP Header Usage

   The RTP timestamp corresponds to the sampling instant of the first
   sample encoded for the first frame-block in the packet. The timestamp
   clock frequency SHALL be 16 kHz. The duration of one frame is 20 ms,
   corresponding to 320 samples at 16 kHz. Thus the timestamp is
   increased by 320 for each consecutive frame. The timestamp is also
   used to recover the correct decoding order of the frame-blocks.
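
   The timestamp arithmetic above can be sketched in C as follows. This
   is only an illustration: the macro and function names are
   hypothetical and are not part of the codec or of any RTP library.
   The sketch assumes that the frames in a packet are consecutive, so
   that frame k of a packet starts 320*k timestamp units after the
   packet's RTP timestamp.

```c
#include <stdint.h>

/* Hypothetical names, introduced only for this illustration. */
#define IPMR_CLOCK_HZ       16000u  /* RTP timestamp clock rate      */
#define IPMR_FRAME_MS       20u     /* duration of one speech frame  */
#define IPMR_FRAME_SAMPLES  (IPMR_CLOCK_HZ * IPMR_FRAME_MS / 1000u) /* 320 */

/* Timestamp of frame k (k = 0..3) inside a packet whose RTP timestamp
 * is packet_ts. Unsigned 32-bit arithmetic wraps modulo 2^32, which
 * matches the wrap-around behaviour of the RTP timestamp field. */
static uint32_t ipmr_frame_timestamp(uint32_t packet_ts, unsigned k)
{
    return packet_ts + (uint32_t)k * IPMR_FRAME_SAMPLES;
}

/* Timestamp of the next packet when the current packet carries
 * frames_in_packet consecutive frames (1..4). */
static uint32_t ipmr_next_packet_timestamp(uint32_t packet_ts,
                                           unsigned frames_in_packet)
{
    return packet_ts + (uint32_t)frames_in_packet * IPMR_FRAME_SAMPLES;
}
```

   For example, a packet with timestamp 1000 that carries three frames
   covers frame timestamps 1000, 1320 and 1640, and the next packet
   would carry timestamp 1960.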

   The RTP header marker bit (M) SHALL be set to 1 whenever the first
   frame-block carried in the packet is the first frame-block in a
   talkspurt (see the definition of a talkspurt in Section 4.1 of
   [RFC 3551]). For all other packets, the marker bit SHALL be set to
   zero (M=0).

   The assignment of an RTP payload type for the format defined in this
   memo is outside the scope of this document. The RTP profiles in use
   currently mandate binding the payload type dynamically for this
   payload format. This is basically necessary because the payload type
   expresses the configuration of the payload itself, i.e. basic or
   interleaved mode, and the number of channels carried.

   The remaining RTP header fields are used as specified in [RFC 3550].

3.2. Payload Format Structure

   The IP-MR payload format consists of a payload header with general
   information about the packet, a speech table of contents (TOC), and
   speech data. An optional redundancy section follows the speech data.
   The redundancy section consists of a redundancy header, a redundancy
   TOC and redundancy data.

   The following diagram shows the standard payload format layout:

   +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +
   | payload | speech | speech | redundancy | redundancy | redundancy |
   | header  |  TOC   |  data  |   header   |    TOC     |    data    |
   +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +

3.3. Payload Header

   The payload header has the following format:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+
   |T| CR  | BR  |D|A|GR |R|
   +-+-+-+-+-+-+-+-+-+-+-+-+

   o  T (1 bit): Reserved for compatibility with future extensions.
      SHOULD be set to 0.

   o  CR (3 bits): coding rate of frame(s) in this packet, as per the
      following table:

      +-------+--------------+
      |  CR   | avg. bitrate |
      +-------+--------------+
      |   0   |   7.7 kbps   |
      |   1   |   9.8 kbps   |
      |   2   |  14.3 kbps   |
      |   3   |  20.8 kbps   |
      |   4   |  27.9 kbps   |
      |   5   |  34.2 kbps   |
      |   6   |  (reserved)  |
      |   7   |   NO_DATA    |
      +-------+--------------+

      The CR value 7 (NO_DATA) indicates that there is no speech data
      (and accordingly no speech TOC) in the payload. This MAY be used
      to transmit redundancy data only. The value 6 is reserved. If this
      value is received, the packet SHOULD be discarded.

   o  BR (3 bits): base rate for the core layer of frame(s) in this
      packet, using the table for CR. Values in the range 0-5 indicate
      bitrates for the core layer, same as for CR. If a value outside
      this range is received, the packet SHOULD be discarded. The base
      rate is the lowest rate for scalability, so the speech payload
      cannot be scaled down below the BR value. If a received packet has
      BR > CR, then during decoding it will be assumed that BR = CR.

   o  D (1 bit): indicates whether the DTX mode is allowed.

   o  A (1 bit): byte-aligned payload. If A=1 then all speech frames
      MUST be byte-aligned. This mode speeds up speech data access.
      The A=0 value specifies bandwidth-efficient mode with no byte
      alignment (including the end of the header).

   o  GR (2 bits): number of frames in the packet (grouping size). The
      actual grouping size is GR + 1, thus the maximum grouping
      supported is 4.

   o  R (1 bit): redundancy presence bit. If R=1 then the packet
      contains redundancy information for recovery of lost packets.
      In this case the redundancy section follows the speech data.

3.4. Speech Table of Contents

   The speech TOC contains one entry for each frame in the packet
   (grouping size entries in total). Each entry contains a single field:

    0
   +-+
   |E|
   +-+

   o  E (1 bit): frame existence indicator. If set to 0, this indicates
      that the corresponding frame is absent and the receiver should set
      the special LOST_FRAME flag for the decoder.
      An absent frame can result either from a lost frame or from an
      empty frame generated by the encoder during a silence interval in
      DTX mode.

   Note that if the CR field of the payload header is 7 (NO_DATA) then
   the speech TOC is empty.

3.5. Speech Data

   Speech data of a payload contains one or more speech frames or
   comfort noise frames, as specified in the speech TOC of the payload.

   Each speech frame represents 20 ms of speech encoded with the rate
   indicated in the CR field and the base rate indicated in the BR field
   of the payload header.

   The size of a coded speech frame is variable due to the nature of the
   codec. The encoder's algorithm decides the size of each frame and
   returns it after encoding. To save bandwidth, the size is not carried
   explicitly in the payload. The frame size can be determined from the
   frame's content using a special service function specified in
   Appendix A. This function provides complete information about a coded
   frame, including its size, the number of layers, the size of each
   layer and the size of the perceptual sensitivity classes.

3.6. Redundancy Header

   If a packet contains redundancy (the R field of the payload header is
   1), the speech data is followed by the redundancy header:

    0 1 2 3 4 5
   +-+-+-+-+-+-+
   | CL1 | CL2 |
   +-+-+-+-+-+-+

   The redundancy header consists of two fields. Each field contains a
   class specifier for the amount of redundancy taken from the preceding
   packet (CL1) and from the pre-preceding packet (CL2), i.e. the
   packets distant from the current packet by 1 and 2 packets
   respectively. The values are listed in the table below:

   +-------+-------------------+
   |  CL   | amount redundancy |
   +-------+-------------------+
   |   0   |       NONE        |
   |   1   |      CLASS A      |
   |   2   |      CLASS B      |
   |   3   |      CLASS C      |
   |   4   |      CLASS D      |
   |   5   |      CLASS E      |
   |   6   |      CLASS F      |
   |   7   |    (reserved)     |
   +-------+-------------------+

   Each specifier takes 3 bits, thus the total redundancy header size is
   6 bits.
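
   As an illustration of the table above, the following C sketch splits
   a 6-bit redundancy header value into CL1 and CL2 and names the bit
   classes a specifier covers. All identifiers are hypothetical. The
   sketch ASSUMES that the caller has already extracted the six header
   bits into an integer with CL1 in the three most significant bits,
   following the left-to-right order of the diagram; this memo does not
   spell out the bit order on the wire, so that step is deliberately
   left out. The cumulative reading of a specifier (for example, CL=3
   covers classes A, B and C) follows the description given after this
   sketch.

```c
#include <stddef.h>

/* Hypothetical type, introduced only for this illustration. */
typedef struct {
    unsigned cl1;  /* specifier for the preceding packet     */
    unsigned cl2;  /* specifier for the pre-preceding packet */
} ipmr_red_header;

/* Split a 6-bit redundancy header value into its two specifiers.
 * ASSUMPTION: bits6 holds CL1 in bits 5..3 and CL2 in bits 2..0. */
static ipmr_red_header ipmr_split_red_header(unsigned bits6)
{
    ipmr_red_header h;
    h.cl1 = (bits6 >> 3) & 7u;
    h.cl2 = bits6 & 7u;
    return h;
}

/* Classes covered by a specifier: "" for NONE (0), "A" for 1,
 * "AB" for 2, ... "ABCDEF" for 6. Returns NULL for the reserved
 * value 7 (and for anything larger). */
static const char *ipmr_cl_classes(unsigned cl)
{
    static const char *const names[7] = {
        "", "A", "AB", "ABC", "ABCD", "ABCDE", "ABCDEF"
    };
    return cl < 7u ? names[cl] : NULL;
}
```

   Under the stated assumption, the header value binary 010001
   corresponds to CL1=2 (classes A and B) and CL2=1 (class A), the
   combination used in the example of Section 4.2.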

   These classes indicate the subjective importance of bits from the
   core layer. Class A contains the bits most sensitive to errors; loss
   of these bits results in a corrupted speech frame, which should not
   be decoded without applying a packet loss concealment (PLC)
   procedure. Class B is less sensitive than class A, and so on to F.
   Together, the bit classes A to F compose the core layer.

   Putting some part (classes of bits) of a previous frame into the
   current packet makes it possible to partially decode the previous
   frame if it is lost. The more information is delivered, the smaller
   the speech quality degradation. The fields CL1 and CL2 specify how
   many classes from previous frames the current packet contains. For
   example, CL1=3 (class C) means that the packet contains bits from
   classes A, B and C of the previous frame. If CL1=6 (class F) then the
   whole core layer is included.

3.7. Redundancy Table of Contents

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Pkt1 Entries| Pkt2 Entries|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The redundancy TOC contains entries for redundancy frames from the
   preceding and pre-preceding packets. Each entry takes 1 bit, like a
   speech TOC entry (Section 3.4):

    0
   +-+
   |E|
   +-+

   o  E (1 bit): frame existence indicator. If set to 0, this indicates
      that the corresponding frame is absent.

   o  For each of the preceding and pre-preceding packets, the number of
      entries is equal to the grouping size of the current packet. Thus
      the maximum number of entries is 4*2 = 8.

   o  If the class specifier in the redundancy header is CL=0 (NONE)
      then there are no entries for the corresponding packet redundancy.

3.8. Redundancy Data

   Redundancy data of a payload contains redundancy information for one
   or more speech frames or comfort noise frames that may be lost during
   transmission, as specified in the redundancy TOC of the payload. The
   redundancy is the most important part of each preceding frame, which
   represents 20 ms of speech.
   This data MAY be used for partial reconstruction of lost frames. The
   amount of available redundancy is specified by the CL field in the
   redundancy header (Section 3.6). This field SHOULD be passed to the
   decoder. The size of a redundancy frame is variable and can be
   obtained using the service function specified in Appendix A.

4. Payload Examples

   A few examples to highlight the payload format follow.

4.1. Payload Carrying a Single Frame

   The following diagram shows a standard IP-MR payload carrying a
   single speech frame without redundancy:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|CR=1 |BR=0 |0|0|0 0|0|1|sp(0)                                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      sp(193)|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   In this payload the speech frame is not damaged at the IP origin
   (E=1), the coding rate is 9.8 kbps (CR=1), the base rate is 7.7 kbps
   (BR=0), and the DTX mode is off. There is no byte alignment (A=0) and
   no redundancy (R=0). The encoded speech bits, sp(0) to sp(193), are
   placed immediately after the TOC. Finally, one zero bit is added at
   the end as padding to make the payload byte aligned.

4.2. Payload Carrying Multiple Frames with Redundancy

   The following diagram shows a payload that contains three frames, one
   of them with no speech data. The coding rate is 7.7 kbps (CR=0), the
   base rate is 7.7 kbps (BR=0), and the DTX mode is on.
   The speech frames are byte aligned (A=1), so 1 zero bit is added
   after the speech TOC to byte-align the header. Besides the speech
   frames, the payload contains redundancy entries for six frames (three
   for each delayed packet).

   The first speech frame consists of bits sp1(0) to sp1(92). After that
   3 bits are added for byte alignment. The second frame does not
   contain any speech information, which is represented in the payload
   by its TOC entry. The third frame consists of bits sp3(0) to
   sp3(171).

   The redundancy header follows the speech data. The one-packet-delayed
   redundancy contains class A+B bits (CL1=2), and the
   two-packet-delayed redundancy contains class A bits (CL2=1). The
   one-packet-delayed redundancy contains three frames with 20, 39 and
   35 bits respectively.

   The first frame of the two-packet-delayed redundancy is absent, which
   is represented in its TOC entry, and the two other frames have sizes
   of 15 and 19 bits.

   Note that all speech frames are padded with zero bits for byte
   alignment.
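
   The bit counts of this example can be checked with a small C sketch.
   The function and parameter names are hypothetical. The layout rules
   it encodes are read off this example rather than stated normatively:
   with A=1 the payload header plus speech TOC and each speech frame are
   padded to a byte boundary, while the redundancy header, redundancy
   TOC and redundancy frames are packed without padding and only the end
   of the payload is padded. The sketch also assumes that a redundancy
   section is present exactly when at least one redundancy TOC entry
   exists.

```c
#include <stddef.h>

/* Round a bit count up to the next multiple of 8. */
static size_t align8(size_t bits)
{
    return (bits + 7u) & ~(size_t)7u;
}

/* Size in bits of a byte-aligned (A=1) payload.
 *   speech[i] : size of speech frame i in bits (0 if absent)
 *   n_speech  : grouping size (number of speech TOC entries)
 *   red[j]    : size of redundancy frame j in bits (0 if absent)
 *   n_red     : number of redundancy TOC entries (0 if R=0)
 * ASSUMPTION: layout as in the Section 4.2 example. */
static size_t ipmr_aligned_payload_bits(const size_t *speech,
                                        size_t n_speech,
                                        const size_t *red,
                                        size_t n_red)
{
    size_t bits = align8(12u + n_speech);  /* payload header + TOC */
    size_t i;

    for (i = 0; i < n_speech; i++)
        bits += align8(speech[i]);         /* each frame is padded */

    if (n_red > 0) {
        bits += 6u + n_red;                /* redundancy header + TOC */
        for (i = 0; i < n_red; i++)
            bits += red[i];                /* packed, no padding */
        bits = align8(bits);               /* final padding */
    }
    return bits;
}
```

   With speech frames of 93, 0 and 172 bits and redundancy frames of
   20, 39, 35, 0, 15 and 19 bits, the speech part takes 16 + 96 + 176 =
   288 bits and the redundancy part 12 + 128 = 140 bits plus 4 padding
   bits, for a total of 432 bits (54 octets). This matches the diagram:
   thirteen 32-bit rows and one 16-bit row.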

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  sp1(92)|P|P|P|sp3(0)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                               sp3(171)|P|P|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |CL1=2|CL2=1|1 1 1|0 1 1|red1_1(0)                    red1_1(19)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |red1_2(0)                                                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   red1_2(38)|red1_3(0)                                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         red1_3(34)|red2_2(0)          red2_2(14)|red2_3(0)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             red2_3(18)|P|P|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5. Media Type Registration

   This section describes the media types and names associated with this
   payload format.

5.1. Registration of media subtype audio/ip-mr_v2.5

   Type name: audio

   Subtype name: ip-mr_v2.5

   Required parameters: none

   Optional parameters:
   *  ptime: Gives the length of time in milliseconds represented by the
      media in a packet. Allowed values are: 20, 40, 60 and 80.

   Encoding considerations: This media type is framed binary data (see
   RFC 4288, Section 4.8).

   Security considerations: See RFC 3550 [RFC 3550]

   Interoperability considerations: none

   Published specification: RFC XXXX

   Applications that use this media type: Real-time audio applications
   like voice over IP and teleconference, and multi-media streaming.

   Additional information: none

   Person & email address to contact for further information:
      Yuri Morzeev
      morzeev@spiritdsp.com

   Intended usage: COMMON

   Restrictions on usage: This media type depends on RTP framing, and
   hence is only defined for transfer via RTP [RFC 3550].

   Authors:
      Sergey Ikonin

   Change controller: IETF Audio/Video Transport working group delegated
   from the IESG.

5.2. Mapping Media Type Parameters into SDP

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [RFC 4566], which is commonly used to describe RTP sessions. When SDP
   is used to specify sessions employing the IP-MR codec, the mapping is
   as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype (payload format name) goes in SDP "a=rtpmap"
      as the encoding name. The RTP clock rate in "a=rtpmap" MUST be
      16000.

   o  The parameter "ptime" goes in the SDP "a=ptime" attribute.

   Any remaining parameters go in the SDP "a=fmtp" attribute by copying
   them directly from the media type parameter string as a
   semicolon-separated list of parameter=value pairs.

   Note that the payload format (encoding) names are commonly shown in
   upper case. Media subtypes are commonly shown in lower case. These
   names are case-insensitive in both places.

6. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC 3550] and in any applicable RTP profile.
   The main security considerations for the RTP packet carrying the RTP
   payload format defined within this memo are confidentiality,
   integrity, and source authenticity. Confidentiality is achieved by
   encryption of the RTP payload. Integrity of the RTP packets is
   achieved through a suitable cryptographic integrity protection
   mechanism. Such a cryptographic system may also allow the
   authentication of the source of the payload.

   A suitable security mechanism for this RTP payload format should
   provide confidentiality, integrity protection, and at least source
   authentication capable of determining if an RTP packet is from a
   member of the RTP session.

   Note that the appropriate mechanism to provide security to RTP and
   payloads following this memo may vary. It is dependent on the
   application, the transport, and the signaling protocol employed.
   Therefore, a single mechanism is not sufficient, although if
   suitable, usage of the Secure Real-time Transport Protocol (SRTP)
   [RFC 3711] is recommended. Other mechanisms that may be used are
   IPsec [RFC 4301] and Transport Layer Security (TLS) [RFC 5246] (RTP
   over TCP); other alternatives may exist.

   This payload format does not exhibit any significant non-uniformity
   in the receiver side computational complexity for packet processing,
   and thus is unlikely to pose a denial-of-service threat due to the
   receipt of pathological data.

7. Congestion Control

   The general congestion control considerations for transporting RTP
   data apply; see RTP [RFC 3550] and any applicable RTP profile like
   AVP [RFC 3551]. However, the multi-rate capability of IP-MR speech
   coding provides a mechanism that may help to control congestion,
   since the bandwidth demand can be adjusted by selecting a different
   encoding mode.
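
   As a rough illustration of such mode selection, the C sketch below
   picks the highest coding rate (CR) whose average bit rate fits a
   given budget, using the averages from the CR table in Section 3.3.
   The names are hypothetical, and the selection policy is an
   assumption made for this example: this memo leaves the rate
   adaptation algorithm to the sender. The sketch ignores IP/UDP/RTP
   header overhead and any redundancy, and the table values are
   averages, so a real sender would need additional margin.

```c
/* Average bit rates in bit/s for CR = 0..5 (CR table, Section 3.3). */
static const unsigned ipmr_avg_bps[6] = {
    7700u, 9800u, 14300u, 20800u, 27900u, 34200u
};

/* Hypothetical helper: return the highest CR whose average bit rate
 * does not exceed budget_bps. If even the lowest mode does not fit,
 * CR 0 is returned, since no lower speech mode is available. */
static int ipmr_select_cr(unsigned budget_bps)
{
    int cr = 0;
    int i;

    for (i = 1; i < 6; i++) {
        if (ipmr_avg_bps[i] <= budget_bps)
            cr = i;
    }
    return cr;
}
```

   For instance, a budget of 15000 bit/s selects CR=2 (14.3 kbps), and
   a budget of 20800 bit/s selects CR=3.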

   The number of frames encapsulated in each RTP payload strongly
   influences the overall bandwidth of the RTP stream due to header
   overhead constraints. Packetizing more frames in each RTP payload
   can reduce the number of packets sent and hence the overhead from
   IP/UDP/RTP headers, at the expense of increased delay.

   If the in-band redundancy scheme is used to protect against packet
   loss, the amount of introduced redundancy will need to be regulated
   so that the use of redundancy itself does not cause a congestion
   problem. In other words, a sender SHALL NOT increase the total
   bitrate when adding redundancy in response to packet loss, and needs
   instead to adjust it down in accordance with the congestion control
   algorithm being run. Thus, when adding redundancy, the media bitrate
   will need to be reduced to provide room for the redundancy.

8. IANA Considerations

   One media type has been defined and needs registration in the media
   types registry.

9. Normative References

   [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and
              V. Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio
              and Video Conferences with Minimal Control", STD 65,
              RFC 3551, July 2003.

   [RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
              K. Norrman, "The Secure Real-Time Transport Protocol
              (SRTP)", RFC 3711, March 2004.

   [RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer
              Security (TLS) Protocol Version 1.2", RFC 5246,
              August 2008.

   [RFC 4301] Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

10. Author(s) Information

   Sergey Ikonin
   email: ikonin@spiritdsp.com

   Russia 109004
   Building 27, A. Solgenizyn street
   Tel: +7 495 661-2178
   Fax: +7 495 912-6786

11. Disclaimer

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

12. Legal Terms

   All IETF Documents and the information contained therein are provided
   on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

   The IETF Trust takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might be
   claimed to pertain to the implementation or use of the technology
   described in any IETF Document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.

   Copies of Intellectual Property disclosures made to the IETF
   Secretariat and any assurances of licenses to be made available, or
   the result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by implementers or
   users of this specification can be obtained from the IETF on-line
   IPR repository at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   any standard or specification contained in an IETF Document. Please
   address the information to the IETF at ietf-ipr@ietf.org.

   The definitive version of an IETF Document is that published by, or
   under the auspices of, the IETF. Versions of IETF Documents that are
   published by third parties, including those that are translated into
   other languages, should not be considered to be definitive versions
   of IETF Documents. The definitive version of these Legal Provisions
   is that published by, or under the auspices of, the IETF. Versions
   of these Legal Provisions that are published by third parties,
   including those that are translated into other languages, should not
   be considered to be definitive versions of these Legal Provisions.

   For the avoidance of doubt, each Contributor to the IETF Standards
   Process licenses each Contribution that he or she makes as part of
   the IETF Standards Process to the IETF Trust pursuant to the
   provisions of RFC 5378. No language to the contrary, or terms,
   conditions or rights that differ from or are inconsistent with the
   rights and licenses granted under RFC 5378, shall have any effect
   and shall be null and void, whether published or posted by such
   Contributor, or included with or in such Contribution.

APPENDIX A. RETRIEVING FRAME INFORMATION

   This appendix contains C code implementing the frame parsing
   function. The function extracts information about a coded frame,
   including the frame size, the number of layers, the size of each
   layer, and the size of each perceptual sensitivity class.

A.1. get_frame_info.c

   /******************************************************************

     get_frame_info.c

     Retrieving frame information for IP-MR Speech Codec

   ******************************************************************/

   #define RATES_NUM     6  // number of codec rates
   #define SENSE_CLASSES 6  // number of sensitivity classes (A..F)

   // frame types
   #define FT_DTX_SPEECH 0  // active speech in DTX mode
   #define FT_DTX_SID    1  // silence insertion descriptor
   #define FT_NO_DTX     2  // no DTX frame

   // get specified bit from coded data
   // (bits are numbered from the least significant bit of each octet)
   int GetBit(unsigned char *data, int curBit)
   {
       return ((data[curBit >> 3] >> (curBit % 8)) & 1);
   }

   // retrieve frame information
   int GetFrameInfo(           // o: frame size in bits
       short rate,             // i: encoding rate (0..5)
       short base_rate,        // i: base (core) layer rate,
                               //    if base_rate > rate, then assumed
                               //    that base_rate = rate.
       short allow_DTX,        // i: flag of DTX mode
       unsigned char *pCoded,  // i: coded bit frame
       short pLayerBits        // o: number of bits in layers
             [RATES_NUM],
       short pSenseBits        // o: number of bits in sensitivity classes
             [SENSE_CLASSES],
       short *nLayers          // o: number of layers
   )
   {
       static const short Bits_1[4] = {0, 9, 9, 15};
       static const short Bits_2[16] = { 43,50,36,31,46,48,40,44,47,43,44,
                                         45,43,44,47,36};

       static const short Bits_3[2][6] = {{13, 11, 23, 33, 36, 31},
                                          {25,  0, 23, 32, 36, 31},};

       int FrType;
       int i,nBits;

       if (rate < 0 || rate > 5) {
           return 0; // incorrect stream
       }

       for(i = 0; i < SENSE_CLASSES; i++) {
           pSenseBits[i] = 0;
       }

       nBits = 0;
       // extract frame type bit if required
       if (allow_DTX) {
           FrType = GetBit(pCoded, nBits++) ? FT_DTX_SPEECH : FT_DTX_SID;
       } else {
           FrType = FT_NO_DTX;
       }
       {
           int cw_0;
           int b[14];

           // extract meaning bits
           for(i = 0 ; i < 14; i++) {
               b[i] = GetBit(pCoded, nBits++);
           }

           // parse
           if(FrType == FT_DTX_SID) {
               cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3);
               rate = 0;
               pSenseBits[0] = 10 + Bits_2[cw_0];
           } else {
               int idx;
               int nFlag_1, nFlag_2;
               // code words start from zero so that the shifts below
               // never read an uninitialized value
               int cw_1 = 0, cw_2 = 0;

               nFlag_1 = b[0] + b[2] + b[4] + b[6];
               cw_1 = (cw_1 << 1) | b[0];
               cw_1 = (cw_1 << 1) | b[2];
               cw_1 = (cw_1 << 1) | b[4];
               cw_1 = (cw_1 << 1) | b[6];

               nFlag_2 = b[1] + b[3] + b[5] + b[7];
               cw_2 = (cw_2 << 1) | b[1];
               cw_2 = (cw_2 << 1) | b[3];
               cw_2 = (cw_2 << 1) | b[5];
               cw_2 = (cw_2 << 1) | b[7];

               cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3);
               if (base_rate < 0) base_rate = 0;
               if (base_rate > rate) base_rate = rate;
               idx = base_rate == 0 ? 0 : 1;

               pSenseBits[0] = (FrType == FT_DTX_SPEECH ? 1:0)+14+Bits_2[cw_0];
               pSenseBits[1] = Bits_1[(cw_1 >> 0)&0x3] + Bits_1[(cw_1>>2)&0x3];
               pSenseBits[2] = nFlag_1*5;
               pSenseBits[3] = nFlag_2*30;
               pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]);

               for (i = 1; i < rate+1; i++) {
                   pLayerBits[i] = 4*(Bits_3[idx][i]);
               }
           }

           // the base layer carries all sensitivity classes
           pLayerBits[0] = 0;
           for (i = 0; i < SENSE_CLASSES; i++) {
               pLayerBits[0] += pSenseBits[i];
           }

           *nLayers = rate+1;
       }

       {
           // count total frame size
           int payloadBitCount = 0;
           for (i = 0; i < *nLayers; i++) {
               payloadBitCount += pLayerBits[i];
           }
           return payloadBitCount;
       }
   }

Authors' Addresses

   SPIRIT DSP
   Building 27, A. Solgenizyn street
   109004, Moscow, RUSSIA

   Tel: +7 495 661-2178
   Fax: +7 495 912-6786
   EMail: ikonin@spiritdsp.com
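
   The packetization trade-off described in the congestion control
   discussion (fewer packets and less header overhead, at the cost of
   more delay) can be put into numbers. The following is a minimal
   sketch, not part of the codec or of this payload format. The helper
   names HeaderOverheadBps and PacketizationDelayMs are hypothetical,
   and the frame duration is passed in by the caller. The sketch counts
   only an IPv4 header without options, the UDP header, and the fixed
   RTP header without CSRC entries or extensions; the IP-MR payload
   header, IPv6, and lower-layer framing are ignored.

```c
/* Per-packet header sizes in octets: IPv4 without options, UDP, and
 * the fixed RTP header without CSRC entries or header extensions. */
#define IPV4_HDR_OCTETS 20
#define UDP_HDR_OCTETS   8
#define RTP_HDR_OCTETS  12

/* Hypothetical helper: IP/UDP/RTP header overhead in bits per second
 * when frames_per_packet frames of frame_ms milliseconds each share
 * one packet. Returns -1 on invalid arguments. The integer division
 * truncates toward zero. */
long HeaderOverheadBps(int frames_per_packet, int frame_ms)
{
    long hdr_bits =
        8L * (IPV4_HDR_OCTETS + UDP_HDR_OCTETS + RTP_HDR_OCTETS);

    if (frames_per_packet < 1 || frame_ms < 1) {
        return -1;
    }
    /* One packet is sent every frames_per_packet * frame_ms ms. */
    return hdr_bits * 1000L / ((long)frames_per_packet * frame_ms);
}

/* Hypothetical helper: packetization delay in milliseconds. The sender
 * has to collect every frame of a packet before it can transmit it.
 * Returns -1 on invalid arguments. */
int PacketizationDelayMs(int frames_per_packet, int frame_ms)
{
    if (frames_per_packet < 1 || frame_ms < 1) {
        return -1;
    }
    return frames_per_packet * frame_ms;
}
```

   Assuming a frame duration of 20 ms for illustration, one frame per
   packet costs 16000 bit/s of headers. Two frames per packet halve
   that to 8000 bit/s and raise the packetization delay from 20 ms to
   40 ms.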