Audio/Video Transport Working Group                            S. Ikonin
Internet Draft                                                SPIRIT DSP
Intended status: Informational                        September 16, 2009

               RTP Payload Format for IP-MR Speech Codec
                    draft-ietf-avt-rtp-ipmr-08.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.

Copyright (c) 2009 IETF Trust and the persons identified as the document
authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions
Relating to IETF Documents in effect on the date of publication of this
document (http://trustee.ietf.org/license-info). Please review these
documents carefully, as they describe your rights and restrictions with
respect to this document.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This Internet-Draft will expire on March 16, 2010.

Abstract

This document specifies the payload format for packetization of SPIRIT
IP-MR encoded speech signals into the Real-time Transport Protocol
(RTP). The payload format supports transmission of multiple frames per
payload and introduces redundancy for robustness against packet loss.

Table of Contents

1. Introduction
2. IP-MR Codec Description
3. Payload Format
   3.1. RTP Header Usage
   3.2. Payload Format Structure
   3.3. Payload Header
   3.4. Speech Table of Contents
   3.5. Speech Data
   3.6. Redundancy Header
   3.7. Redundancy Table of Contents
   3.8. Redundancy Data
4. Payload Examples
   4.1. Payload Carrying a Single Frame
   4.2. Payload Carrying Multiple Frames with Redundancy
5. Media Type Registration
   5.1. Registration of media subtype audio/ip-mr_v2.5
   5.2. Mapping Media Type Parameters into SDP
6. Security Considerations
7. Congestion Control
8. IANA Considerations
9. Normative References
10. Author(s) Information
11. Disclaimer
12. Legal Terms
APPENDIX A. RETRIEVING FRAME INFORMATION
   A.1. get_frame_info.c
Authors' Addresses

1. Introduction

This document specifies the payload format for packetization of SPIRIT
IP-MR encoded speech signals into the Real-time Transport Protocol
(RTP). The payload format supports transmission of multiple frames per
payload and introduces redundancy for robustness against packet loss.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC 2119].

2. IP-MR Codec Description

The IP-MR codec is a scalable adaptive multi-rate wideband speech codec
designed by SPIRIT for use in IP-based networks. This codec is suitable
for real-time communications such as telephony and videoconferencing.

The codec operates on 20 ms frames at a 16 kHz sampling rate and has an
algorithmic delay of 25 ms.

The IP-MR codec supports six wideband speech coding modes with
respective bit rates ranging from about 7.7 to about 34.2 kbps. The
coding mode can be changed at any 20 ms frame boundary, making it
possible to dynamically adjust the speech encoding rate during a session
to adapt to varying transmission conditions.

The coded frame consists of multiple coding layers: a base (or core)
layer and several enhancement layers, which are coded independently.
Only the core layer is mandatory to decode understandable speech; the
upper layers provide quality enhancement. These enhancement layers may
be omitted, and the remaining base layer can be meaningfully decoded
without artifacts. This makes the bit stream scalable and allows the bit
rate to be reduced during transmission without re-encoding.

This memo specifies an optional form of redundancy coding within RTP
for protection against packet loss. It is based on the commonly known
scheme in which previously transmitted frames are aggregated together
with new ones. Each frame is retransmitted once in the following RTP
payload packet. In the diagram below, f(n-2)...f(n+4) denotes a sequence
of speech frames, and p(n-1)...p(n+4) is a sequence of payload packets:

--+--------+--------+--------+--------+--------+--------+--------+--
  | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
--+--------+--------+--------+--------+--------+--------+--------+--

  <---- p(n-1) ---->
           <----- p(n) ----->
                    <---- p(n+1) ---->
                             <---- p(n+2) ---->
                                      <---- p(n+3) ---->
                                               <---- p(n+4) ---->

However, because of the scalable nature of the IP-MR codec, there is no
need to duplicate the whole previous frame; only the core layer may be
retransmitted. This reduces redundancy overhead while keeping
efficiency. Moreover, the speech bits encoded in the core layer are
divided into six classes (from A to F) of perceptual sensitivity to
errors.
Using these classes as the unit of introduced redundancy makes it
possible to adjust the trade-off between overhead and robustness against
packet loss.

The mechanism described does not require signaling at session setup.
The sender is responsible for selecting an appropriate amount of
redundancy based on feedback about the channel conditions.

The main codec characteristics can be summarized as follows:

o Wideband, 16 kHz, speech codec

o Adaptive multi-rate with six modes from about 7.7 to 34.2 kbps

o Bit rate scalable

o Variable bit rate changing in accordance with actual speech
  content

o Discontinuous Transmission (DTX), silence suppression and
  comfort noise generation

o In-band redundancy scheme for protection against packet loss

3. Payload Format

The main purpose of the payload design for IP-MR is to maximize the
potential of the codec with as little overhead as possible. The payload
format allows changing parameters of the codec (such as bit rate,
level of scalability, DTX and redundancy mode) without re-negotiation
at any packet boundary. This makes it possible to dynamically adjust
streaming parameters in accordance with changing network conditions.
The payload format also supports aggregation of multiple consecutive
frames (up to 4) in a payload. This allows control over the trade-off
between delay and header overhead.

3.1. RTP Header Usage

The RTP timestamp corresponds to the sampling instant of the first
sample encoded for the first frame-block in the packet. The timestamp
clock frequency SHALL be 16 kHz. The duration of one frame is 20 ms,
corresponding to 320 samples at 16 kHz. Thus the timestamp is increased
by 320 for each consecutive frame. The timestamp is also used to recover
the correct decoding order of the frame-blocks.
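
As a non-normative illustration of the timestamp arithmetic above, the
following C sketch derives the 320-sample frame step from the 16 kHz
clock and the 20 ms frame duration. The function and macro names are
hypothetical and not part of this specification. The "next packet"
helper assumes back-to-back frames with no gap in transmission.

```c
#include <stdint.h>

#define IPMR_CLOCK_HZ          16000u  /* RTP timestamp clock rate */
#define IPMR_FRAME_MS          20u     /* duration of one frame    */
/* 16000 / 1000 * 20 = 320 timestamp units per frame */
#define IPMR_SAMPLES_PER_FRAME (IPMR_CLOCK_HZ / 1000u * IPMR_FRAME_MS)

/* Timestamp of the k-th frame (0-based) carried in a packet whose
 * RTP timestamp is packet_ts. Arithmetic wraps modulo 2^32. */
uint32_t ipmr_frame_timestamp(uint32_t packet_ts, unsigned k)
{
    return packet_ts + (uint32_t)k * IPMR_SAMPLES_PER_FRAME;
}

/* Timestamp of the packet that follows one carrying `frames` frames
 * (1..4). Assumption: the next packet starts with the very next
 * 20 ms frame. */
uint32_t ipmr_next_timestamp(uint32_t packet_ts, unsigned frames)
{
    return packet_ts + (uint32_t)frames * IPMR_SAMPLES_PER_FRAME;
}
```

For example, a packet with timestamp 1000 that carries three frames
covers frame timestamps 1000, 1320 and 1640, and under the stated
assumption the following packet starts at 1960.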

The RTP header marker bit (M) SHALL be set to 1 whenever the first
frame-block carried in the packet is the first frame-block in a
talkspurt (see the definition of talkspurt in Section 4.1 of
[RFC 3551]). For all other packets, the marker bit SHALL be set to zero
(M=0).

The assignment of an RTP payload type for the format defined in this
memo is outside the scope of this document. The RTP profiles in use
currently mandate binding the payload type dynamically for this payload
format. This is necessary because the payload type expresses the
configuration of the payload itself, i.e. basic or interleaved mode,
and the number of channels carried.

The remaining RTP header fields are used as specified in [RFC 3550].

3.2. Payload Format Structure

The IP-MR payload format consists of a payload header with general
information about the packet, a speech table of contents (TOC), and
speech data. An optional redundancy section follows the speech data.
The redundancy section consists of a redundancy header, a redundancy
TOC and a redundancy data payload.

The following diagram shows the standard payload format layout:

+---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +
| payload | speech | speech | redundancy | redundancy | redundancy |
| header  |  TOC   |  data  |   header   |    TOC     |    data    |
+---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +

3.3. Payload Header

The payload header has the following format:

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+
|T| CR  | BR  |D|A|GR |R|
+-+-+-+-+-+-+-+-+-+-+-+-+

o T (1 bit): Reserved for compatibility with future extensions.
  SHOULD be set to 0.

o CR (3 bits): coding rate of frame(s) in this packet, as per the
  following table:

  +-------+--------------+
  |  CR   | avg. bitrate |
  +-------+--------------+
  |   0   |   7.7 kbps   |
  |   1   |   9.8 kbps   |
  |   2   |  14.3 kbps   |
  |   3   |  20.8 kbps   |
  |   4   |  27.9 kbps   |
  |   5   |  34.2 kbps   |
  |   6   |  (reserved)  |
  |   7   |   NO_DATA    |
  +-------+--------------+

  The CR value 7 (NO_DATA) indicates that there is no speech data (and
  accordingly no speech TOC) in the payload. This MAY be used to
  transmit redundancy data only. The value 6 is reserved. If this value
  is received, the packet SHOULD be discarded.

o BR (3 bits): base rate for the core layer of frame(s) in this packet,
  using the table for CR. Values in the range 0-5 indicate bitrates
  for the core layer, the same as for CR. If any other value is
  received, the packet SHOULD be discarded. The base rate is the lowest
  rate for scalability, so the speech payload cannot be scaled down
  below the BR value. If a received packet has BR > CR, then during
  decoding it will be assumed that BR = CR.

o D (1 bit): indicates whether the DTX mode is active. This
  parameter is required for payload parsing. The decoder
  implementation MUST always include DTX mode support and update
  internal states properly. The decoder cannot assume that DTX will
  be constantly inactive during a session.

o A (1 bit): byte-aligned payload. If A=1 then all speech frames
  MUST be byte-aligned. This mode speeds up speech data access.
  The A=0 value specifies bandwidth-efficient mode with no byte
  alignment (including the end of the header).

o GR (2 bits): number of frames in the packet (grouping size). The
  actual grouping size is GR + 1, thus the maximum grouping supported
  is 4.

o R (1 bit): redundancy presence bit. If R=1 then the packet
  contains redundancy information for recovery of lost packets.
  In this case the redundancy section is present after the speech
  data.

3.4. Speech Table of Contents

The speech TOC contains one entry for each frame in the packet
(grouping size in total).
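
As a non-normative illustration, the receiver-side rules for the payload
header fields and the resulting number of speech TOC entries can be
sketched in C as follows. The structure and function names are
hypothetical, and the fields are assumed to have been extracted from the
12-bit header already (bit extraction is not shown). Treating BR values
outside 0-5 as a discard condition is this sketch's reading of the BR
description and should be regarded as an assumption.

```c
#include <stdbool.h>

enum { IPMR_CR_MAX_RATE = 5, IPMR_CR_RESERVED = 6, IPMR_CR_NO_DATA = 7 };

/* Hypothetical container for already-extracted header fields. */
struct ipmr_header {
    unsigned cr;        /* coding rate field, 0..7            */
    unsigned br;        /* base rate field, 0..7              */
    unsigned gr;        /* grouping field, 0..3               */
    bool dtx;           /* D: DTX mode active                 */
    bool aligned;       /* A: byte-aligned speech frames      */
    bool redundancy;    /* R: redundancy section present      */
};

/* Applies the header rules. Returns false if the packet should be
 * discarded. On success, clamps BR to CR where needed and stores the
 * number of speech TOC entries in *toc_entries. */
bool ipmr_check_header(struct ipmr_header *h, unsigned *toc_entries)
{
    if (h->cr == IPMR_CR_RESERVED)
        return false;               /* reserved coding rate */
    if (h->cr == IPMR_CR_NO_DATA) {
        *toc_entries = 0;           /* no speech TOC, no speech data */
        return true;
    }
    if (h->br > IPMR_CR_MAX_RATE)
        return false;               /* assumption: BR 6..7 is invalid */
    if (h->br > h->cr)
        h->br = h->cr;              /* BR > CR is decoded as BR = CR */
    *toc_entries = h->gr + 1;       /* grouping size is GR + 1 */
    return true;
}
```

With CR=1, BR=3 and GR=2, for instance, the sketch accepts the packet,
lowers BR to 1 and reports three speech TOC entries.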
Each entry contains a single field:

 0
+-+
|E|
+-+

o E (1 bit): frame existence indicator. If set to 0, this indicates
  that the corresponding frame is absent and the receiver should set
  the special LOST_FRAME flag for the decoder. This can be caused by
  the frame itself being lost or by empty frames generated by the
  encoder during silence intervals in DTX mode.

Note that if the CR field of the payload header is 7 (NO_DATA) then the
speech TOC is empty.

3.5. Speech Data

The speech data of a payload contains one or more speech frames or
comfort noise frames, as specified in the speech TOC of the payload.

Each speech frame represents 20 ms of speech encoded with the rate
indicated in the CR field and the base rate indicated in the BR field
of the payload header.

The size of a coded speech frame is variable due to the nature of the
codec. The encoder decides the size of each frame and returns it after
encoding. In order to save bandwidth, the size is not explicitly placed
into the payload. The frame size can be determined from the frame's
content using a special service function specified in Appendix A.
This function provides complete information about a coded frame,
including its size, the number of layers, the size of each layer and
the size of the perceptual sensitivity classes.

3.6. Redundancy Header

If a packet contains redundancy (the R field of the payload header is
1), the speech data is followed by the redundancy header:

 0 1 2 3 4 5
+-+-+-+-+-+-+
| CL1 | CL2 |
+-+-+-+-+-+-+

The redundancy header consists of two fields. Each field contains a
class specifier for the amount of redundancy partly taken from the
preceding packet (CL1) and the pre-preceding packet (CL2), i.e. distant
from the current packet by 1 and 2 packets respectively.
The values are listed in the table below:

+-------+-------------------+
|  CL   | amount redundancy |
+-------+-------------------+
|   0   |       NONE        |
|   1   |      CLASS A      |
|   2   |      CLASS B      |
|   3   |      CLASS C      |
|   4   |      CLASS D      |
|   5   |      CLASS E      |
|   6   |      CLASS F      |
|   7   |    (reserved)     |
+-------+-------------------+

Each specifier takes 3 bits, thus the total redundancy header size is 6
bits.

These classes indicate the subjective importance of bits from the core
layer. Class A contains the bits most sensitive to errors; loss of
these bits results in a corrupted speech frame, which should not be
decoded without applying a packet loss concealment (PLC) procedure.
Class B is less sensitive than class A, and so on to class F. The sum
of all bit classes from A to F composes the core layer.

Putting some part (classes of bits) of a previous frame into the current
packet makes it possible to partially decode the previous frame if it
is lost. The more information is delivered, the less the speech quality
degrades. The CL1 and CL2 fields specify how many classes from previous
frames the current packet contains. For example, CL1=3 (class C) means
that the packet contains bits from classes A, B and C of the previous
frame. If CL1=6 (class F) then the whole core layer is included.

3.7. Redundancy Table of Contents

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Pkt1 Entries| Pkt2 Entries|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The redundancy TOC contains entries for redundancy frames from the
preceding and pre-preceding packets. Each entry takes 1 bit, like a
speech TOC entry (Section 3.4):

 0
+-+
|E|
+-+

o E (1 bit): frame existence indicator. If set to 0, this indicates
  that the corresponding frame is absent.

o For each of the preceding and pre-preceding packets, the number of
  entries is equal to the grouping size of the current packet. Thus
  the maximum number of entries is 4*2 = 8.

o If the class specifier in the redundancy header is CL=0 (NONE)
  then there are no entries for the corresponding packet redundancy.

3.8. Redundancy Data

The redundancy data of a payload contains redundancy information for
one or more speech frames or comfort noise frames that may be lost
during transmission, as specified in the redundancy TOC of the payload.
The redundancy is the most important part of the preceding frames, each
representing 20 ms of speech. This data MAY be used for partial
reconstruction of lost frames. The amount of available redundancy is
specified by the CL fields in the redundancy header (Section 3.6).
These fields SHOULD be passed to the decoder. The size of a redundancy
frame is variable and can be obtained using the service function
specified in Appendix A.

4. Payload Examples

A few examples to highlight the payload format follow.

4.1. Payload Carrying a Single Frame

The following diagram shows a standard IP-MR payload carrying a single
speech frame without redundancy:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|CR=1 |BR=0 |0|0|0 0|0|1|sp(0)                                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      sp(193)|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

In this payload the speech frame is not damaged at the IP origin (E=1),
the coding rate is 9.8 kbps (CR=1), the base rate is 7.7 kbps (BR=0),
and the DTX mode is off. There is no byte alignment (A=0) and no
redundancy (R=0).
The encoded speech bits, sp(0) to sp(193), are placed immediately
after the TOC. Finally, one zero bit is added at the end as padding to
make the payload byte aligned.

4.2. Payload Carrying Multiple Frames with Redundancy

The following diagram shows a payload that contains three frames, one of
them with no speech data. The coding rate is 7.7 kbps (CR=0), the base
rate is 7.7 kbps (BR=0), and the DTX mode is on. The speech frames are
byte aligned (A=1), so 1 zero bit is added at the end of the header.
Besides the speech frames, the payload contains six redundancy frames
(three for each delayed packet).

The first speech frame consists of bits sp1(0) to sp1(92). After that, 3
bits are added for byte alignment. The second frame does not contain any
speech information, which is indicated in the payload by its TOC entry.
The third frame consists of bits sp3(0) to sp3(171).

The redundancy header follows the speech data. The one-packet-delayed
redundancy contains class A+B bits (CL1=2), and the two-packet-delayed
redundancy contains class A bits (CL2=1). The one-packet-delayed
redundancy contains three frames with 20, 39 and 35 bits respectively.

The first frame of the two-packet-delayed redundancy is absent, as
indicated by its TOC entry, and the two other frames have sizes of 15
and 19 bits.

Note that all speech frames are padded with zero bits for byte
alignment.
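
As a non-normative cross-check of the sizes quoted in Sections 4.1 and
4.2, the following C sketch adds up the bit counts of a payload. The
function name and parameters are hypothetical. The layout rules it
applies (header plus speech TOC padded to a byte boundary when A=1, each
speech frame padded when A=1, redundancy section left unaligned, final
padding to a whole byte) are inferred from the two examples only and
should be treated as assumptions outside those cases.

```c
#include <stddef.h>

enum { IPMR_HDR_BITS = 12, IPMR_RED_HDR_BITS = 6 };

/* Round a bit count up to the next multiple of 8. */
static size_t ipmr_pad8(size_t bits)
{
    return (bits + 7u) & ~(size_t)7u;
}

/* Payload size in bytes.
 *   aligned          A field of the payload header
 *   toc_entries      number of speech TOC bits (grouping size)
 *   speech_bits[]    sizes of the speech frames actually present
 *   has_red          R field of the payload header
 *   red_toc_entries  number of redundancy TOC bits
 *   red_bits[]       sizes of the redundancy frames actually present */
size_t ipmr_payload_bytes(int aligned, size_t toc_entries,
                          const size_t *speech_bits, size_t n_speech,
                          int has_red, size_t red_toc_entries,
                          const size_t *red_bits, size_t n_red)
{
    size_t bits = IPMR_HDR_BITS + toc_entries;
    size_t i;

    if (aligned)
        bits = ipmr_pad8(bits);          /* pad end of header + TOC */
    for (i = 0; i < n_speech; i++) {
        bits += speech_bits[i];
        if (aligned)
            bits = ipmr_pad8(bits);      /* pad each speech frame */
    }
    if (has_red) {
        bits += IPMR_RED_HDR_BITS + red_toc_entries;
        for (i = 0; i < n_red; i++)
            bits += red_bits[i];         /* redundancy is not padded */
    }
    return ipmr_pad8(bits) / 8;          /* final padding */
}
```

For the single-frame example (one TOC entry, 194 speech bits, A=0, R=0)
this gives 26 bytes. For the multiple-frame example (three TOC entries,
speech frames of 93 and 172 bits, A=1, six redundancy TOC entries and
redundancy frames of 20, 39, 35, 15 and 19 bits) it gives 54 bytes.
Both values match the number of 32-bit rows drawn in the diagrams.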

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  sp1(92)|P|P|P|sp3(0)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                               sp3(171)|P|P|P|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|CL1=2|CL2=1|1 1 1|0 1 1|red1_1(0)                    red1_1(19)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|red1_2(0)                                                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   red1_2(38)|red1_3(0)                                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         red1_3(34)|red2_2(0)          red2_2(14)|red2_3(0)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             red2_3(18)|P|P|P|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5. Media Type Registration

This section describes the media types and names associated with this
payload format.

5.1. Registration of media subtype audio/ip-mr_v2.5

Type name: audio

Subtype name: ip-mr_v2.5

Required parameters: none

Optional parameters:
   * ptime: Gives the length of time in milliseconds represented by the
     media in a packet. Allowed values are: 20, 40, 60 and 80.

Encoding considerations: This media type is framed binary data (see RFC
4288, Section 4.8).

Security considerations: See RFC 3550 [RFC 3550]

Interoperability considerations: none

Published specification: RFC XXXX

Applications that use this media type: Real-time audio applications like
voice over IP and teleconference, and multi-media streaming.

Additional information: none

Person & email address to contact for further information:
   Yury Morzeev
   morzeev@spiritdsp.com

Intended usage: COMMON

Restrictions on usage: This media type depends on RTP framing, and hence
is only defined for transfer via RTP [RFC 3550].

Authors:
   Sergey Ikonin

Change controller: IETF Audio/Video Transport working group delegated
from the IESG.

5.2. Mapping Media Type Parameters into SDP

The information carried in the media type specification has a specific
mapping to fields in the Session Description Protocol (SDP) [RFC 4566],
which is commonly used to describe RTP sessions. When SDP is used to
specify sessions employing the IP-MR codec, the mapping is as follows:

o The media type ("audio") goes in SDP "m=" as the media name.

o The media subtype (payload format name) goes in SDP "a=rtpmap"
  as the encoding name. The RTP clock rate in "a=rtpmap" MUST be
  16000.

o The parameter "ptime" goes in the SDP "a=ptime" attribute.

Any remaining parameters go in the SDP "a=fmtp" attribute by copying
them directly from the media type parameter string as a semicolon-
separated list of parameter=value pairs.

Note that the payload format (encoding) names are commonly shown in
upper case. Media subtypes are commonly shown in lower case. These
names are case-insensitive in both places.

6. Security Considerations

RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [RFC 3550] and in any applicable RTP profile.
The main security considerations for the RTP packet carrying the RTP
payload format defined within this memo are confidentiality, integrity,
and source authenticity. Confidentiality is achieved by encryption of
the RTP payload. Integrity of the RTP packets is achieved through a
suitable cryptographic integrity protection mechanism. Such a
cryptographic system may also allow the authentication of the source of
the payload.

A suitable security mechanism for this RTP payload format should
provide confidentiality, integrity protection, and at least source
authentication capable of determining if an RTP packet is from a
member of the RTP session.

Note that the appropriate mechanism to provide security to RTP and
payloads following this memo may vary. It is dependent on the
application, the transport, and the signaling protocol employed.
Therefore, a single mechanism is not sufficient, although if suitable,
usage of the Secure Real-time Transport Protocol (SRTP) [RFC 3711] is
recommended. Other mechanisms that may be used are IPsec [RFC 4301]
and Transport Layer Security (TLS) [RFC 5246] (RTP over TCP); other
alternatives may exist.

This payload format does not exhibit any significant non-uniformity in
the receiver side computational complexity for packet processing, and
thus is unlikely to pose a denial-of-service threat due to the receipt
of pathological data.

7. Congestion Control

The general congestion control considerations for transporting RTP data
apply; see RTP [RFC 3550] and any applicable RTP profile like AVP
[RFC 3551]. However, the multi-rate capability of IP-MR speech coding
provides a mechanism that may help to control congestion, since the
bandwidth demand can be adjusted by selecting a different encoding mode.
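
As a non-normative illustration of adjusting the bandwidth demand by
mode selection, the following C sketch picks the highest coding rate
(CR) whose average bitrate, taken from the table in Section 3.3, fits a
given budget. The function name and the selection policy are
hypothetical; a real sender would also have to account for IP/UDP/RTP
header overhead, redundancy and the variable-rate nature of the codec,
none of which is modelled here.

```c
/* Average bitrates for CR = 0..5, in units of 100 bit/s
 * (7.7, 9.8, 14.3, 20.8, 27.9 and 34.2 kbps). */
static const unsigned ipmr_rate_x100bps[6] = {
    77, 98, 143, 208, 279, 342
};

/* Returns the highest CR whose average bitrate does not exceed the
 * budget (given in units of 100 bit/s), or -1 if even the lowest
 * mode does not fit. */
int ipmr_select_cr(unsigned budget_x100bps)
{
    int cr = -1;
    int i;

    for (i = 0; i < 6; i++) {
        if (ipmr_rate_x100bps[i] <= budget_x100bps)
            cr = i;     /* table is ascending, so keep the last fit */
    }
    return cr;
}
```

With a budget of 15.0 kbps (150 in the units used above), the sketch
selects CR=2 (14.3 kbps).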

The number of frames encapsulated in each RTP payload highly
influences the overall bandwidth of the RTP stream due to header
overhead constraints. Packetizing more frames in each RTP payload
can reduce the number of packets sent and hence the overhead from
IP/UDP/RTP headers, at the expense of increased delay.

If the in-band redundancy scheme is used to protect against packet
loss, the amount of introduced redundancy will need to be regulated so
that the use of redundancy itself does not cause a congestion problem.
In other words, a sender SHALL NOT increase the total bitrate when
adding redundancy in response to packet loss, and needs instead to
adjust it down in accordance with the congestion control algorithm
being run. Thus, when adding redundancy, the media bitrate will need to
be reduced to provide room for the redundancy.

8. IANA Considerations

One media type has been defined and needs registration in the media
types registry.

9. Normative References

[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and
           V. Jacobson, "RTP: A Transport Protocol for Real-Time
           Applications", STD 64, RFC 3550, July 2003.

[RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio
           and Video Conferences with Minimal Control", STD 65,
           RFC 3551, July 2003.

[RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
           Description Protocol", RFC 4566, July 2006.

[RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
           K. Norrman, "The Secure Real-Time Transport Protocol
           (SRTP)", RFC 3711, March 2004.

[RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer
           Security (TLS) Protocol Version 1.2", RFC 5246,
           August 2008.

[RFC 4301] Kent, S. and K. Seo, "Security Architecture for the
           Internet Protocol", RFC 4301, December 2005.

10. Author(s) Information:

Sergey Ikonin
email: info@spiritdsp.com

Russia 109004
Building 27, A. Solzhenitsyna street
Tel: +7 495 661-2178
Fax: +7 495 912-6786

11. Disclaimer

This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November 10,
2008. The person(s) controlling the copyright in some of this material
may not have granted the IETF Trust the right to allow modifications of
such material outside the IETF Standards Process. Without obtaining an
adequate license from the person(s) controlling the copyright in such
materials, this document may not be modified outside the IETF Standards
Process, and derivative works of it may not be created outside the IETF
Standards Process, except to format it for publication as an RFC or to
translate it into languages other than English.

12. Legal Terms

All IETF Documents and the information contained therein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION THEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

The IETF Trust takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in any
IETF Document or the extent to which any license under such rights might
or might not be available; nor does it represent that it has made any
independent effort to identify any such rights.
671  Copies of Intellectual Property disclosures made to the IETF Secretariat
672  and any assurances of licenses to be made available, or the result of an
673  attempt made to obtain a general license or permission for the use of
674  such proprietary rights by implementers or users of this specification
675  can be obtained from the IETF on-line IPR repository at
676  http://www.ietf.org/ipr.

678  The IETF invites any interested party to bring to its attention any
679  copyrights, patents or patent applications, or other proprietary rights
680  that may cover technology that may be required to implement any standard
681  or specification contained in an IETF Document. Please address the
682  information to the IETF at ietf-ipr@ietf.org.

684  The definitive version of an IETF Document is that published by, or
685  under the auspices of, the IETF. Versions of IETF Documents that are
686  published by third parties, including those that are translated into
687  other languages, should not be considered to be definitive versions of
688  IETF Documents. The definitive version of these Legal Provisions is that
689  published by, or under the auspices of, the IETF. Versions of these
690  Legal Provisions that are published by third parties, including those
691  that are translated into other languages, should not be considered to be
692  definitive versions of these Legal Provisions.

694  For the avoidance of doubt, each Contributor to the IETF Standards
695  Process licenses each Contribution that he or she makes as part of the
696  IETF Standards Process to the IETF Trust pursuant to the provisions of
697  RFC 5378. No language to the contrary, or terms, conditions or rights
698  that differ from or are inconsistent with the rights and licenses
699  granted under RFC 5378, shall have any effect and shall be null and
700  void, whether published or posted by such Contributor, or included with
701  or in such Contribution.

703  APPENDIX A. RETRIEVING FRAME INFORMATION

705  This appendix contains C code implementing the frame parsing
706  function. This function extracts information about a coded frame, including
707  frame size, number of layers, size of each layer and size of perceptual
708  sensitivity classes.

710  A.1. get_frame_info.c

712  /******************************************************************

714    get_frame_info.c

716    Retrieving frame information for IP-MR Speech Codec

718  ******************************************************************/

720  #define RATES_NUM      6  // number of codec rates
721  #define SENSE_CLASSES  6  // number of sensitivity classes (A..F)

723  // frame types
724  #define FT_DTX_SPEECH  0  // active speech in DTX mode
725  #define FT_DTX_SID     1  // silence insertion descriptor
726  #define FT_NO_DTX      2  // no DTX frame

728  // get specified bit from coded data
729  int GetBit(unsigned char *data, int curBit)
730  {
731      return ((data[curBit >> 3] >> (curBit % 8)) & 1);
732  }

734  // retrieve frame information
735  int GetFrameInfo(           // o: frame size in bits
736      short rate,             // i: encoding rate (0..5)
737      short base_rate,        // i: base (core) layer rate,
738                              //    if base_rate > rate, then assumed
739                              //    that base_rate = rate.
740      short allow_DTX,        // i: flag of DTX mode
741      unsigned char *pCoded,  // i: coded bit frame
742      short pLayerBits        // o: number of bits in layers
743          [RATES_NUM],
744      short pSenseBits        // o: number of bits in sensitivity classes
745          [SENSE_CLASSES],
746      short *nLayers          // o: number of layers
747  )
748  {
749      static const short Bits_1[4] = {0, 9, 9, 15};
750      static const short Bits_2[16] = { 43,50,36,31,46,48,40,44,47,43,44,
751                                        45,43,44,47,36};

753      static const short Bits_3[2][6] = {{13, 11, 23, 33, 36, 31},
754                                         {25, 0, 23, 32, 36, 31},};

756      int FrType;
757      int i,nBits;

759      if (rate < 0 || rate > 5) {
760          return 0; // incorrect stream
761      }

763      for(i = 0; i < SENSE_CLASSES; i++) {
764          pSenseBits[i] = 0;
765      }

767      nBits = 0;
768      // extract frame type bit if required
769      if (allow_DTX) {
770          FrType = GetBit(pCoded, nBits++) ? FT_DTX_SPEECH : FT_DTX_SID;
771      } else {
772          FrType = FT_NO_DTX;
773      }
774      {
775          int cw_0;
776          int b[14];

778          // extract meaningful bits
779          for(i = 0 ; i < 14; i++) {
780              b[i] = GetBit(pCoded, nBits++);
781          }

783          // parse
784          if(FrType == FT_DTX_SID) {
785              cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3);
786              rate = 0;
787              pSenseBits[0] = 10 + Bits_2[cw_0];
788          } else {

790              int idx;
791              int nFlag_1, nFlag_2, cw_1 = 0, cw_2 = 0; // codewords start at 0

793              nFlag_1 = b[0] + b[2] + b[4] + b[6];
794              cw_1 = (cw_1 << 1) | b[0];
795              cw_1 = (cw_1 << 1) | b[2];
796              cw_1 = (cw_1 << 1) | b[4];
797              cw_1 = (cw_1 << 1) | b[6];

799              nFlag_2 = b[1] + b[3] + b[5] + b[7];
800              cw_2 = (cw_2 << 1) | b[1];
801              cw_2 = (cw_2 << 1) | b[3];
802              cw_2 = (cw_2 << 1) | b[5];
803              cw_2 = (cw_2 << 1) | b[7];

805              cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3);
806              if (base_rate < 0) base_rate = 0;
807              if (base_rate > rate) base_rate = rate;
808              idx = base_rate == 0 ? 0 : 1;

810              pSenseBits[0] = (FrType == FT_DTX_SPEECH ? 1:0)+14+Bits_2[cw_0];
811              pSenseBits[1] = Bits_1[(cw_1 >> 0)&0x3] + Bits_1[(cw_1>>2)&0x3];
812              pSenseBits[2] = nFlag_1*5;
813              pSenseBits[3] = nFlag_2*30;
814              pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]);

816              for (i = 1; i < rate+1; i++) {
817                  pLayerBits[i] = 4*(Bits_3[idx][i]);
818              }
819          }

821          pLayerBits[0] = 0;
822          for (i = 0; i < SENSE_CLASSES; i++) {
823              pLayerBits[0] += pSenseBits[i];
824          }

826          *nLayers = rate+1;
827      }

829      {
830          // count total frame size
831          int payloadBitCount = 0;
832          for (i = 0; i < *nLayers; i++) {
833              payloadBitCount += pLayerBits[i];
834          }
835          return payloadBitCount;
836      }
837  }

839  Authors' Addresses

841  SPIRIT DSP
842  Building 27, A. Solzhenitsyna street
843  109004, Moscow, RUSSIA

845  Tel: +7 495 661-2178
846  Fax: +7 495 912-6786
847  EMail: info@spiritdsp.com
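
The bit ordering used by GetBit in A.1 (bit 0 is the least significant bit of the first octet) and the way cw_0 is assembled from consecutive bits can be illustrated with a small sketch. This is not part of the codec reference code: the names read_bit_lsb_first and read_bits_lsb_first are hypothetical helpers introduced here only for illustration, and the sketch assumes the caller guarantees that the buffer is long enough.

```c
#include <stddef.h>

/* Hypothetical helper (not from the draft): return bit number bit_pos
   of buf, counting from the least significant bit of octet 0. This is
   the same ordering that GetBit in A.1 uses. */
static int read_bit_lsb_first(const unsigned char *buf, size_t bit_pos)
{
    return (buf[bit_pos >> 3] >> (bit_pos & 7)) & 1;
}

/* Hypothetical helper (not from the draft): read `count` consecutive
   bits starting at *bit_pos and pack them so that the first bit read
   becomes bit 0 of the result, as A.1 does when it builds cw_0.
   Advances *bit_pos by `count`. */
static unsigned read_bits_lsb_first(const unsigned char *buf,
                                    size_t *bit_pos, unsigned count)
{
    unsigned value = 0;
    unsigned i;

    for (i = 0; i < count; i++) {
        value |= (unsigned)read_bit_lsb_first(buf, (*bit_pos)++) << i;
    }
    return value;
}
```

With the octets 0xB5 0x02, for example, the first four bits read are 1, 0, 1, 0, so a 4-bit codeword assembled this way has the value 5.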