idnits 2.17.1

draft-ietf-avt-rtp-ipmr-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 17) being 87 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 2 instances of too long lines in the document, the longest one
     being 1 character in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008.  The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust.  If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 02, 2010) is 5198 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '4' on line 826

  -- Looks like a reference, but probably isn't: '16' on line 783

  -- Looks like a reference, but probably isn't: '2' on line 842

  -- Looks like a reference, but probably isn't: '6' on line 827

  -- Looks like a reference, but probably isn't: '14' on line 806

  -- Looks like a reference, but probably isn't: '0' on line 853

  -- Looks like a reference, but probably isn't: '1' on line 841

  -- Looks like a reference, but probably isn't: '3' on line 843

  -- Looks like a reference, but probably isn't: '5' on line 844

  -- Looks like a reference, but probably isn't: '7' on line 833

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  ** Obsolete normative reference: RFC 5246 (Obsoleted by RFC 8446)


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

  1  Audio/Video Transport Working Group                            S. Ikonin
  2  Internet Draft                                                SPIRIT DSP
  3  Intended status: Informational                         February 02, 2010

  5  RTP Payload Format for IP-MR Speech Codec draft-ietf-avt-rtp-ipmr-11.txt

  7  Status of this Memo

  9  This Internet-Draft is submitted to IETF in full conformance with the
 10  provisions of BCP 78 and BCP 79.

 12  Copyright (c) 2010 IETF Trust and the persons identified as the document
 13  authors. All rights reserved.

 15  This document is subject to BCP 78 and the IETF Trust's Legal Provisions
 16  Relating to IETF Documents (http://trustee.ietf.org/license-info)
 17  in effect on the date of publication of this document. Please
 18  review these documents carefully, as they describe your rights and
 19  restrictions with respect to this document.
     Code Components
 20  extracted from this document must include Simplified BSD License
 21  text as described in Section 4.e of the Trust Legal Provisions and
 22  are provided without warranty as described in the Simplified BSD
 23  License.

 25  The source code included in this document is provided under the BSD
 26  license (http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf).

 28  Internet-Drafts are working documents of the Internet Engineering Task
 29  Force (IETF), its areas, and its working groups. Note that other groups
 30  may also distribute working documents as Internet-Drafts.

 32  Internet-Drafts are draft documents valid for a maximum of six months
 33  and may be updated, replaced, or obsoleted by other documents at any
 34  time. It is inappropriate to use Internet-Drafts as reference material
 35  or to cite them other than as "work in progress."

 37  The list of current Internet-Drafts can be accessed at
 38  http://www.ietf.org/1id-abstracts.html

 40  The list of Internet-Draft Shadow Directories can be accessed at
 41  http://www.ietf.org/shadow.html

 43  This Internet-Draft will expire on June 02, 2010.

 45  Abstract

 47  This document specifies the payload format for packetization of SPIRIT
 48  IP-MR encoded speech signals into the Real-time Transport Protocol
 49  (RTP). The payload format supports transmission of multiple frames per
 50  payload and introduces redundancy for robustness against packet loss.

 52  Table of Contents

 54  1. Introduction......................................................3
 55  2. IP-MR Codec Description...........................................3
 56  3. Payload Format....................................................4
 57     3.1. RTP Header Usage.............................................4
 58     3.2. Payload Format Structure.....................................5
 59     3.3. Payload Header...............................................5
 60     3.4. Speech Table of Contents.....................................6
 61     3.5. Speech Data..................................................7
 62     3.6. Redundancy Header............................................7
 63     3.7. Redundancy Table of Contents.................................8
 64     3.8. Redundancy Data..............................................9
 65  4. Payload Examples..................................................9
 66     4.1. Payload Carrying a Single Frame..............................9
 67     4.2. Payload Carrying Multiple Frames with Redundancy............10
 68  5. Media Type Registration..........................................11
 69     5.1. Registration of media subtype audio/ip-mr_v2.5..............11
 70     5.2. Mapping Media Type Parameters into SDP......................12
 71  6. Security Considerations..........................................13
 72  7. Congestion Control...............................................13
 73  8. IANA Considerations..............................................14
 74  9. Normative References.............................................14
 75  10. Author(s) Information...........................................15
 76  11. Disclaimer......................................................15
 77  12. Legal Terms.....................................................15
 78  APPENDIX A. RETRIEVING FRAME INFORMATION............................17
 79  A.1. get_frame_info.c...............................................17
 80  Authors' Addresses..................................................19

 82  1. Introduction

 84  This document specifies the payload format for packetization of SPIRIT
 85  IP-MR encoded speech signals into the Real-time Transport Protocol
 86  (RTP). The payload format supports transmission of multiple frames per
 87  payload and introduces redundancy for robustness against packet loss.

 89  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
 90  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
 91  document are to be interpreted as described in RFC 2119 [RFC 2119].

 93  2. IP-MR Codec Description

 95  The IP-MR codec is a scalable adaptive multi-rate wideband speech codec
 96  designed by SPIRIT for use in IP-based networks. This codec is suitable
 97  for real-time communications such as telephony and videoconferencing.

 99  The codec operates on 20 ms frames at 16 kHz sampling rate and has an
100  algorithmic delay of 25 ms.

102  IP-MR supports six wideband speech coding modes with respective bit
103  rates ranging from about 7.7 to about 34.2 kbps. The coding mode can be
104  changed at any 20 ms frame boundary, making it possible to dynamically
105  adjust the speech encoding rate during a session to adapt to the varying
106  transmission conditions.

108  The coded frame consists of multiple coding layers: a base (or core)
109  layer and several enhancement layers, which are coded independently.
110  Only the core layer is mandatory to decode understandable speech; the
111  upper layers provide quality enhancement. These enhancement layers
112  may be omitted, and the remaining base layer can be meaningfully decoded
113  without artifacts. This makes the bit stream scalable and allows
114  the bit rate to be reduced during transmission without re-encoding.

116  This memo specifies an optional form of redundancy coding within RTP
117  for protection against packet loss. It is based on the commonly known
118  scheme in which previously transmitted frames are aggregated together
119  with new ones. Each frame is retransmitted once in the following
120  RTP payload packet.
     f(n-2)...f(n+4) denotes a sequence of speech
121  frames, and p(n-1)...p(n+4) is a sequence of payload packets:

123  --+--------+--------+--------+--------+--------+--------+--------+--
124    | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
125  --+--------+--------+--------+--------+--------+--------+--------+--

127    <---- p(n-1) ---->
128             <----- p(n) ----->
129                      <---- p(n+1) ---->
130                               <---- p(n+2) ---->
131                                        <---- p(n+3) ---->
132                                                 <---- p(n+4) ---->

134  However, because of the scalable nature of the IP-MR codec, there is no
135  need to duplicate the whole previous frame; only the core layer needs to
136  be retransmitted. This reduces redundancy overhead while keeping
137  efficiency. Moreover, the speech bits encoded in the core layer are
138  divided into six classes (from A to F) of perceptual sensitivity to
139  errors. Choosing which of these classes to send as redundancy makes it
140  possible to adjust the trade-off between overhead and robustness against
     packet loss.

142  The mechanism described does not require signaling at session
143  setup. The sender is responsible for selecting an appropriate amount of
144  redundancy based on feedback about the channel conditions.

146  The main codec characteristics can be summarized as follows:

148     o Wideband, 16 kHz, speech codec

150     o Adaptive multi rate with six modes from about 7.7 to 34.2 kbps

152     o Bit rate scalable

154     o Variable bit rate changing in accordance with actual speech
155       content

157     o Discontinuous Transmission (DTX), silence suppression and
158       comfort noise generation

160     o In-band redundancy scheme for protection against packet loss

162  3. Payload Format

164  The main purpose of the payload design for IP-MR is to maximize the
165  potential of the codec with as little overhead as possible. The payload
166  format allows changing parameters of the codec (such as bit rate,
167  level of scalability, DTX and redundancy mode) without re-negotiation
168  at any packet boundary.
     This makes it possible to dynamically adjust streaming
169  parameters in accordance with changing network conditions. The payload
170  format also supports aggregation of multiple consecutive frames
171  (up to 4) in a payload. This allows control of the trade-off between
172  delay and header overhead.

174  3.1. RTP Header Usage

176  The RTP timestamp corresponds to the sampling instant of the first
177  sample encoded for the first frame-block in the packet. The timestamp
178  clock frequency SHALL be 16 kHz. The duration of one frame is 20 ms,
179  corresponding to 320 samples at 16 kHz. Thus the timestamp is increased
180  by 320 for each consecutive frame. The timestamp is also used to recover
181  the correct decoding order of the frame-blocks.

183  The RTP header marker bit (M) SHALL be set to 1 whenever the first
184  frame-block carried in the packet is the first frame-block in a
185  talkspurt (see definition of the talkspurt in Section 4.1 of [RFC 3551]).
186  For all other packets, the marker bit SHALL be set to zero (M=0).

188  The assignment of an RTP payload type for the format defined in this
189  memo is outside the scope of this document. The RTP profiles in use
190  currently mandate binding the payload type dynamically for this payload
191  format. This is basically necessary because the payload type expresses
192  the configuration of the payload itself, i.e. basic or interleaved mode,
193  and the number of channels carried.

195  The remaining RTP header fields are used as specified in [RFC 3550].

197  3.2. Payload Format Structure

199  The IP-MR payload format consists of a payload header with general
200  information about the packet, a speech table of contents (TOC), and
201  speech data. An optional redundancy section follows the speech data. The
202  redundancy section consists of a redundancy header, a redundancy TOC and
203  a redundancy data payload.
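
As a rough illustration of the timestamp and marker-bit rules in Section 3.1, the following C sketch computes the timestamp of the next packet and the marker bit. The helper names `ipmr_next_timestamp` and `ipmr_marker_bit` are hypothetical and do not come from this memo. The sketch assumes that packets follow each other with no DTX gap, and that a packet carries between 1 and 4 frames.

```c
#include <assert.h>
#include <stdint.h>

/* Constants taken from Section 3.1: 16 kHz clock, 20 ms frames. */
#define IPMR_CLOCK_RATE_HZ      16000u
#define IPMR_FRAME_DURATION_MS  20u
#define IPMR_SAMPLES_PER_FRAME  (IPMR_CLOCK_RATE_HZ / 1000u * IPMR_FRAME_DURATION_MS)

/* Hypothetical helper: RTP timestamp of the packet that follows a packet
 * carrying `frames` consecutive frames (1..4). Assumes no DTX gap between
 * the two packets. The result wraps modulo 2^32, like the RTP timestamp. */
static uint32_t ipmr_next_timestamp(uint32_t ts, unsigned frames)
{
    assert(frames >= 1u && frames <= 4u);
    return (uint32_t)(ts + (uint32_t)frames * IPMR_SAMPLES_PER_FRAME);
}

/* Hypothetical helper: the marker bit is 1 only when the first frame-block
 * in the packet is the first frame-block of a talkspurt, otherwise 0. */
static int ipmr_marker_bit(int first_frame_starts_talkspurt)
{
    return first_frame_starts_talkspurt ? 1 : 0;
}
```

With one frame per packet the timestamp advances by 320 per packet; with four frames per packet it advances by 1280.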

205  The following diagram shows the standard payload format layout:

207  +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +
208  | payload | speech | speech | redundancy | redundancy | redundancy |
209  | header  |  TOC   |  data  |   header   |    TOC     |    data    |
210  +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +

212  3.3. Payload Header

214  The payload header has the following format:

216   0                   1
217   0 1 2 3 4 5 6 7 8 9 0 1
218  +-+-+-+-+-+-+-+-+-+-+-+-+
219  |T| CR  | BR  |D|A|GR |R|
220  +-+-+-+-+-+-+-+-+-+-+-+-+

222     o T (1 bit): reserved for compatibility with future extensions. MUST
223       be set to 0.

225     o CR (3 bits): coding rate of frame(s) in this packet, as per the
226       following table:

228       +-------+--------------+
229       |  CR   | avg. bitrate |
230       +-------+--------------+
231       |   0   |   7.7 kbps   |
232       |   1   |   9.8 kbps   |
233       |   2   |  14.3 kbps   |
234       |   3   |  20.8 kbps   |
235       |   4   |  27.9 kbps   |
236       |   5   |  34.2 kbps   |
237       |   6   |  (reserved)  |
238       |   7   |   NO_DATA    |
239       +-------+--------------+

241  The CR value 7 (NO_DATA) indicates that there is no speech data (and
242  accordingly no speech TOC) in the payload. This MAY be used to transmit
243  redundancy data only. The value 6 is reserved. If this value is
244  received, the packet MUST be discarded.

246     o BR (3 bits): base rate for core layer of frame(s) in this packet
247       using the table for CR. The base rate is the lowest rate for
248       scalability, so the speech payload cannot be scaled down below the
249       BR value. Packets with BR = 6 or BR > CR MUST be discarded.

251     o D (1 bit): reserved. Must always be set to 1.
252       Previously, this bit indicated DTX mode availability, but in fact
253       the payload duplicates this information.

255     o A (1 bit): reserved. Must always be set to 1.
256       Previously, this bit indicated aligned mode, but this mode has
257       never been used and was always set to 1.

259     o GR (2 bits): number of frames in packet (grouping size). Actual
260       grouping size is GR + 1, thus maximum grouping supported is 4.

262     o R (1 bit): redundancy presence bit. If R=1 then the packet
263       contains redundancy information for recovery of lost packets.
264       In this case the redundancy section is present after the speech data.

266  3.4. Speech Table of Contents

268  The speech TOC contains one entry for each frame in the packet (grouping
269  size in total). Each entry contains a single field:

271   0
272  +-+
273  |E|
274  +-+

276     o E (1 bit): frame existence indicator. If set to 0, this indicates
277       the corresponding frame is absent and the receiver should set the
278       special LOST_FRAME flag for the decoder. This can be followed by the
279       lost frame itself or by empty frames generated by the encoder
280       during silence intervals in DTX mode.

282  Note that if the CR field of the payload header is 7 (NO_DATA) then the
283  speech TOC is empty.

285  3.5. Speech Data

287  Speech data of a payload contains one or more speech frames or comfort
288  noise frames, as specified in the speech TOC of the payload.

290  Each speech frame represents 20 ms of speech encoded with the rate
291  indicated in the CR field and the base rate indicated in the BR field of
292  the payload header.

294  The size of a coded speech frame is variable due to the nature of the
295  codec. The encoder's algorithm decides the size of each frame and returns
296  it after encoding. To save bandwidth, the size is not placed
297  into the payload explicitly. The frame size can be determined from the
298  frame's content using a special service function specified in Appendix A.
299  This function provides complete information about a coded frame including
300  size, number of layers, size of each layer and size of the perceptual
301  sensitivity classes.

303  3.6. Redundancy Header

305  If a packet contains redundancy (R field of payload header is 1) the
306  speech data is followed by the redundancy header:

308   0 1 2 3 4 5
309  +-+-+-+-+-+-+
310  | CL1 | CL2 |
311  +-+-+-+-+-+-+

313  The redundancy header consists of two fields.
     Each field contains a class
314  specifier for the amount of redundancy taken from the preceding
315  packet (CL1) and the pre-preceding packet (CL2), i.e. distant from the
316  current packet by 1 and 2 packets respectively. The values are listed
317  in the table below:

319  +-------+-------------------+
320  |  CL   | redundancy amount |
321  +-------+-------------------+
322  |   0   |       NONE        |
323  |   1   |      CLASS A      |
324  |   2   |      CLASS B      |
325  |   3   |      CLASS C      |
326  |   4   |      CLASS D      |
327  |   5   |      CLASS E      |
328  |   6   |      CLASS F      |
329  |   7   |    (reserved)     |
330  +-------+-------------------+

332  Each specifier takes 3 bits, thus the total redundancy header size is 6
333  bits.

335  These classes indicate the subjective importance of bits from the core
336  layer. Class A contains the bits most sensitive to errors, and loss of
337  these bits results in a corrupted speech frame, which should not be
338  decoded without applying a packet loss concealment (PLC) procedure. Class
339  B is less sensitive than class A, and so on to F. Together, bit classes
340  A through F make up the core layer.

342  Putting some part (classes of bits) of a previous frame into the current
343  packet makes it possible to partially decode that frame if it is
344  lost. The more information is delivered, the less the speech quality
345  degrades. Flags CL1 and CL2 specify how many classes from
346  previous frames the current packet contains. E.g. CL1=3 (class C) means
347  that the packet contains bits from classes A, B and C of the previous
348  frame. If CL1=6 (class F) then the whole core layer is included.

350  3.7. Redundancy Table of Contents

352  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
353  | Pkt1 Entries| Pkt2 Entries|
354  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

356  The redundancy TOC contains entries for redundancy frames from preceding
357  and pre-preceding packets. Each entry takes 1 bit, like a speech TOC
358  entry (Section 3.4):

360   0
361  +-+
362  |E|
363  +-+

365     o E (1 bit): frame existence indicator. If set to 0, this indicates
366       the corresponding frame is absent.

368     o For each preceding and pre-preceding packet the number of entries
369       is equal to the grouping size of the current packet. Thus the
370       maximum number of entries is 4*2 = 8.

372     o If the class specifier in the redundancy header is CL=0 (NONE)
373       then there are no entries for the corresponding packet's redundancy.

375  3.8. Redundancy Data

377  Redundancy data of a payload contains redundancy information for one or
378  more speech frames or comfort noise frames that may be lost during
379  transmission, as specified in the redundancy TOC of the payload. In
380  effect, the redundancy is the most important part of the preceding
381  frames, each representing 20 ms of speech. This data MAY be used for
382  partial reconstruction of lost frames. The amount of available redundancy
383  is specified by the CL flag in the redundancy header (Section 3.6). This
384  flag SHOULD be passed to the decoder. The size of a redundancy frame is
385  variable and can be obtained using the service function specified in
     Appendix A.

387  4. Payload Examples

389  A few examples to highlight the payload format follow.

391  4.1. Payload Carrying a Single Frame

393  The following diagram shows a standard IP-MR payload carrying a single
394  speech frame without redundancy:

396   0                   1                   2                   3
397   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
398  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
399  |0|CR=1 |BR=0 |0|0|0 0|0|1|sp(0)                                |
400  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
401  |                                                               |
402  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
403  |                                                               |
404  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
405  |                                                               |
406  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
407  |                                                               |
408  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
409  |                                                               |
410  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
411  |                      sp(193)|P|
412  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

414  In the payload the speech frame is not damaged at the IP origin (E=1),
415  the coding rate is 9.8 kbps (CR=1), the base rate is 7.7 kbps (BR=0), and
416  the DTX mode is off. There is no byte alignment (A=0) and no redundancy
417  (R=0). The encoded speech bits, sp(0) to sp(193), are placed immediately
418  after the TOC. Finally, one zero bit is added at the end as padding to
419  make the payload byte aligned.

421  4.2. Payload Carrying Multiple Frames with Redundancy

423  The following diagram shows a payload that contains three frames, one of
424  them with no speech data. The coding rate is 7.7 kbps (CR=0), the base
425  rate is 7.7 kbps (BR=0), and the DTX mode is on. The speech frames are
426  byte aligned (A=1), so 1 zero bit is added at the end of the header.
427  Besides the speech frames the payload contains six redundancy frames
428  (three for each delayed packet).

430  The first speech frame consists of bits sp1(0) to sp1(92). After that 3
431  bits are added for byte alignment. The second frame does not contain any
432  speech information, which is represented in the payload by its TOC entry.
433  The third frame consists of bits sp3(0) to sp3(171).

435  The redundancy header follows the speech data. The one-packet-delayed
436  redundancy contains class A+B bits (CL1=2), and the two-packet-delayed
437  redundancy contains class A bits (CL2=1). The one-packet-delayed
438  redundancy contains three frames with 20, 39 and 35 bits respectively.

440  The first frame of the two-packet-delayed redundancy is absent, which is
441  represented in its TOC entry, and the two other frames have sizes of 15
442  and 19 bits.

444  Note that all speech frames are padded with zero bits for byte
445  alignment.

447   0                   1                   2                   3
448   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
449  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
450  |0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0)                         |
451  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
452  |                                                               |
453  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
454  |                                                               |
455  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
456  |                  sp1(92)|P|P|P|sp3(0)                         |
457  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
458  |                                                               |
459  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
460  |                                                               |
461  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
462  |                                                               |
463  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
464  |                                                               |
465  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
466  |                                               sp3(171)|P|P|P|P|
467  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
468  |CL1=2|CL2=1|1 1 1|0 1 1|red1_1(0)                    red1_1(19)|
469  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
470  |red1_2(0)                                                      |
471  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
472  |   red1_2(38)|red1_3(0)                                        |
473  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
474  |         red1_3(34)|red2_2(0)          red2_2(14)|red2_3(0)    |
475  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
476  |             red2_3(18)|P|P|P|P|
477  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

479  5. Media Type Registration

481  This section describes the media types and names associated with this
482  payload format.

484  5.1. Registration of media subtype audio/ip-mr_v2.5

486  Type name: audio

488  Subtype name: ip-mr_v2.5

490  Required parameters: none

492  Optional parameters:
493  * ptime: Gives the length of time in milliseconds represented by the
494    media in a packet. Allowed values are: 20, 40, 60 and 80.

496  Encoding considerations: This media type is framed binary data (see RFC
497  4288, Section 4.8).

499  Security considerations: See RFC 3550 [RFC 3550]

501  Interoperability considerations: none

503  Published specification: RFC XXXX

505  Applications that use this media type: Real-time audio applications like
506  voice over IP and teleconference, and multi-media streaming.

508  Additional information: none

510  Person & email address to contact for further information:
511  Yury Morzeev
512  morzeev@spiritdsp.com

514  Intended usage: COMMON

516  Restrictions on usage: This media type depends on RTP framing, and hence
517  is only defined for transfer via RTP [RFC 3550].

519  Authors:
520  Sergey Ikonin

522  Change controller: IETF Audio/Video Transport working group delegated
523  from the IESG.

525  5.2. Mapping Media Type Parameters into SDP

527  The information carried in the media type specification has a specific
528  mapping to fields in the Session Description Protocol (SDP) [RFC 4566],
529  which is commonly used to describe RTP sessions. When SDP is used to
530  specify sessions employing the IP-MR codec, the mapping is as follows:

532     o The media type ("audio") goes in SDP "m=" as the media name.

534     o The media subtype (payload format name) goes in SDP "a=rtpmap"
535       as the encoding name. The RTP clock rate in "a=rtpmap" MUST be 16000.

537     o The parameter "ptime" goes in the SDP "a=ptime" attribute.
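
The mapping above can be sketched in C as a function that writes the three SDP lines for an IP-MR stream. This is only an illustration: the function name `ipmr_sdp_media`, the RTP/AVP profile, and the port and payload type in the usage note are assumptions, not values given by this memo. The payload type is checked against the dynamic range 96 to 127 from RFC 3551, and `ptime` against the values allowed in Section 5.1.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write an SDP media description for IP-MR into buf.
 * port and pt are chosen by the caller (pt must be a dynamic payload type,
 * 96..127); ptime_ms must be 20, 40, 60 or 80 (Section 5.1).
 * Returns the number of characters written, or -1 on invalid arguments
 * or if buf is too small. */
static int ipmr_sdp_media(char *buf, size_t len, unsigned port,
                          unsigned pt, unsigned ptime_ms)
{
    int n;

    if (pt < 96u || pt > 127u)
        return -1;
    if (ptime_ms != 20u && ptime_ms != 40u &&
        ptime_ms != 60u && ptime_ms != 80u)
        return -1;

    /* "m=" carries the media type, "a=rtpmap" the subtype and the
     * mandatory 16000 Hz clock rate, "a=ptime" the ptime parameter. */
    n = snprintf(buf, len,
                 "m=audio %u RTP/AVP %u\r\n"
                 "a=rtpmap:%u ip-mr_v2.5/16000\r\n"
                 "a=ptime:%u\r\n",
                 port, pt, pt, ptime_ms);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

For example, calling `ipmr_sdp_media(buf, sizeof buf, 49170, 97, 20)` with an arbitrary port and payload type produces an `m=audio` line, an `a=rtpmap:97 ip-mr_v2.5/16000` line and an `a=ptime:20` line. No `a=fmtp` line is written because the registration defines no other parameters.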

539  Any remaining parameters go in the SDP "a=fmtp" attribute by copying
540  them directly from the media type parameter string as a semicolon-
541  separated list of parameter=value pairs.

543  Note that the payload format (encoding) names are commonly shown in
544  upper case. Media subtypes are commonly shown in lower case. These
545  names are case-insensitive in both places.

547  6. Security Considerations

549  RTP packets using the payload format defined in this specification
550  are subject to the security considerations discussed in the RTP
551  specification [RFC 3550] and in any applicable RTP profile. The main
552  security considerations for the RTP packet carrying the RTP payload
553  format defined within this memo are confidentiality, integrity, and
554  source authenticity. Confidentiality is achieved by encryption of the
555  RTP payload. Integrity of the RTP packets is achieved through a suitable
556  cryptographic integrity protection mechanism. Such a cryptographic
557  system may also allow the authentication of the source of the payload.

559  A suitable security mechanism for this RTP payload format should
560  provide confidentiality, integrity protection, and at least source
561  authentication capable of determining if an RTP packet is from a
562  member of the RTP session.

564  Note that the appropriate mechanism to provide security to RTP and
565  payloads following this memo may vary. It is dependent on the
566  application, the transport, and the signaling protocol employed.
567  Therefore, a single mechanism is not sufficient, although if suitable,
568  usage of the Secure Real-time Transport Protocol (SRTP) [RFC 3711] is
569  recommended. Other mechanisms that may be used are IPsec [RFC 4301]
570  and Transport Layer Security (TLS) [RFC 5246] (RTP over TCP); other
571  alternatives may exist.

573  This payload format does not exhibit any significant non-uniformity in
574  the receiver side computational complexity for packet processing, and
575  thus is unlikely to pose a denial-of-service threat due to the receipt
576  of pathological data.

578  7. Congestion Control

580  The general congestion control considerations for transporting RTP data
581  apply; see RTP [RFC 3550] and any applicable RTP profile like AVP
582  [RFC 3551]. However, the multi-rate capability of IP-MR speech coding
583  provides a mechanism that may help to control congestion, since the
584  bandwidth demand can be adjusted by selecting a different encoding mode.

586  The number of frames encapsulated in each RTP payload highly
587  influences the overall bandwidth of the RTP stream due to header
588  overhead constraints. Packetizing more frames in each RTP payload
589  can reduce the number of packets sent and hence the overhead from
590  IP/UDP/RTP headers, at the expense of increased delay.

592  If the in-band redundancy scheme is used to protect against packet loss,
593  the amount of introduced redundancy will need to be regulated so that
594  the use of redundancy itself does not cause a congestion problem. In
595  other words, a sender SHALL NOT increase the total bitrate when adding
596  redundancy in response to packet loss, and needs instead to adjust it
597  down in accordance with the congestion control algorithm being run. Thus,
598  when adding redundancy, the media bitrate will need to be reduced to
599  provide room for the redundancy.

601  8. IANA Considerations

603  One media type has been defined and needs registration in the media
604  types registry.

606  9. Normative References

608  [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
609             Requirement Levels", BCP 14, RFC 2119, March 1997.

611  [RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and
612             V. Jacobson, "RTP: A Transport Protocol for Real-Time
613             Applications", STD 64, RFC 3550, July 2003.

615  [RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio
616             and Video Conferences with Minimal Control", STD 65,
617             RFC 3551, July 2003.

619  [RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
620             Description Protocol", RFC 4566, July 2006.

622  [RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., Norrman,
623             K., "The Secure Real-Time Transport Protocol (SRTP)", RFC
624             3711, March 2004.

626  [RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer
627             Security (TLS) Protocol Version 1.2", RFC 5246,
628             August 2008.

630  [RFC 4301] Kent, S. and K. Seo, "Security Architecture for the
631             Internet Protocol", RFC 4301, December 2005.

633  10. Author(s) Information:

635  Sergey Ikonin
636  email: info@spiritdsp.com

638  Russia 109004
639  Building 27, A. Solzhenitsyna street
640  Tel: +7 495 661-2178
641  Fax: +7 495 912-6786

643  11. Disclaimer

645  This document may contain material from IETF Documents or IETF
646  Contributions published or made publicly available before November 10,
647  2008. The person(s) controlling the copyright in some of this material
648  may not have granted the IETF Trust the right to allow modifications of
649  such material outside the IETF Standards Process. Without obtaining an
650  adequate license from the person(s) controlling the copyright in such
651  materials, this document may not be modified outside the IETF Standards
652  Process, and derivative works of it may not be created outside the IETF
653  Standards Process, except to format it for publication as an RFC or to
654  translate it into languages other than English.

656  12. Legal Terms

658  All IETF Documents and the information contained therein are provided on
659  an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
660  OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
661  THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
662  IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
663  INFORMATION THEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
664  WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

666  The IETF Trust takes no position regarding the validity or scope of any
667  Intellectual Property Rights or other rights that might be claimed to
668  pertain to the implementation or use of the technology described in any
669  IETF Document or the extent to which any license under such rights might
670  or might not be available; nor does it represent that it has made any
671  independent effort to identify any such rights.

673  Copies of Intellectual Property disclosures made to the IETF Secretariat
674  and any assurances of licenses to be made available, or the result of an
675  attempt made to obtain a general license or permission for the use of
676  such proprietary rights by implementers or users of this specification
677  can be obtained from the IETF on-line IPR repository at
678  http://www.ietf.org/ipr.

680  The IETF invites any interested party to bring to its attention any
681  copyrights, patents or patent applications, or other proprietary rights
682  that may cover technology that may be required to implement any standard
683  or specification contained in an IETF Document. Please address the
684  information to the IETF at ietf-ipr@ietf.org.

686  The definitive version of an IETF Document is that published by, or
687  under the auspices of, the IETF.
Versions of IETF Documents that are 688 published by third parties, including those that are translated into 689 other languages, should not be considered to be definitive versions of 690 IETF Documents. The definitive version of these Legal Provisions is that 691 published by, or under the auspices of, the IETF. Versions of these 692 Legal Provisions that are published by third parties, including those 693 that are translated into other languages, should not be considered to be 694 definitive versions of these Legal Provisions. 696 For the avoidance of doubt, each Contributor to the IETF Standards 697 Process licenses each Contribution that he or she makes as part of the 698 IETF Standards Process to the IETF Trust pursuant to the provisions of 699 RFC 5378. No language to the contrary, or terms, conditions or rights 700 that differ from or are inconsistent with the rights and licenses 701 granted under RFC 5378, shall have any effect and shall be null and 702 void, whether published or posted by such Contributor, or included with 703 or in such Contribution. 705 APPENDIX A. RETRIEVING FRAME INFORMATION 707 This appendix contains the c-code for implementation of frame parsing 708 function. This function extracts information about coded frame including 709 frame size, number of layers, size of each layer and size of perceptual 710 sensitive classes. 712 A.1. get_frame_info.c 714 /* 715 Copyright (c) 2010 IETF Trust and the persons identified as authors 716 of the code. All rights reserved. 718 Redistribution and use in source and binary forms, with or without 719 modification, are permitted provided that the following conditions 720 are met: 722 - Redistributions of source code must retain the above copyright 723 notice, this list of conditions and the following disclaimer. 

725    - Redistributions in binary form must reproduce the above copyright
726    notice, this list of conditions and the following disclaimer in the
727    documentation and/or other materials provided with the distribution.

729    - Neither the name of the Xiph.org Foundation nor the names of its
730    contributors may be used to endorse or promote products derived from
731    this software without specific prior written permission.

733    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
734    ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
735    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
736    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
737    FOUNDATION OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
738    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
739    BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
740    OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
741    AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
742    OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
743    THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
744    DAMAGE.
745  */

747  /******************************************************************

749     get_frame_info.c

751     Retrieving frame information for IP-MR Speech Codec

753  ******************************************************************/

755  #define RATES_NUM       6   // number of codec rates
756  #define SENSE_CLASSES   6   // number of sensitivity classes (A..F)

758  // frame types
759  #define FT_SPEECH       0   // active speech
760  #define FT_DTX_SID      1   // silence insertion descriptor

762  // get specified bit from coded data (bit 0 = LSB of first octet)
763  int GetBit(unsigned char *data, int curBit)
764  {
765     return ((data[curBit >> 3] >> (curBit % 8)) & 1);
766  }

768  // retrieve frame information
769  int GetFrameInfo(           // o: frame size in bits
770     short rate,              // i: encoding rate (0..5)
771     short base_rate,         // i: base (core) layer rate;
772                              //    if base_rate > rate, base_rate is
773                              //    treated as equal to rate
774     unsigned char *pCoded,   // i: coded bit frame
775     short pLayerBits         // o: number of bits in layers
776           [RATES_NUM],
777     short pSenseBits         // o: number of bits in sensitivity classes
778           [SENSE_CLASSES],
779     short *nLayers           // o: number of layers
780  )
781  {
782     static const short Bits_1[4] = {0, 9, 9, 15};
783     static const short Bits_2[16] = { 43,50,36,31,46,48,40,44,47,43,44,
784                                       45,43,44,47,36};

786     static const short Bits_3[2][6] = {{13, 11, 23, 33, 36, 31},
787                                        {25,  0, 23, 32, 36, 31},};

789     int FrType;
790     int i,nBits;

792     if (rate < 0 || rate > 5) {
793        return 0;  // incorrect stream
794     }

796     for(i = 0; i < SENSE_CLASSES; i++) {
797        pSenseBits[i] = 0;
798     }

800     nBits = 0;
801     // extract frame type bit (1 = speech, 0 = SID)
802     FrType = GetBit(pCoded, nBits++) ? FT_SPEECH : FT_DTX_SID;

804     {
805        int cw_0;
806        int b[14];

808        // extract meaningful bits
809        for(i = 0 ; i < 14; i++) {
810           b[i] = GetBit(pCoded, nBits++);
811        }

813        // parse
814        if(FrType == FT_DTX_SID) {
815           cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3);
816           rate = 0;
817           pSenseBits[0] = 10 + Bits_2[cw_0];
818        } else {

820           int idx;
821           int nFlag_1, nFlag_2, cw_1 = 0, cw_2 = 0;

823           nFlag_1 = b[0] + b[2] + b[4] + b[6];
824           cw_1 = (cw_1 << 1) | b[0];
825           cw_1 = (cw_1 << 1) | b[2];
826           cw_1 = (cw_1 << 1) | b[4];
827           cw_1 = (cw_1 << 1) | b[6];

829           nFlag_2 = b[1] + b[3] + b[5] + b[7];
830           cw_2 = (cw_2 << 1) | b[1];
831           cw_2 = (cw_2 << 1) | b[3];
832           cw_2 = (cw_2 << 1) | b[5];
833           cw_2 = (cw_2 << 1) | b[7];

835           cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3);
836           if (base_rate < 0) base_rate = 0;
837           if (base_rate > rate) base_rate = rate;
838           idx = base_rate == 0 ? 0 : 1;

840           pSenseBits[0] = 15+Bits_2[cw_0];
841           pSenseBits[1] = Bits_1[(cw_1 >> 0)&0x3] + Bits_1[(cw_1>>2)&0x3];
842           pSenseBits[2] = nFlag_1*5;
843           pSenseBits[3] = nFlag_2*30;
844           pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]);

846           for (i = 1; i < rate+1; i++) {
847              pLayerBits[i] = 4*(Bits_3[idx][i]);
848           }
849        }

851        pLayerBits[0] = 0;
852        for (i = 0; i < SENSE_CLASSES; i++) {
853           pLayerBits[0] += pSenseBits[i];
854        }

856        *nLayers = rate+1;
857     }

859     {
860        // count total frame size
861        int payloadBitCount = 0;
862        for (i = 0; i < *nLayers; i++) {
863           payloadBitCount += pLayerBits[i];
864        }
865        return payloadBitCount;
866     }
867  }

869  Authors' Addresses

871  SPIRIT DSP
872  Building 27, A. Solzhenitsyna street
873  109004, Moscow, RUSSIA

875  Tel: +7 495 661-2178
876  Fax: +7 495 912-6786
877  EMail: info@spiritdsp.com