Audio/Video Transport Working Group                          S. Ikonin
Internet Draft                                              SPIRIT DSP
Intended status: Proposed Standard                    October 13, 2010

              RTP Payload Format for IP-MR Speech Codec
                   draft-ietf-avt-rtp-ipmr-14.txt

Abstract

   This document specifies the payload format for packetization of
   SPIRIT IP-MR encoded speech signals into the Real-time Transport
   Protocol (RTP). The payload format supports transmission of multiple
   frames per packet and introduces redundancy for robustness against
   packet loss and bit errors.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on December 18, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   The source code included in this document is provided under the BSD
   license (http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf).

Table of Contents

   1. Introduction
   2. IP-MR Codec Description
   3. Payload Format
      3.1. RTP Header Usage
      3.2. RTP Payload Structure
      3.3. Speech Payload Header
      3.4. Speech Payload Table of Contents
      3.5. Speech Payload Data
      3.6. Redundancy Payload Header
      3.7. Redundancy Payload Table of Contents
      3.8. Redundancy Payload Data
   4. Payload Examples
      4.1. Payload Carrying a Single Frame
      4.2. Payload Carrying Multiple Frames with Redundancy
   5. Congestion Control
   6. Security Considerations
   7. Payload Format Parameters
      7.1. Media Type Registration
      7.2. Mapping Media Type Parameters into SDP
   8. IANA Considerations
   9. Normative References
   10. Disclaimer
   11. Legal Terms
   12. Authors' Addresses
   APPENDIX A. RETRIEVING FRAME INFORMATION
      A.1. get_frame_info.c

1. Introduction

   This document specifies the payload format for packetization of
   SPIRIT IP-MR encoded speech signals into the Real-time Transport
   Protocol (RTP). The payload format supports transmission of multiple
   frames per packet and introduces redundancy for robustness against
   packet loss and bit errors.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   RFC 2119 [RFC 2119].

2. IP-MR Codec Description

   IP-MR is a wideband speech codec designed by SPIRIT for conferencing
   services over packet-switched networks such as the Internet.

   IP-MR is a scalable codec. This means that not only can the source
   change the transmission rate on the fly, but a gateway can also
   decrease bandwidth at any time without performance overhead. Six
   coding rates from 7.7 to 34.2 kbps are available.

   The codec operates on a frame-by-frame basis with a frame size of
   20 ms at a 16 kHz sampling rate, with a total end-to-end delay of
   25 ms. Each compressed frame is represented as a sequence of layers.
   The first (base) layer is mandatory, while the other (enhancement)
   layers can be safely discarded. Information about the structure of a
   particular frame is available from the payload header. To adjust
   outgoing bandwidth, the gateway MUST read the frame structure from
   the payload header, determine which enhancement layers to discard,
   and compose a new RTP packet according to this specification.

   Not all bits within a frame are equally tolerant to distortion.
   IP-MR defines 6 classes ('A'-'F') of sensitivity to bit errors. Any
   damage to class 'A' bits causes significant reconstruction
   artifacts, while a loss in class 'F' may not even be perceived by
   the listener. Note that only the base layer of a bitstream is
   represented as a set of classes.

   The IP-MR payload format allows frames to be duplicated across
   packets to improve robustness against packet loss (Section 3.6). The
   base layer can be retransmitted completely or as its most sensitive
   classes only. Enhancement layers cannot be retransmitted.

   The fine-grained redundancy, in conjunction with bitrate
   scalability, allows an application to adjust the trade-off between
   overhead and robustness against packet loss. Note that this approach
   is supported natively within a packet and requires no out-of-band
   signals or session initialization procedures.

   The main IP-MR features are as follows:

   o  High quality wideband speech codec.

   o  Bitrate scalable with 6 average rates from 7.7 to 34.2 kbps.

   o  Built-in discontinuous transmission (DTX) and comfort noise
      generation (CNG) support.

   o  Flexible in-band redundancy control scheme for packet loss
      protection.

3. Payload Format

   The payload format consists of the RTP header and the IP-MR payload.

3.1. RTP Header Usage

   The format of the RTP header is specified in [RFC 3550]. This payload
   format uses the fields of the header in a manner consistent with that
   specification.

   The RTP timestamp corresponds to the sampling instant of the first
   sample encoded for the first frame-block in the packet. The timestamp
   clock frequency SHALL be 16 kHz. The duration of one frame is 20 ms,
   which corresponds to 320 samples per frame. Thus the timestamp is
   increased by 320 for each consecutive frame. The timestamp is also
   used to recover the correct decoding order of the frame-blocks.

   The RTP header marker bit (M) SHALL be set to 1 whenever the first
   frame-block carried in the packet is the first frame-block in a
   talkspurt (see the definition of talkspurt in Section 4.1 of
   [RFC 3551]). For all other packets, the marker bit SHALL be set to
   zero (M=0).

   The assignment of an RTP payload type for the format defined in this
   memo is outside the scope of this document. The RTP profiles in use
   currently mandate binding the payload type dynamically for this
   payload format. This is necessary because the payload type
   expresses the configuration of the payload itself, i.e. basic or
   interleaved mode, and the number of channels carried.

   The remaining RTP header fields are used as specified in [RFC 3550].

3.2. RTP Payload Structure

   The IP-MR payload is composed of two payloads, one for the current
   speech and one for redundancy. Both payloads have the same form:
   Header, Table of Contents (TOC), and Data. The redundancy payload
   carries data for the preceding and pre-preceding packets.

   +--------+-----+----------------------+- - - - +- - +- - - - - +
   | Header | TOC |         Data         | Header |TOC |   Data   |
   +--------+-----+----------------------+- - - - +- - +- - - - - +
   |<- Speech -------------------------->|<- Redundancy (opt) --->|

3.3. Speech Payload Header

   This header carries parameters that are common to all frames in the
   packet:

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+
      |T| CR  | BR  |D|A|GR |R|
      +-+-+-+-+-+-+-+-+-+-+-+-+

   o  T (1 bit): Reserved. MUST always be set to 0. The receiver SHOULD
      discard the packet if the 'T' bit is not equal to 0.

   o  CR (3 bits): Coding rate index, i.e. the top enhancement layer
      available. The CR value 7 (NO_DATA) indicates that there is no
      speech data (and accordingly no speech TOC) in the payload. This
      MAY be used to transmit redundancy data only.

   o  BR (3 bits): Base rate index, i.e. the base layer bitrate. The
      speech payload can be scaled to any rate index between BR and CR.
      Packets with BR = 6 or BR > CR MUST be discarded. Redundancy data
      is also considered as having a base rate of BR.

   o  D (1 bit): Reserved. MUST always be set to 1. The receiver MAY
      discard the packet if the 'D' bit is zero.

   o  A (1 bit): Byte alignment. The value of 1 specifies that padding
      bits were added so that each compressed frame (Section 3.5)
      starts on a byte (8-bit) boundary. The value of 0 specifies
      unaligned frames. Note that the speech payload is always padded
      to a byte boundary regardless of the 'A' bit value.

   o  GR (2 bits): Number of frames in the packet (grouping size). The
      actual grouping size is GR + 1, thus the maximum grouping
      supported is 4.

   o  R (1 bit): Redundancy presence. The value of 1 indicates that a
      redundancy payload is present.
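
   As an illustration of the field layout above, the following C sketch
   unpacks the 12-bit speech payload header from the first two payload
   octets, most significant bit first, and applies the discard rules
   stated for 'T', 'BR' and 'CR'. The type ipmr_speech_header and the
   function ipmr_parse_speech_header are hypothetical names introduced
   here; they are not part of the codec or of Appendix A. Treating the
   SHOULD-level 'T' check as a hard discard is an assumption of this
   sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields of the speech payload header. */
typedef struct {
    unsigned t, cr, br, d, a, gr, r;
} ipmr_speech_header;

/*
 * Unpack the 12 header bits (T, CR, BR, D, A, GR, R) from the start of
 * the payload. Returns 0 on success, -1 if the packet is to be discarded.
 */
static int ipmr_parse_speech_header(const uint8_t *p, size_t len,
                                    ipmr_speech_header *h)
{
    unsigned v;

    assert(p != NULL && h != NULL);
    if (len < 2)
        return -1;                      /* header does not fit */

    /* First 12 bits of the payload, network bit order. */
    v = ((unsigned)p[0] << 4) | ((unsigned)p[1] >> 4);

    h->t  = (v >> 11) & 0x1;
    h->cr = (v >> 8)  & 0x7;
    h->br = (v >> 5)  & 0x7;
    h->d  = (v >> 4)  & 0x1;
    h->a  = (v >> 3)  & 0x1;
    h->gr = (v >> 1)  & 0x3;
    h->r  =  v        & 0x1;

    if (h->t != 0)
        return -1;                      /* reserved bit set (assumed discard) */
    if (h->cr == 6 || h->br == 6)
        return -1;                      /* rate index 6 is reserved */
    if (h->br > h->cr)
        return -1;                      /* base rate above coding rate */
    return 0;                           /* 'D' is left for the caller to check */
}
```

   For the single-frame example in Section 4.1, whose payload starts
   with the octets 0x11 0x08 when the first speech bits are zero, the
   sketch yields CR=1, BR=0, D=1, A=0, GR=0 and R=0.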

   Note that the values of the 'T' and 'D' bits are fixed; no other
   values are allowed by this specification. The value of padding bits
   is not specified.

   The following table defines the mapping between rate index and rate
   value:

   +------------+--------------+
   | rate index | avg. bitrate |
   +------------+--------------+
   |     0      |   7.7 kbps   |
   |     1      |   9.8 kbps   |
   |     2      |  14.3 kbps   |
   |     3      |  20.8 kbps   |
   |     4      |  27.9 kbps   |
   |     5      |  34.2 kbps   |
   |     6      |  (reserved)  |
   |     7      |   NO_DATA    |
   +------------+--------------+

   The value of 6 is reserved. A packet carrying this value MUST be
   discarded.

3.4. Speech Payload Table of Contents

   The speech TOC is a bit mask indicating the presence of each frame in
   the packet. The TOC is only available if the 'CR' value is not equal
   to 7 (NO_DATA).

       0 1 2 3
      +-+-+-+-+
      |E|E|E|E|
      +-+-+-+-+
      |<----->| <-- #(GR+1)

   o  E (1 bit): Frame existence indicator. The value of 0 indicates
      that speech data is not present for the corresponding frame. The
      IP-MR encoder sets the E flag to 0 for periods of silence in DTX
      mode. The application MUST set this bit to 0 if the frame is
      known to be damaged.

3.5. Speech Payload Data

   Speech data contains (GR+1) compressed IP-MR frames (20 ms of data
   each). A compressed frame has zero length if the corresponding TOC
   flag is zero.

   The beginning of each compressed frame is aligned if the 'A' bit is
   nonzero, while the end of the speech payload is always aligned to a
   byte (8-bit) boundary:

   +- - -+------------+------------+------------+------------+
   | TOC |   Frame1   |   Frame2   |   Frame3   |   Frame4   |
   +- - -+------------+------------+------------+------------+ ALWAYS
         |<- aligned  |<- aligned  |<- aligned  |<- aligned  |<- ALIGNED

   The marked regions MUST be padded only if the 'A' bit is set to 1.
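
   The alignment rules above can be checked with a small calculation.
   The following C sketch computes the total size in bits of a speech
   payload (header, TOC, frames and padding) from the 'A' bit, GR, the
   TOC flags and the compressed frame sizes. The names ipmr_align8 and
   ipmr_speech_payload_bits are hypothetical, and the sketch assumes
   that the TOC is present (CR is not 7). The frame sizes in the usage
   note are taken from the examples in Section 4.

```c
#include <assert.h>
#include <stddef.h>

/* Round a bit offset up to the next byte (8-bit) boundary. */
static size_t ipmr_align8(size_t bit_pos)
{
    return (bit_pos + 7u) & ~(size_t)7u;
}

/*
 * Total size in bits of the speech payload.
 *   a          - 'A' bit from the speech payload header
 *   gr         - 'GR' field (0..3), so gr + 1 frames are described
 *   toc        - gr + 1 'E' flags, nonzero when the frame is present
 *   frame_bits - gr + 1 compressed frame sizes in bits
 */
static size_t ipmr_speech_payload_bits(int a, int gr, const int *toc,
                                       const size_t *frame_bits)
{
    size_t pos = 12;                    /* speech payload header */
    int n = gr + 1;
    int i;

    assert(toc != NULL && frame_bits != NULL && gr >= 0 && gr <= 3);

    pos += (size_t)n;                   /* one 'E' bit per frame */
    for (i = 0; i < n; i++) {
        if (!toc[i])
            continue;                   /* absent frames have zero length */
        if (a)
            pos = ipmr_align8(pos);     /* frame starts on a byte boundary */
        pos += frame_bits[i];
    }
    return ipmr_align8(pos);            /* payload end is always aligned */
}
```

   With A=0, GR=0 and one 194-bit frame the result is 208 bits (one
   padding bit). With A=1, GR=2, TOC '101' and frames of 93 and 172
   bits the result is 288 bits, which accounts for 1, 3 and 4 padding
   bits after the TOC, the first frame and the third frame.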

   The compressed frame structure is the following:

   |<---- sensitive classes ------>|<----- enhancement layers -------->|
   +-------------------------------+----+-----+------+- - - - - +------+
   |        L1 (Base Layer)        | L2 | L3  |  L4  |          |  LN  |
   +-------------------------------+----+-----+------+- - - - - +------+
   |<- A --->|<- B ->| ... |<- F ->|                                   |
   |<- BR rate ------------------->|                                   |
   |<- CR rate ------------------------------------------------------->|

   Appendix A of this document provides a helper routine written in C
   that MUST be used to extract the bounds of the sensitivity classes
   and enhancement layers from the compressed frame data.

3.6. Redundancy Payload Header

   The presence of the redundancy payload is signaled by the R bit of
   the speech payload header. The redundancy header is composed of two
   fields of 3 bits each:

       0 1 2 3 4 5
      +-+-+-+-+-+-+
      | CL1 | CL2 |
      +-+-+-+-+-+-+

   The 'CL1' and 'CL2' fields specify the sensitivity classes available
   for the preceding and pre-preceding packets, respectively.

   +-------+--------------------+
   |  CL   | Redundancy classes |
   |       | available          |
   +-------+--------------------+
   |   0   | NONE               |
   |   1   | A                  |
   |   2   | A-B                |
   |   3   | A-C                |
   |   4   | A-D                |
   |   5   | A-E                |
   |   6   | A-F                |
   |   7   | (reserved)         |
   +-------+--------------------+

   The receiver can reconstruct the base layer of preceding packets
   completely (CL=6) or partially (0 < CL < 6) [...]

3.7. Redundancy Payload Table of Contents

   [...]

              |<----->| pre-preceding payload #(GR+1)
      |<----->| preceding payload #(GR+1)

   o  E (1 bit): Redundancy frame existence indicator. The value of 0
      indicates that redundancy data is not present for the
      corresponding frame.

3.8. Redundancy Payload Data

   IP-MR defines 6 classes ('A'-'F') of sensitivity to bit errors. Any
   damage to class 'A' bits causes significant reconstruction
   artifacts, while a loss in class 'F' may not even be perceived by
   the listener. Note that only the base layer of a bitstream is
   represented as a set of classes. Together, sensitivity classes and
   redundancy allow IP-MR to duplicate frames across packets to improve
   robustness against packet loss.

   Redundancy data carries a number of sensitivity classes for the
   preceding and pre-preceding packets, as indicated by the 'CL1' and
   'CL2' fields of the redundancy header. The sensitivity class data is
   available individually for each frame, and only if the corresponding
   'E' bit of the redundancy TOC is nonzero:

   +---+---+----+----|-----+-----+-----+-----+-----+-----+-----+
   |A-C|A-B|1000|1001|cl_A1|cl_B1|cl_C1|cl_A1|cl_B1|cl_A4|cl_B4|
   +---+---+----+----|-----+-----+-----+-----+-----+-----+-----+
   |<- CL >|<- TOC ->|<- preceding --->|<- pre-preceding ----->|

   Redundancy data is only available if the base (BR) and coding (CR)
   rates of the preceding and pre-preceding packets are the same as for
   the current packet.

   A receiver MAY use redundancy data to compensate for packet loss; in
   this case the 'CL' field MUST also be passed to the decoder. The
   helper routine provided in Appendix A MUST be used to extract the
   length of the sensitivity classes for each frame. The following
   pseudo code describes the sequence of operations:

      /* Bits per sensitivity class ('A'-'F') of each redundancy frame */
      int sensitivityBits[numOfRedundancyFrames][6];
      /* Total redundancy bits carried for each frame */
      int redundancyBits [numOfRedundancyFrames];
      int i, j;

      for (i = 0; i < numOfRedundancyFrames; i++) {
         /* Appendix A helper: class sizes for the given CR and BR */
         GetFrameInfo(CR, BR, pRedundancyPayloadData, dummy,
                      sensitivityBits[i], dummy);
         /* Sum the classes signaled by the CL value of frame i */
         redundancyBits[i] = 0;
         for (j = 0; j < CL[i]; j++) {
            redundancyBits[i] += sensitivityBits[i][j];
         }
         /* Advance the read position past the data of this frame */
         flushBits(pRedundancyPayloadData, redundancyBits[i]);
      }

4. Payload Examples

   This section provides detailed examples of the IP-MR payload format.

4.1. Payload Carrying a Single Frame

   The following diagram shows a typical IP-MR payload carrying one
   (GR=0) non-aligned (A=0) speech frame without redundancy (R=0). The
   base layer is coded at 7.7 kbps (BR=0) while the coding rate is 9.8
   kbps (CR=1). The 'E' bit value of 1 signals that the compressed
   frame bits s(0) - s(193) are present. There is a padding bit 'P' to
   maintain speech payload size alignment.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|CR=1 |BR=0 |1|0|0 0|0|1|s(0)                                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       s(193)|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.2. Payload Carrying Multiple Frames with Redundancy

   The following diagram shows a payload carrying 3 (GR=2) aligned (A=1)
   speech frames with redundancy (R=1). The TOC value of '101' indicates
   that speech data is present for the first (bits sp1(0)-sp1(92)) and
   third (bits sp3(0)-sp3(171)) frames. There are no enhancement layers
   because the base and coding rates are equal (BR=CR=0). Padding bits
   'P' are inserted to maintain the necessary alignment.

   The redundancy payload is present for both the preceding and
   pre-preceding payloads (CL1 = A-B, CL2 = A), but redundancy data is
   available for only 5 (TOC='111011') of the 6 (2*(GR+1)) frames. There
   are 20, 39 and 35 bits of redundancy data for the three frames of the
   preceding packet, and 15 and 19 bits for two frames of the
   pre-preceding packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  sp1(92)|P|P|P|sp3(0)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                               sp3(171)|P|P|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |CL1=2|CL2=1|1 1 1|0 1 1|red1_1_AB(0)              red1_1_AB(19)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |red1_2_AB(0)                                                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |red1_2_AB(38)|red1_3_AB(0)                                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      red1_3_AB(34)|red2_2_A(0)      red2_2_A(14)|red2_3_A(0)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           red2_3_A(18)|P|P|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5. Congestion Control

   The general congestion control considerations for transporting RTP
   data apply to IP-MR speech over RTP (see RTP [RFC 3550] and any
   applicable RTP profile like AVP [RFC 3551]). However, the multi-rate
   capability of IP-MR speech coding provides a mechanism that may help
   to control congestion, since the bandwidth demand can be adjusted by
   selecting a different encoding mode.
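
   The rate adjustment mentioned above can be sketched as follows.
   Given an available bandwidth budget for the speech bits, the
   hypothetical helper ipmr_pick_cr returns the highest coding rate
   index that fits, never dropping below the base rate index. The table
   holds the average bitrates from Section 3.3; the function name, the
   budget parameter and the decision to ignore IP/UDP/RTP and payload
   header overhead are assumptions made for illustration only.

```c
#include <assert.h>

/* Average bitrates in bit/s for rate indices 0..5 (table in Section 3.3). */
static const int ipmr_rate_bps[6] = {
    7700, 9800, 14300, 20800, 27900, 34200
};

/*
 * Return the highest rate index in [br, 5] whose average bitrate does not
 * exceed budget_bps. The base layer is mandatory, so br is returned even
 * when the budget is below the base rate.
 */
static int ipmr_pick_cr(int br, int budget_bps)
{
    int cr = br;
    int i;

    assert(br >= 0 && br <= 5);

    /* Rates grow with the index, so the last index that fits wins. */
    for (i = br + 1; i <= 5; i++) {
        if (ipmr_rate_bps[i] <= budget_bps)
            cr = i;
    }
    return cr;
}
```

   With a budget of 15000 bit/s and BR=0, the sketch selects rate index
   2 (14.3 kbps). A gateway could apply the same comparison before
   discarding enhancement layers.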

   The number of frames encapsulated in each RTP payload highly
   influences the overall bandwidth of the RTP stream due to header
   overhead constraints. Packetizing more frames in each RTP payload can
   reduce the number of packets sent and hence the overhead from
   IP/UDP/RTP headers, at the expense of increased delay.

   Due to the scalable nature of the IP-MR codec, the transmission rate
   can be reduced at any transport stage to fit the channel bandwidth.
   The minimal rate is specified by the BR field of the payload header
   and can be as low as 7.7 kbps. It is up to the application to balance
   coding quality (high BR) against bitstream scalability (small BR).
   Because coding quality depends on the coding rate (CR) rather than on
   the base rate (BR), it is NOT RECOMMENDED to use high BR values for
   real-time communications.

   An application MAY utilize bitstream redundancy to combat packet
   loss. However, the gateway is free to choose any option to reduce the
   transmission rate: coding layers or redundancy bits can be dropped.
   For this reason, it is NOT RECOMMENDED for an application to increase
   the total bitrate when adding redundancy in response to packet loss.

6. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC 3550] and in any applicable RTP profile. The main
   security considerations for the RTP packet carrying the RTP payload
   format defined within this memo are confidentiality, integrity, and
   source authenticity. Confidentiality is achieved by encryption of
   the RTP payload. Integrity of the RTP packets is achieved through a
   suitable cryptographic integrity protection mechanism. Such a
   cryptographic system may also allow the authentication of the source
   of the payload. A suitable security mechanism for this RTP payload
   format should provide confidentiality, integrity protection, and at
   least source authentication capable of determining if an RTP packet
   is from a member of the RTP session.

   Note that the appropriate mechanism to provide security to RTP and
   payloads following this memo may vary. It is dependent on the
   application, the transport, and the signaling protocol employed.
   Therefore, a single mechanism is not sufficient, although if
   suitable, usage of the Secure Real-time Transport Protocol (SRTP)
   [RFC 3711] is recommended. Other mechanisms that may be used are
   IPsec [RFC 4301] and Transport Layer Security (TLS) [RFC 5246] (RTP
   over TCP); other alternatives may exist.

   This payload format does not exhibit any significant non-uniformity
   in the receiver side computational complexity for packet processing
   and thus is unlikely to pose a denial-of-service threat due to the
   receipt of pathological data.

7. Payload Format Parameters

   This section describes the media types and names associated with this
   payload format.

   The IP-MR media subtype is defined as 'ip-mr_v2.5'. This subtype was
   registered to specify the internal codec version. Later, this version
   was accepted as final, the bitstream was frozen, and IP-MR v2.5 was
   published under the name IP-MR. Currently the terms 'IP-MR' and
   'IP-MR v2.5' are synonyms. The subtype name ip-mr_v2.5 is used in
   existing implementations.

7.1. Media Type Registration

   Media Type name: audio

   Media Subtype name: ip-mr_v2.5

   Required parameters: none

   Optional parameters:
      These parameters apply to RTP transfer only.

      ptime: The media packet length in milliseconds. Allowed values
         are: 20, 40, 60 and 80.

   Encoding considerations:
      This media type is framed binary data (see RFC 4288, Section 4.8).
562 Security considerations: 563 See section 6 of RFC XXXX (RFC editor please replace with this RFC 564 number). 566 Interoperability considerations: 567 none 569 Published specification: 570 RFC XXXX (RFC editor please replace with this RFC number) 572 Applications that use this media type: 573 Real-time audio applications like voice over IP and 574 teleconference, and multi-media streaming. 576 Additional information: 577 none 579 Person & email address to contact for further information: 580 Dmitry Yudin 582 Intended usage: 583 COMMON 585 Restrictions on usage: 586 This media type depends on RTP framing, and hence is only defined 587 fortransfer via RTP [RFC 3550]. 589 Authors: 590 Sergey Ikonin Dmitry Yudin 591 593 Change controller: 594 IETF Audio/Video Transport working group delegated from the IESG. 596 7.2. Mapping Media Type Parameters into SDP 598 The information carried in the media type specification has a 599 specific mapping to fields in the Session Description Protocol (SDP) 600 [RFC 4566], which is commonly used to describe RTP sessions. When SDP 601 is used to specify sessions employing the IP-MR codec, the mapping is 602 as follows: 603 o The media type ("audio") goes in SDP "m=" as the media name. 605 o The media subtype (payload format name) goes in SDP "a=rtpmap" 606 as the encoding name. The RTP clock rate in "a=rtpmap" MUST 16000. 608 o The parameter "ptime" goes in the SDP "a=ptime" attributes. 610 Any remaining parameters go in the SDP "a=fmtp" attribute by copying 611 them directly from the media type parameter string as a semicolon- 612 separated list of parameter=value pairs. 614 Note that the payload format (encoding) names are commonly shown in 615 upper case. Media subtypes are commonly shown in lower case. These 616 names are case-insensitive in both places. 618 8. IANA Considerations 620 One media type has been defined and needs registration in the media 621 types registry. 623 9. 
Normative References 625 [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate 626 Requirement Levels", BCP 14, RFC 2119, March 1997. 628 [RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 629 Jacobson, "RTP: A Transport Protocol for Real-Time 630 Applications", STD 64, RFC 3550, July 2003. 632 [RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and 633 Video Conferences with Minimal Control", STD 65, RFC 3551, 634 July 2003. 636 [RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 637 Description Protocol", RFC 4566, July 2006. 639 [RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., 640 Norrman, K., "The Secure Real-Time Transport Protocol 641 (SRTP)", RFC 3711, March 2004. 643 [RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer Security 644 (TLS) Protocol Version 1.2", RFC 5246, August 2008. 646 [RFC 4301] Kent, S. and K. Seo, "Security Architecture for the 647 Internet Protocol", RFC 4301, December 2005. 649 10. Disclaimer 651 This document may contain material from IETF Documents or IETF 652 Contributions published or made publicly available before November 653 10, 2008. The person(s) controlling the copyright in some of this 654 material may not have granted the IETF Trust the right to allow 655 modifications of such material outside the IETF Standards Process. 656 Without obtaining an adequate license from the person(s) controlling 657 the copyright in such materials, this document may not be modified 658 outside the IETF Standards Process, and derivative works of it may 659 not be created outside the IETF Standards Process, except to format 660 it for publication as an RFC or to translate it into languages other 661 than English. 663 11. 
Legal Terms 665 All IETF Documents and the information contained therein are provided 666 on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 667 REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE 668 IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL 669 WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY 670 WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE 671 ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 672 FOR A PARTICULAR PURPOSE. 674 The IETF Trust takes no position regarding the validity or scope of 675 any Intellectual Property Rights or other rights that might be 676 claimed to pertain to the implementation or use of the technology 677 described in any IETF Document or the extent to which any license 678 under such rights might or might not be available; nor does it 679 represent that it has made any independent effort to identify any 680 such rights. 682 Copies of Intellectual Property disclosures made to the IETF 683 Secretariat and any assurances of licenses to be made available, or 684 the result of an attempt made to obtain a general license or 685 permission for the use of such proprietary rights by implementers or 686 users of this specification can be obtained from the IETF on-line IPR 687 repository at http://www.ietf.org/ipr. 689 The IETF invites any interested party to bring to its attention any 690 copyrights, patents or patent applications, or other proprietary 691 rights that may cover technology that may be required to implement 692 any standard or specification contained in an IETF Document. Please 693 address the information to the IETF at ietf-ipr@ietf.org. 695 The definitive version of an IETF Document is that published by, or 696 under the auspices of, the IETF. 
   Versions of IETF Documents that are
   published by third parties, including those that are translated into
   other languages, should not be considered to be definitive versions
   of IETF Documents. The definitive version of these Legal Provisions
   is that published by, or under the auspices of, the IETF. Versions of
   these Legal Provisions that are published by third parties, including
   those that are translated into other languages, should not be
   considered to be definitive versions of these Legal Provisions.

   For the avoidance of doubt, each Contributor to the IETF Standards
   Process licenses each Contribution that he or she makes as part of
   the IETF Standards Process to the IETF Trust pursuant to the
   provisions of RFC 5378. No language to the contrary, or terms,
   conditions or rights that differ from or are inconsistent with the
   rights and licenses granted under RFC 5378, shall have any effect and
   shall be null and void, whether published or posted by such
   Contributor, or included with or in such Contribution.

12. Authors' Addresses

   SPIRIT DSP
   Building 27, A. Solzhenitsyna street
   109004, Moscow, RUSSIA

   Tel: +7 495 661-2178
   Fax: +7 495 912-6786
   EMail: yudin@spiritdsp.com

APPENDIX A. RETRIEVING FRAME INFORMATION

   This appendix contains C code implementing the frame-parsing
   function. The function extracts information about a coded frame,
   including the frame size, the number of layers, the size of each
   layer, and the sizes of the perceptually sensitive classes.

A.1. get_frame_info.c

   /*

     Copyright (c) 2010
     IETF Trust and the persons identified as authors of the code.
     All rights reserved.

     Redistribution and use in source and binary forms, with or without
     modification, are permitted provided that the following conditions
     are met:
     - Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.
     - Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in
       the documentation and/or other materials provided with the
       distribution.
     - Neither the name of Internet Society, IETF or IETF Trust, nor
       the names of specific contributors, may be used to endorse or
       promote products derived from this software without specific
       prior written permission.

     THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
     "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
     LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
     A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
     OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
     SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
     LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
     DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
     THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
     (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
     OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
   */

   /******************************************************************

    get_frame_info.c
    Retrieving frame information for IP-MR Speech Codec

   ******************************************************************/

   #define RATES_NUM      6    // number of codec rates
   #define SENSE_CLASSES  6    // number of sensitivity classes (A..F)

   // frame types
   #define FT_SPEECH      0    // active speech
   #define FT_DTX_SID     1    // silence insertion descriptor

   // get specified bit from coded data (bits are numbered LSB-first
   // within each byte; no bounds checking is performed here)
   int GetBit(unsigned char *buf, int curBit)
   {
       return (buf[curBit>>3]>>(curBit%8))&1;
   }

   // retrieve frame information
   int GetFrameInfo(                 // o: frame size in bits
       short rate,                   // i: encoding rate (0..5)
       short base_rate,              // i: base (core) layer rate
       unsigned char *buf,           // i: coded bit frame
       int size,                     // i: coded bit frame size in bytes
       short pLayerBits[RATES_NUM],  // o: number of bits in layers
       short pSenseBits[SENSE_CLASSES], // o: number of bits in
                                        //    sensitivity classes
       short *nLayers                // o: number of layers
   )
   {
       static const short Bits_1[4] = { 0, 9, 9,15};
       static const short Bits_2[16] = { 43,50,36,31,46,48,40,44,
                                         47,43,44,45,43,44,47,36};
       static const short Bits_3[2][6] = {{13,11,23,33,36,31},
                                          {25, 0,23,32,36,31},};
       int FrType;
       int i,nBits = 0;

       if (rate < 0 || rate > 5) {
           return 0; // incorrect stream
       }

       // do not touch the buffer unless it holds at least one byte
       if (size < 1) {
           return 0; // not enough input data
       }

       // extract frame type bit if required
       FrType = GetBit(buf, nBits++) ?
                FT_SPEECH : FT_DTX_SID;

       if((FrType != FT_DTX_SID && size < 2) || size < 1) {
           return 0; // not enough input data
       }
       for(i = 0; i < SENSE_CLASSES; i++) {
           pSenseBits[i] = 0;
       }

       {
           int cw_0;
           int b[14] = {0};
           // a SID frame uses only b[0..3], which lie in the first
           // byte; reading all 14 bits could overrun a 1-byte buffer
           int nHdrBits = (FrType == FT_DTX_SID) ? 4 : 14;

           // extract meaningful bits
           for(i = 0 ; i < nHdrBits; i++) {
               b[i] = GetBit(buf, nBits++);
           }

           // parse
           if(FrType == FT_DTX_SID) {
               cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3);
               rate = 0;
               pSenseBits[0] = 10 + Bits_2[cw_0];
           } else {

               int idx;
               int nFlag_1, nFlag_2;
               int cw_1 = 0, cw_2 = 0; // must start at zero: the code
                                       // words are built by shifting

               nFlag_1 = b[0] + b[2] + b[4] + b[6];
               cw_1 = (cw_1 << 1) | b[0];
               cw_1 = (cw_1 << 1) | b[2];
               cw_1 = (cw_1 << 1) | b[4];
               cw_1 = (cw_1 << 1) | b[6];

               nFlag_2 = b[1] + b[3] + b[5] + b[7];
               cw_2 = (cw_2 << 1) | b[1];
               cw_2 = (cw_2 << 1) | b[3];
               cw_2 = (cw_2 << 1) | b[5];
               cw_2 = (cw_2 << 1) | b[7];
               (void)cw_2; // assembled for completeness; not used below

               cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3);
               if (base_rate < 0) base_rate = 0;
               if (base_rate > rate) base_rate = rate;
               idx = base_rate == 0 ? 0 : 1;

               pSenseBits[0] = 15+Bits_2[cw_0];
               pSenseBits[1] = Bits_1[(cw_1>>0)&0x3] +
                               Bits_1[(cw_1>>2)&0x3];
               pSenseBits[2] = nFlag_1*5;
               pSenseBits[3] = nFlag_2*30;
               pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]);

               for (i = 1; i < rate+1; i++) {
                   pLayerBits[i] = 4*Bits_3[idx][i];
               }
           }

           // the base layer holds all sensitivity classes
           pLayerBits[0] = 0;
           for (i = 0; i < SENSE_CLASSES; i++) {
               pLayerBits[0] += pSenseBits[i];
           }

           *nLayers = rate+1;
       }

       {
           // count total frame size
           int payloadBitCount = 0;
           for (i = 0; i < *nLayers; i++) {
               payloadBitCount += pLayerBits[i];
           }
           return payloadBitCount;
       }
   }
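
   GetBit() addresses bits LSB-first within each byte and performs no
   bounds check of its own, so a caller has to guarantee that every
   bit index lies inside the frame. The following is a minimal sketch,
   not part of the draft, of a bounds-checked variant of the same bit
   addressing. The names get_bit_checked and read_bits_lsb_first are
   hypothetical helpers introduced here for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the draft): same LSB-first addressing as
   GetBit(), but returns 0 instead of reading past the buffer. */
int get_bit_checked(const unsigned char *buf, size_t size_bytes,
                    size_t bit_index, int *bit_out)
{
    if (buf == NULL || bit_out == NULL ||
        (bit_index >> 3) >= size_bytes) {
        return 0;                       /* nothing read */
    }
    *bit_out = (buf[bit_index >> 3] >> (bit_index & 7)) & 1;
    return 1;
}

/* Hypothetical helper: collect `count` consecutive bits starting at
   `first_bit`. The first bit read becomes the least significant bit
   of the result, which is how GetFrameInfo() assembles cw_0. */
int read_bits_lsb_first(const unsigned char *buf, size_t size_bytes,
                        size_t first_bit, unsigned count,
                        unsigned *value_out)
{
    unsigned i, value = 0;

    assert(count <= 16);                /* keep shifts well defined */
    if (value_out == NULL) {
        return 0;
    }
    for (i = 0; i < count; i++) {
        int bit;
        if (!get_bit_checked(buf, size_bytes, first_bit + i, &bit)) {
            return 0;                   /* ran off the end of the frame */
        }
        value |= (unsigned)bit << i;
    }
    *value_out = value;                 /* written only on success */
    return 1;
}
```

   As a usage illustration, take the two-byte buffer {0x1E, 0x00}. Bit
   0 is 0, which GetFrameInfo() treats as a SID frame, and reading four
   bits starting at bit 1 yields the code word 15. With the Bits_2
   table in A.1, that code word corresponds to pSenseBits[0] =
   10 + Bits_2[15] = 46 bits. A request that extends past the second
   byte fails and leaves the output value untouched.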