Audio/Video Transport Working Group                            S. Ikonin
Internet Draft                                                SPIRIT DSP
Intended status: Informational                        September 28, 2009

               RTP Payload Format for IP-MR Speech Codec
                     draft-ietf-avt-rtp-ipmr-09.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

   The source code included in this document is provided under the BSD
   license (http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf).

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on March 28, 2010.

Abstract

   This document specifies the payload format for packetization of
   SPIRIT IP-MR encoded speech signals into the Real-time Transport
   Protocol (RTP). The payload format supports transmission of multiple
   frames per payload and in-band redundancy for robustness against
   packet loss.

Table of Contents

   1. Introduction
   2. IP-MR Codec Description
   3. Payload Format
      3.1. RTP Header Usage
      3.2. Payload Format Structure
      3.3. Payload Header
      3.4. Speech Table of Contents
      3.5. Speech Data
      3.6. Redundancy Header
      3.7. Redundancy Table of Contents
      3.8. Redundancy Data
   4. Payload Examples
      4.1. Payload Carrying a Single Frame
      4.2. Payload Carrying Multiple Frames with Redundancy
   5. Media Type Registration
      5.1. Registration of media subtype audio/ip-mr_v2.5
      5.2. Mapping Media Type Parameters into SDP
   6. Security Considerations
   7. Congestion Control
   8. IANA Considerations
   9. Normative References
   10. Author(s) Information
   11. Disclaimer
   12. Legal Terms
   APPENDIX A. RETRIEVING FRAME INFORMATION
      A.1. get_frame_info.c
   Authors' Addresses

1. Introduction

   This document specifies the payload format for packetization of
   SPIRIT IP-MR encoded speech signals into the Real-time Transport
   Protocol (RTP). The payload format supports transmission of multiple
   frames per payload and in-band redundancy for robustness against
   packet loss.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC 2119].

2. IP-MR Codec Description

   The IP-MR codec is a scalable, adaptive multi-rate wideband speech
   codec designed by SPIRIT for use in IP based networks.
   This codec is suitable for real-time communications such as telephony
   and videoconferencing.

   The codec operates on 20 ms frames at a 16 kHz sampling rate and has
   an algorithmic delay of 25 ms.

   IP-MR supports six wideband speech coding modes with bit rates
   ranging from about 7.7 to about 34.2 kbps. The coding mode can be
   changed at any 20 ms frame boundary, making it possible to adjust the
   speech encoding rate dynamically during a session to adapt to varying
   transmission conditions.

   The coded frame consists of multiple coding layers: a base (or core)
   layer and several enhancement layers, which are coded independently.
   Only the core layer is required to decode understandable speech; the
   upper layers provide quality enhancement. The enhancement layers may
   be omitted, and the remaining base layer can still be meaningfully
   decoded without artifacts. This makes the bit stream scalable and
   allows the bit rate to be reduced during transmission without
   re-encoding.

   This memo specifies an optional form of redundancy coding within RTP
   for protection against packet loss. It is based on a commonly known
   scheme in which previously transmitted frames are aggregated with
   new ones. Each frame is retransmitted once in the following RTP
   payload packet. In the diagram below, f(n-2)...f(n+4) denotes a
   sequence of speech frames, and p(n-1)...p(n+4) a sequence of payload
   packets:

   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--

     <---- p(n-1) ---->
              <----- p(n) ----->
                       <---- p(n+1) ---->
                                <---- p(n+2) ---->
                                         <---- p(n+3) ---->
                                                  <---- p(n+4) ---->

   Because of the scalable nature of the IP-MR codec, however, there is
   no need to duplicate the whole previous frame; it is sufficient to
   retransmit only the core layer.
   This reduces redundancy overhead while preserving effectiveness.
   Moreover, the speech bits encoded in the core layer are divided into
   six classes (A to F) according to their perceptual sensitivity to
   errors. Selecting which of these classes to send as redundancy makes
   it possible to adjust the trade-off between overhead and robustness
   against packet loss.

   The mechanism described does not require signaling at session setup.
   The sender is responsible for selecting an appropriate amount of
   redundancy based on feedback about the channel conditions.

   The main codec characteristics can be summarized as follows:

   o  Wideband, 16 kHz, speech codec

   o  Adaptive multi-rate with six modes from about 7.7 to 34.2 kbps

   o  Bit rate scalable

   o  Variable bit rate changing in accordance with actual speech
      content

   o  Discontinuous Transmission (DTX), silence suppression and
      comfort noise generation

   o  In-band redundancy scheme for protection against packet loss

3. Payload Format

   The main purpose of the payload design for IP-MR is to maximize the
   potential of the codec with as little overhead as possible. The
   payload format allows the parameters of the codec (such as bit rate,
   level of scalability, DTX and redundancy mode) to be changed at any
   packet boundary without re-negotiation. This makes it possible to
   adjust streaming parameters dynamically in accordance with changing
   network conditions. The payload format also supports aggregation of
   multiple consecutive frames (up to 4) in a payload, which allows the
   trade-off between delay and header overhead to be controlled.

3.1. RTP Header Usage

   The RTP timestamp corresponds to the sampling instant of the first
   sample encoded for the first frame-block in the packet. The
   timestamp clock frequency SHALL be 16 kHz. The duration of one frame
   is 20 ms, corresponding to 320 samples at 16 kHz. Thus the timestamp
   is increased by 320 for each consecutive frame.
   The timestamp is also used to recover the correct decoding order of
   the frame-blocks.

   The RTP header marker bit (M) SHALL be set to 1 whenever the first
   frame-block carried in the packet is the first frame-block in a
   talkspurt (see the definition of talkspurt in Section 4.1 of
   [RFC 3551]). For all other packets, the marker bit SHALL be set to
   zero (M=0).

   The assignment of an RTP payload type for the format defined in this
   memo is outside the scope of this document. The RTP profiles in use
   currently mandate binding the payload type dynamically for this
   payload format. This is necessary because the payload type expresses
   the configuration of the payload itself, i.e. basic or interleaved
   mode, and the number of channels carried.

   The remaining RTP header fields are used as specified in [RFC 3550].

3.2. Payload Format Structure

   The IP-MR payload format consists of a payload header with general
   information about the packet, a speech table of contents (TOC), and
   speech data. An optional redundancy section follows the speech data.
   The redundancy section consists of a redundancy header, a redundancy
   TOC and the redundancy data payload.

   The following diagram shows the standard payload format layout:

   +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +
   | payload | speech | speech | redundancy | redundancy | redundancy |
   | header  |  TOC   |  data  |   header   |    TOC     |    data    |
   +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +

3.3. Payload Header

   The payload header has the following format:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+
   |T| CR  | BR  |D|A|GR |R|
   +-+-+-+-+-+-+-+-+-+-+-+-+

   o  T (1 bit): reserved for compatibility with future extensions.
      SHOULD be set to 0.

   o  CR (3 bits): coding rate of the frame(s) in this packet, as per
      the following table:

      +-------+--------------+
      |  CR   | avg. bitrate |
      +-------+--------------+
      |   0   |   7.7 kbps   |
      |   1   |   9.8 kbps   |
      |   2   |  14.3 kbps   |
      |   3   |  20.8 kbps   |
      |   4   |  27.9 kbps   |
      |   5   |  34.2 kbps   |
      |   6   |  (reserved)  |
      |   7   |   NO_DATA    |
      +-------+--------------+

      The CR value 7 (NO_DATA) indicates that there is no speech data
      (and accordingly no speech TOC) in the payload. This MAY be used
      to transmit redundancy data only. The value 6 is reserved. If
      this value is received, the packet SHOULD be discarded.

   o  BR (3 bits): base rate for the core layer of the frame(s) in this
      packet, using the table for CR. Values in the range 0-5 indicate
      the bit rate of the core layer, as for CR. If any other value is
      received, the packet SHOULD be discarded. The base rate is the
      lowest rate for scalability, so the speech payload cannot be
      scaled down below the BR value. If a received packet has BR > CR,
      then during decoding it will be assumed that BR = CR.

   o  D (1 bit): indicates whether DTX mode is active. This parameter
      is retained for backward interoperability with previous codec
      releases and is required for payload parsing. The decoder
      implementation MUST always include DTX mode support and update
      internal states properly. The decoder cannot assume that DTX will
      remain inactive throughout a session.

   o  A (1 bit): reserved. Must always be set to 1.

   o  GR (2 bits): number of frames in the packet (grouping size). The
      actual grouping size is GR + 1, thus the maximum grouping
      supported is 4.

   o  R (1 bit): redundancy presence bit. If R=1, the packet contains
      redundancy information for recovery of lost packets. In this
      case the redundancy section is present after the speech data.

3.4. Speech Table of Contents

   The speech TOC contains one entry for each frame in the packet
   (grouping size entries in total). Each entry contains a single
   field:

    0
   +-+
   |E|
   +-+

   o  E (1 bit): frame existence indicator.
      If set to 0, this indicates that the corresponding frame is
      absent and the receiver should set the special LOST_FRAME flag
      for the decoder. This can result from the frame itself being
      lost or from empty frames generated by the encoder during silence
      intervals in DTX mode.

   Note that if the CR field of the payload header is 7 (NO_DATA), the
   speech TOC is empty.

3.5. Speech Data

   The speech data of a payload contains one or more speech frames or
   comfort noise frames, as specified in the speech TOC of the payload.

   Each speech frame represents 20 ms of speech encoded with the rate
   indicated in the CR field and the base rate indicated in the BR
   field of the payload header.

   The size of a coded speech frame is variable due to the nature of
   the codec. The encoder decides the size of each frame and returns it
   after encoding. To save bandwidth, the size is not carried
   explicitly in the payload. The frame size can be determined from the
   frame's content using a service function specified in Appendix A.
   This function provides complete information about a coded frame,
   including its size, the number of layers, the size of each layer and
   the size of the perceptual sensitivity classes.

3.6. Redundancy Header

   If a packet contains redundancy (the R field of the payload header
   is 1), the speech data is followed by the redundancy header:

    0 1 2 3 4 5
   +-+-+-+-+-+-+
   | CL1 | CL2 |
   +-+-+-+-+-+-+

   The redundancy header consists of two fields. Each field contains a
   class specifier for the amount of redundancy taken from the
   preceding packet (CL1) and the pre-preceding packet (CL2), i.e. the
   packets sent 1 and 2 packets before the current one, respectively.
   The values are listed in the table below:

   +-------+-------------------+
   |  CL   | amount redundancy |
   +-------+-------------------+
   |   0   |       NONE        |
   |   1   |      CLASS A      |
   |   2   |      CLASS B      |
   |   3   |      CLASS C      |
   |   4   |      CLASS D      |
   |   5   |      CLASS E      |
   |   6   |      CLASS F      |
   |   7   |    (reserved)     |
   +-------+-------------------+

   Each specifier takes 3 bits, thus the total redundancy header size
   is 6 bits.

   These classes indicate the subjective importance of bits from the
   core layer. Class A contains the bits most sensitive to errors; loss
   of these bits results in a corrupted speech frame, which should not
   be decoded without applying a packet loss concealment (PLC)
   procedure. Class B is less sensitive than class A, and so on to
   class F. Together, the bit classes A to F compose the core layer.

   Putting some part (classes of bits) of a previous frame into the
   current packet makes it possible to partially decode the previous
   frame if it is lost. The more information is delivered, the less the
   speech quality degrades. The CL1 and CL2 fields specify how many
   classes from previous frames the current packet contains. For
   example, CL1=3 (class C) means that the packet contains bits from
   classes A, B and C of the previous frame. If CL1=6 (class F), the
   whole core layer is included.

3.7. Redundancy Table of Contents

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Pkt1 Entries| Pkt2 Entries|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The redundancy TOC contains entries for redundancy frames from the
   preceding and pre-preceding packets. Each entry takes 1 bit, like a
   speech TOC entry (Section 3.4):

    0
   +-+
   |E|
   +-+

   o  E (1 bit): frame existence indicator. If set to 0, this indicates
      that the corresponding frame is absent.

   o  For each of the preceding and pre-preceding packets, the number
      of entries is equal to the grouping size of the current packet.
      Thus the maximum number of entries is 4*2 = 8.

   o  If the class specifier in the redundancy header is CL=0 (NONE),
      there are no entries for the corresponding packet's redundancy.

3.8. Redundancy Data

   The redundancy data of a payload contains redundancy information for
   one or more speech frames or comfort noise frames that may be lost
   during transmission, as specified in the redundancy TOC of the
   payload. The redundancy is the most important part of each preceding
   frame, representing 20 ms of speech. This data MAY be used for
   partial reconstruction of lost frames. The amount of available
   redundancy is specified by the CL field in the redundancy header
   (Section 3.6). This field SHOULD be passed to the decoder. The size
   of a redundancy frame is variable and can be obtained using the
   service function specified in Appendix A.

4. Payload Examples

   A few examples to highlight the payload format follow.

4.1. Payload Carrying a Single Frame

   The following diagram shows a standard IP-MR payload carrying a
   single speech frame without redundancy:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|CR=1 |BR=0 |0|0|0 0|0|1|sp(0)                                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      sp(193)|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   In the payload the speech frame is not damaged at the IP origin
   (E=1), the coding rate is 9.8 kbps (CR=1), the base rate is 7.7 kbps
   (BR=0), and the DTX mode is off. There is no byte alignment (A=0)
   and no redundancy (R=0).
   The encoded speech bits, sp(0) to sp(193), are placed immediately
   after the TOC. Finally, one zero bit is added at the end as padding
   to make the payload byte aligned.

4.2. Payload Carrying Multiple Frames with Redundancy

   The following diagram shows a payload that contains three frames,
   one of them with no speech data. The coding rate is 7.7 kbps (CR=0),
   the base rate is 7.7 kbps (BR=0), and the DTX mode is on. The speech
   frames are byte aligned (A=1), so one zero bit is added at the end
   of the header. Besides the speech frames, the payload contains six
   redundancy frames (three for each delayed packet).

   The first speech frame consists of bits sp1(0) to sp1(92). After
   that, 3 bits are added for byte alignment. The second frame does not
   contain any speech information, which is represented in the payload
   by its TOC entry. The third frame consists of bits sp3(0) to
   sp3(171).

   The redundancy header follows the speech data. The
   one-packet-delayed redundancy contains class A+B bits (CL1=2), and
   the two-packet-delayed redundancy contains class A bits (CL2=1). The
   one-packet-delayed redundancy contains three frames with 20, 39 and
   35 bits respectively.

   The first frame of the two-packet-delayed redundancy is absent,
   which is represented in its TOC entry; the two other frames have
   sizes of 15 and 19 bits.

   Note that all speech frames are padded with zero bits for byte
   alignment.
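
   The bit counts quoted for this example can be tallied with a short
   sketch. It is illustrative only and not part of the payload format:
   the function names are hypothetical, and the layout rule (header
   plus speech TOC padded to a byte, each present speech frame padded
   separately, the redundancy section packed contiguously and padded
   once at the end) is an assumption read off this one example with
   A=1.

```c
#include <stddef.h>

/* Round a bit count up to a whole number of bytes. */
static size_t bits_to_bytes(size_t bits)
{
    return (bits + 7) / 8;
}

/*
 * Hypothetical helper (not defined by this memo): size in bytes of a
 * payload laid out as in the example of Section 4.2 (A=1).
 *   grouping        number of speech TOC entries (GR + 1)
 *   frame_bits      sizes of the speech frames that are present (E=1)
 *   has_redundancy  value of the R bit
 *   red_toc_bits    number of redundancy TOC entries
 *   red_bits        sizes of the redundancy frames that are present
 */
size_t ipmr_aligned_payload_bytes(size_t grouping,
                                  const size_t *frame_bits, size_t n_frames,
                                  int has_redundancy, size_t red_toc_bits,
                                  const size_t *red_bits, size_t n_red)
{
    size_t i;
    /* 12-bit payload header followed by one TOC bit per frame. */
    size_t total = bits_to_bytes(12 + grouping);

    /* Each present speech frame is padded to a byte boundary. */
    for (i = 0; i < n_frames; i++)
        total += bits_to_bytes(frame_bits[i]);

    if (has_redundancy) {
        /* 6-bit redundancy header (CL1, CL2) plus redundancy TOC. */
        size_t red = 6 + red_toc_bits;
        for (i = 0; i < n_red; i++)
            red += red_bits[i];
        total += bits_to_bytes(red);
    }
    return total;
}

/* The numbers of the Section 4.2 example. */
size_t ipmr_example_4_2_bytes(void)
{
    static const size_t speech[] = { 93, 172 };        /* sp1, sp3 */
    static const size_t red[] = { 20, 39, 35, 15, 19 };
    return ipmr_aligned_payload_bytes(3, speech, 2, 1, 6, red, 5);
}
```

   Under these assumptions the example comes to 2 + 12 + 22 + 18 = 54
   bytes: 16 bits of header and TOC, 96 and 176 bits for the two padded
   speech frames, and 144 bits for the redundancy section.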

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  sp1(92)|P|P|P|sp3(0)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                               sp3(171)|P|P|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |CL1=2|CL2=1|1 1 1|0 1 1|red1_1(0)                    red1_1(19)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |red1_2(0)                                                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   red1_2(38)|red1_3(0)                                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         red1_3(34)|red2_2(0)          red2_2(14)|red2_3(0)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             red2_3(18)|P|P|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5. Media Type Registration

   This section describes the media types and names associated with
   this payload format.

5.1. Registration of media subtype audio/ip-mr_v2.5

   Type name: audio

   Subtype name: ip-mr_v2.5

   Required parameters: none

   Optional parameters:
   * ptime: Gives the length of time in milliseconds represented by
     the media in a packet. Allowed values are: 20, 40, 60 and 80.

   Encoding considerations: This media type is framed binary data (see
   RFC 4288, Section 4.8).

   Security considerations: See RFC 3550 [RFC 3550]

   Interoperability considerations: none

   Published specification: RFC XXXX

   Applications that use this media type: Real-time audio applications
   like voice over IP and teleconference, and multi-media streaming.

   Additional information: none

   Person & email address to contact for further information:
      Yury Morzeev
      morzeev@spiritdsp.com

   Intended usage: COMMON

   Restrictions on usage: This media type depends on RTP framing, and
   hence is only defined for transfer via RTP [RFC 3550].

   Authors:
      Sergey Ikonin

   Change controller: IETF Audio/Video Transport working group
   delegated from the IESG.

5.2. Mapping Media Type Parameters into SDP

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [RFC 4566], which is commonly used to describe RTP sessions. When
   SDP is used to specify sessions employing the IP-MR codec, the
   mapping is as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype (payload format name) goes in SDP "a=rtpmap"
      as the encoding name. The RTP clock rate in "a=rtpmap" MUST be
      16000.

   o  The parameter "ptime" goes in the SDP "a=ptime" attribute.

   Any remaining parameters go in the SDP "a=fmtp" attribute by copying
   them directly from the media type parameter string as a semicolon-
   separated list of parameter=value pairs.

   Note that the payload format (encoding) names are commonly shown in
   upper case. Media subtypes are commonly shown in lower case. These
   names are case-insensitive in both places.

6. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC 3550] and in any applicable RTP profile.
   The main security considerations for the RTP packet carrying the RTP
   payload format defined within this memo are confidentiality,
   integrity, and source authenticity. Confidentiality is achieved by
   encryption of the RTP payload. Integrity of the RTP packets is
   achieved through a suitable cryptographic integrity protection
   mechanism. Such a cryptographic system may also allow the
   authentication of the source of the payload.

   A suitable security mechanism for this RTP payload format should
   provide confidentiality, integrity protection, and at least source
   authentication capable of determining if an RTP packet is from a
   member of the RTP session.

   Note that the appropriate mechanism to provide security to RTP and
   payloads following this memo may vary. It is dependent on the
   application, the transport, and the signaling protocol employed.
   Therefore, a single mechanism is not sufficient, although if
   suitable, usage of the Secure Real-time Transport Protocol (SRTP)
   [RFC 3711] is recommended. Other mechanisms that may be used are
   IPsec [RFC 4301] and Transport Layer Security (TLS) [RFC 5246] (RTP
   over TCP); other alternatives may exist.

   This payload format does not exhibit any significant non-uniformity
   in the receiver side computational complexity for packet processing,
   and thus is unlikely to pose a denial-of-service threat due to the
   receipt of pathological data.

7. Congestion Control

   The general congestion control considerations for transporting RTP
   data apply; see RTP [RFC 3550] and any applicable RTP profile like
   AVP [RFC 3551]. However, the multi-rate capability of IP-MR speech
   coding provides a mechanism that may help to control congestion,
   since the bandwidth demand can be adjusted by selecting a different
   encoding mode.
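
   As a rough illustration of mode selection, the sketch below picks
   the highest coding rate whose average bit rate fits a given media
   budget. It is a minimal sketch, not a congestion control algorithm:
   the function name is hypothetical, the figures are the average rates
   from the CR table in Section 3.3, and IP/UDP/RTP header overhead,
   redundancy, and short-term variation of the variable bit rate are
   all ignored.

```c
/* Average bit rates in bit/s for CR = 0..5 (CR table, Section 3.3). */
static const unsigned int ipmr_avg_bps[6] = {
    7700, 9800, 14300, 20800, 27900, 34200
};

/*
 * Hypothetical helper (not defined by this memo): return the highest
 * CR value whose average rate does not exceed budget_bps, or -1 if
 * even the lowest mode does not fit.
 */
int ipmr_pick_cr(unsigned int budget_bps)
{
    int cr;
    int best = -1;

    for (cr = 0; cr < 6; cr++) {
        if (ipmr_avg_bps[cr] <= budget_bps)
            best = cr;
    }
    return best;
}
```

   For example, a budget of 15000 bit/s selects CR=2 (14.3 kbps), and
   a budget below 7700 bit/s yields -1, meaning that no speech mode
   fits.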

   The number of frames encapsulated in each RTP payload highly
   influences the overall bandwidth of the RTP stream due to header
   overhead constraints. Packetizing more frames in each RTP payload
   can reduce the number of packets sent and hence the overhead from
   IP/UDP/RTP headers, at the expense of increased delay.

   If the in-band redundancy scheme is used to protect against packet
   loss, the amount of introduced redundancy will need to be regulated
   so that the use of redundancy itself does not cause a congestion
   problem. In other words, a sender SHALL NOT increase the total
   bitrate when adding redundancy in response to packet loss, and needs
   instead to adjust it down in accordance with the congestion control
   algorithm being run. Thus, when adding redundancy, the media bitrate
   will need to be reduced to provide room for the redundancy.

8. IANA Considerations

   One media type has been defined and needs registration in the media
   types registry.

9. Normative References

   [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and
              V. Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio
              and Video Conferences with Minimal Control", STD 65,
              RFC 3551, July 2003.

   [RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
              K. Norrman, "The Secure Real-Time Transport Protocol
              (SRTP)", RFC 3711, March 2004.

   [RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer
              Security (TLS) Protocol Version 1.2", RFC 5246,
              August 2008.

   [RFC 4301] Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

10. Author(s) Information

   Sergey Ikonin
   email: info@spiritdsp.com

   Russia 109004
   Building 27, A. Solzhenitsyna street
   Tel: +7 495 661-2178
   Fax: +7 495 912-6786

11. Disclaimer

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

12. Legal Terms

   All IETF Documents and the information contained therein are
   provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION
   HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY,
   THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

   The IETF Trust takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might be
   claimed to pertain to the implementation or use of the technology
   described in any IETF Document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.

   Copies of Intellectual Property disclosures made to the IETF
   Secretariat and any assurances of licenses to be made available, or
   the result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by implementers or
   users of this specification can be obtained from the IETF on-line
   IPR repository at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   any standard or specification contained in an IETF Document. Please
   address the information to the IETF at ietf-ipr@ietf.org.

   The definitive version of an IETF Document is that published by, or
   under the auspices of, the IETF. Versions of IETF Documents that are
   published by third parties, including those that are translated into
   other languages, should not be considered to be definitive versions
   of IETF Documents. The definitive version of these Legal Provisions
   is that published by, or under the auspices of, the IETF. Versions
   of these Legal Provisions that are published by third parties,
   including those that are translated into other languages, should not
   be considered to be definitive versions of these Legal Provisions.

   For the avoidance of doubt, each Contributor to the IETF Standards
   Process licenses each Contribution that he or she makes as part of
   the IETF Standards Process to the IETF Trust pursuant to the
   provisions of RFC 5378. No language to the contrary, or terms,
   conditions or rights that differ from or are inconsistent with the
   rights and licenses granted under RFC 5378, shall have any effect
   and shall be null and void, whether published or posted by such
   Contributor, or included with or in such Contribution.

APPENDIX A. RETRIEVING FRAME INFORMATION

   This appendix contains C code implementing the frame parsing
   function. The function extracts information about a coded frame,
   including the frame size, the number of layers, the size of each
   layer, and the sizes of the perceptually sensitive classes.

A.1. get_frame_info.c

   /******************************************************************

     get_frame_info.c

     Retrieving frame information for IP-MR Speech Codec

   ******************************************************************/

   #define RATES_NUM      6   // number of codec rates
   #define SENSE_CLASSES  6   // number of sensitivity classes (A..F)

   // frame types
   #define FT_DTX_SPEECH  0   // active speech in DTX mode
   #define FT_DTX_SID     1   // silence insertion descriptor
   #define FT_NO_DTX      2   // no DTX frame

   // get specified bit from coded data
   // (bit 0 is the least significant bit of data[0])
   int GetBit(unsigned char *data, int curBit)
   {
     return ((data[curBit >> 3] >> (curBit % 8)) & 1);
   }

   // retrieve frame information
   int GetFrameInfo(           // o: frame size in bits
     short rate,               // i: encoding rate (0..5)
     short base_rate,          // i: base (core) layer rate;
                               //    if base_rate > rate, base_rate
                               //    is treated as equal to rate.
     short allow_DTX,          // i: flag of DTX mode
     unsigned char *pCoded,    // i: coded bit frame
     short pLayerBits          // o: number of bits in layers
       [RATES_NUM],
     short pSenseBits          // o: number of bits in sensitivity classes
       [SENSE_CLASSES],
     short *nLayers            // o: number of layers
   )
   {
     static const short Bits_1[4] = {0, 9, 9, 15};
     static const short Bits_2[16] = { 43,50,36,31,46,48,40,44,47,43,44,
                                       45,43,44,47,36};

     static const short Bits_3[2][6] = {{13, 11, 23, 33, 36, 31},
                                        {25,  0, 23, 32, 36, 31},};

     int FrType;
     int i,nBits;

     if (rate < 0 || rate > 5) {
       return 0;   // incorrect stream
     }

     for(i = 0; i < SENSE_CLASSES; i++) {
       pSenseBits[i] = 0;
     }

     nBits = 0;
     // extract frame type bit if required
     if (allow_DTX) {
       FrType = GetBit(pCoded, nBits++) ? FT_DTX_SPEECH : FT_DTX_SID;
     } else {
       FrType = FT_NO_DTX;
     }
     {
       int cw_0;
       int b[14];

       // extract meaningful bits
       for(i = 0 ; i < 14; i++) {
         b[i] = GetBit(pCoded, nBits++);
       }

       // parse
       if(FrType == FT_DTX_SID) {
         cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3);
         rate = 0;
         pSenseBits[0] = 10 + Bits_2[cw_0];
       } else {

         int idx;
         int nFlag_1, nFlag_2;
         // cw_1 and cw_2 are built by shifting bits in, so they
         // must start from zero
         int cw_1 = 0, cw_2 = 0;

         nFlag_1 = b[0] + b[2] + b[4] + b[6];
         cw_1 = (cw_1 << 1) | b[0];
         cw_1 = (cw_1 << 1) | b[2];
         cw_1 = (cw_1 << 1) | b[4];
         cw_1 = (cw_1 << 1) | b[6];

         nFlag_2 = b[1] + b[3] + b[5] + b[7];
         cw_2 = (cw_2 << 1) | b[1];
         cw_2 = (cw_2 << 1) | b[3];
         cw_2 = (cw_2 << 1) | b[5];
         cw_2 = (cw_2 << 1) | b[7];

         cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3);
         if (base_rate < 0) base_rate = 0;
         if (base_rate > rate) base_rate = rate;
         idx = base_rate == 0 ? 0 : 1;

         pSenseBits[0] = (FrType == FT_DTX_SPEECH ?
                          1 : 0) + 14 + Bits_2[cw_0];
         pSenseBits[1] = Bits_1[(cw_1 >> 0)&0x3] + Bits_1[(cw_1>>2)&0x3];
         pSenseBits[2] = nFlag_1*5;
         pSenseBits[3] = nFlag_2*30;
         pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]);

         for (i = 1; i < rate+1; i++) {
           pLayerBits[i] = 4*(Bits_3[idx][i]);
         }
       }

       pLayerBits[0] = 0;
       for (i = 0; i < SENSE_CLASSES; i++) {
         pLayerBits[0] += pSenseBits[i];
       }

       *nLayers = rate+1;
     }

     {
       // count total frame size
       int payloadBitCount = 0;
       for (i = 0; i < *nLayers; i++) {
         payloadBitCount += pLayerBits[i];
       }
       return payloadBitCount;
     }
   }

Authors' Addresses

   SPIRIT DSP
   Building 27, A. Solzhenitsyna street
   109004, Moscow, RUSSIA

   Tel: +7 495 661-2178
   Fax: +7 495 912-6786
   EMail: info@spiritdsp.com
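
   The listing in A.1 depends on a particular bit order: GetBit()
   treats bit 0 as the least significant bit of the first octet, and
   the 4-bit codeword cw_0 is assembled with the first bit read as its
   least significant bit. The following is a minimal sketch of that
   convention only, not part of the codec's reference code. The helper
   ReadBitsLsbFirst() is a hypothetical name introduced here for
   illustration, and the const qualifier and the assert() range check
   are additions that A.1 does not have.

```c
#include <assert.h>

/* Same accessor as GetBit() in A.1, with a const-qualified pointer:
   bit 0 is the least significant bit of data[0], bit 8 is the least
   significant bit of data[1], and so on. */
static int GetBit(const unsigned char *data, int curBit)
{
    return (data[curBit >> 3] >> (curBit % 8)) & 1;
}

/* Hypothetical helper (not in A.1): read `count` bits starting at bit
   position *pos and advance *pos. The first bit read becomes the
   least significant bit of the result, which is how A.1 builds cw_0
   from b[0]..b[3] or b[10]..b[13]. */
static unsigned ReadBitsLsbFirst(const unsigned char *data,
                                 int *pos, int count)
{
    unsigned value = 0;
    int i;

    assert(count >= 0 && count <= 16);
    for (i = 0; i < count; i++) {
        value |= (unsigned)GetBit(data, (*pos)++) << i;
    }
    return value;
}
```

   As an illustration with made-up octets 0xB5 0x01 (not a real IP-MR
   frame), reading 4 bits from position 0 yields 5, the next 4 bits
   yield 11, and the following single bit yields 1. When DTX is
   allowed, A.1 consumes one frame type bit first, so the codeword
   fields no longer align with nibble boundaries.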