Audio/Video Transport Working Group                          S. Ikonin
Internet Draft                                              SPIRIT DSP
Intended status: Informational                        October 05, 2009

              RTP Payload Format for IP-MR Speech Codec
                   draft-ietf-avt-rtp-ipmr-10.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.

Copyright (c) 2009 IETF Trust and the persons identified as the document
authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions
Relating to IETF Documents in effect on the date of publication of this
document (http://trustee.ietf.org/license-info). Please review these
documents carefully, as they describe your rights and restrictions with
respect to this document.

The source code included in this document is provided under the BSD
license (http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf).

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This Internet-Draft will expire on April 05, 2010.

Abstract

This document specifies the payload format for packetization of SPIRIT
IP-MR encoded speech signals into the Real-time Transport Protocol
(RTP). The payload format supports transmission of multiple frames per
payload and in-band redundancy for robustness against packet loss.

Table of Contents

   1. Introduction
   2. IP-MR Codec Description
   3. Payload Format
      3.1. RTP Header Usage
      3.2. Payload Format Structure
      3.3. Payload Header
      3.4. Speech Table of Contents
      3.5. Speech Data
      3.6. Redundancy Header
      3.7. Redundancy Table of Contents
      3.8. Redundancy Data
   4. Payload Examples
      4.1. Payload Carrying a Single Frame
      4.2. Payload Carrying Multiple Frames with Redundancy
   5. Media Type Registration
      5.1. Registration of media subtype audio/ip-mr_v2.5
      5.2. Mapping Media Type Parameters into SDP
   6. Security Considerations
   7. Congestion Control
   8. IANA Considerations
   9. Normative References
   10. Author(s) Information
   11. Disclaimer
   12. Legal Terms
   APPENDIX A. RETRIEVING FRAME INFORMATION
      A.1. get_frame_info.c
   Authors' Addresses

1. Introduction

This document specifies the payload format for packetization of SPIRIT
IP-MR encoded speech signals into the Real-time Transport Protocol
(RTP). The payload format supports transmission of multiple frames per
payload and in-band redundancy for robustness against packet loss.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC 2119].

2. IP-MR Codec Description

The IP-MR codec is a scalable adaptive multi-rate wideband speech codec
designed by SPIRIT for use in IP-based networks.
This codec is suitable for real-time communications such as telephony
and videoconferencing.

The codec operates on 20 ms frames at a 16 kHz sampling rate and has an
algorithmic delay of 25 ms.

IP-MR supports six wideband speech coding modes with bit rates ranging
from about 7.7 to about 34.2 kbps. The coding mode can be changed at any
20 ms frame boundary, making it possible to dynamically adjust the
speech encoding rate during a session to adapt to varying transmission
conditions.

The coded frame consists of multiple coding layers: a base (or core)
layer and several enhancement layers, which are coded independently.
Only the core layer is required to decode understandable speech; the
upper layers provide quality enhancement. The enhancement layers may be
omitted, and the remaining base layer can still be meaningfully decoded
without artifacts. This makes the bit stream scalable and allows the bit
rate to be reduced during transmission without re-encoding.

This memo specifies an optional form of redundancy coding within RTP
for protection against packet loss. It is based on the commonly known
scheme in which previously transmitted frames are aggregated together
with new ones. Each frame is retransmitted once in the following RTP
payload packet. In the diagram below, f(n-2)...f(n+4) denotes a sequence
of speech frames, and p(n-1)...p(n+4) is a sequence of payload packets:

   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--

     <---- p(n-1) ---->
              <----- p(n) ----->
                       <---- p(n+1) ---->
                                <---- p(n+2) ---->
                                         <---- p(n+3) ---->
                                                  <---- p(n+4) ---->

However, because of the scalable nature of the IP-MR codec there is no
need to duplicate the whole previous frame; only the core layer may be
retransmitted.
This reduces redundancy overhead while keeping efficiency. Moreover, the
speech bits encoded in the core layer are divided into six classes (A to
F) of perceptual sensitivity to errors. Using these classes as the unit
of redundancy makes it possible to adjust the trade-off between overhead
and robustness against packet loss.

The mechanism described does not require signaling at session setup.
The sender is responsible for selecting an appropriate amount of
redundancy based on feedback about the channel conditions.

The main codec characteristics can be summarized as follows:

   o  Wideband, 16 kHz, speech codec

   o  Adaptive multi-rate with six modes from about 7.7 to 34.2 kbps

   o  Bit rate scalable

   o  Variable bit rate, changing in accordance with the actual speech
      content

   o  Discontinuous Transmission (DTX), silence suppression and
      comfort noise generation

   o  In-band redundancy scheme for protection against packet loss

3. Payload Format

The main purpose of the payload design for IP-MR is to maximize the
potential of the codec with as little overhead as possible. The payload
format allows the parameters of the codec (such as bit rate, level of
scalability, DTX and redundancy mode) to be changed at any packet
boundary without re-negotiation. This makes it possible to dynamically
adjust streaming parameters in accordance with changing network
conditions. The payload format also supports aggregation of multiple
consecutive frames (up to 4) in a payload. This allows the trade-off
between delay and header overhead to be controlled.

3.1. RTP Header Usage

The RTP timestamp corresponds to the sampling instant of the first
sample encoded for the first frame-block in the packet. The timestamp
clock frequency SHALL be 16 kHz. The duration of one frame is 20 ms,
corresponding to 320 samples at 16 kHz. Thus the timestamp is increased
by 320 for each consecutive frame.
The timestamp is also used to recover the correct decoding order of the
frame-blocks.

The RTP header marker bit (M) SHALL be set to 1 whenever the first
frame-block carried in the packet is the first frame-block in a
talkspurt (see the definition of talkspurt in Section 4.1 of
[RFC 3551]). For all other packets, the marker bit SHALL be set to zero
(M=0).

The assignment of an RTP payload type for the format defined in this
memo is outside the scope of this document. The RTP profiles in use
currently mandate binding the payload type dynamically for this payload
format. This is necessary because the payload type expresses the
configuration of the payload itself, i.e. basic or interleaved mode,
and the number of channels carried.

The remaining RTP header fields are used as specified in [RFC 3550].

3.2. Payload Format Structure

The IP-MR payload format consists of a payload header with general
information about the packet, a speech table of contents (TOC), and
speech data. An optional redundancy section follows the speech data. The
redundancy section consists of a redundancy header, a redundancy TOC and
redundancy data.

The following diagram shows the standard payload format layout:

   +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +
   | payload | speech | speech | redundancy | redundancy | redundancy |
   | header  |  TOC   |  data  |   header   |    TOC     |    data    |
   +---------+--------+--------+- - - - - - +- - - - - - +- - - - - - +

3.3. Payload Header

The payload header has the following format:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+
   |T| CR  | BR  |D|A|GR |R|
   +-+-+-+-+-+-+-+-+-+-+-+-+

   o  T (1 bit): reserved for compatibility with future extensions.
      SHOULD be set to 0.

   o  CR (3 bits): coding rate of the frame(s) in this packet, as per
      the following table:

      +-------+--------------+
      |  CR   | avg. bitrate |
      +-------+--------------+
      |   0   |   7.7 kbps   |
      |   1   |   9.8 kbps   |
      |   2   |  14.3 kbps   |
      |   3   |  20.8 kbps   |
      |   4   |  27.9 kbps   |
      |   5   |  34.2 kbps   |
      |   6   |  (reserved)  |
      |   7   |   NO_DATA    |
      +-------+--------------+

      The CR value 7 (NO_DATA) indicates that there is no speech data
      (and accordingly no speech TOC) in the payload. This MAY be used
      to transmit redundancy data only. The value 6 is reserved. If this
      value is received, the packet SHOULD be discarded.

   o  BR (3 bits): base rate for the core layer of the frame(s) in this
      packet, using the table for CR. Values in the range 0-5 indicate
      the bit rate of the core layer, as for CR. If any other value is
      received, the packet SHOULD be discarded. The base rate is the
      lowest rate for scalability, so the speech payload cannot be
      scaled down below the BR value. If a received packet has BR > CR,
      then during decoding it will be assumed that BR = CR.

   o  D (1 bit): reserved. Must always be set to 1. Previously, this
      bit indicated DTX mode availability, but the payload itself
      duplicates this information.

   o  A (1 bit): reserved. Must always be set to 1. Previously, this
      bit indicated aligned mode, but the unaligned mode has never been
      used and the bit was always set to 1.

   o  GR (2 bits): number of frames in the packet (grouping size). The
      actual grouping size is GR + 1, thus the maximum grouping
      supported is 4.

   o  R (1 bit): redundancy presence bit. If R=1, the packet contains
      redundancy information for recovery of lost packets. In this case
      the redundancy section is present after the speech data.

3.4. Speech Table of Contents

The speech TOC contains one entry for each frame in the packet (grouping
size entries in total). Each entry contains a single field:

    0
   +-+
   |E|
   +-+

   o  E (1 bit): frame existence indicator. If set to 0, this indicates
      that the corresponding frame is absent and the receiver should set
      the special LOST_FRAME flag for the decoder.
      This can be caused by the loss of the frame itself or by empty
      frames generated by the encoder during silence intervals in DTX
      mode.

Note that if the CR field of the payload header is 7 (NO_DATA), the
speech TOC is empty.

3.5. Speech Data

The speech data of a payload contains one or more speech frames or
comfort noise frames, as specified in the speech TOC of the payload.

Each speech frame represents 20 ms of speech encoded with the rate
indicated in the CR field and the base rate indicated in the BR field of
the payload header.

The size of a coded speech frame is variable due to the nature of the
codec. The encoder's algorithm decides the size of each frame and
returns it after encoding. To save bandwidth, the frame size is not
carried explicitly in the payload. It can be determined from the frame's
content using a special service function specified in Appendix A. This
function provides complete information about a coded frame, including
its size, the number of layers, the size of each layer and the sizes of
the perceptual sensitivity classes.

3.6. Redundancy Header

If a packet contains redundancy (the R field of the payload header is
1), the speech data is followed by the redundancy header:

    0 1 2 3 4 5
   +-+-+-+-+-+-+
   | CL1 | CL2 |
   +-+-+-+-+-+-+

The redundancy header consists of two fields. Each field contains a
class specifier for the amount of redundancy taken from the preceding
packet (CL1) and the pre-preceding packet (CL2), i.e. the packets sent
1 and 2 packets before the current one, respectively. The values are
listed in the table below:

   +-------+----------------------+
   |  CL   | amount of redundancy |
   +-------+----------------------+
   |   0   | NONE                 |
   |   1   | CLASS A              |
   |   2   | CLASS B              |
   |   3   | CLASS C              |
   |   4   | CLASS D              |
   |   5   | CLASS E              |
   |   6   | CLASS F              |
   |   7   | (reserved)           |
   +-------+----------------------+

Each specifier takes 3 bits, thus the total redundancy header size is 6
bits.

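The field layouts of the payload header (Section 3.3) and of the
redundancy header above can be illustrated with a small bit reader. The
following C sketch is not part of the codec's reference code: the type
and function names are hypothetical, and it ASSUMES that bit 0 of the
diagrams is the most significant bit of the first octet (usual network
bit order), which this memo does not state explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded 12-bit payload header; field names follow Section 3.3. */
typedef struct {
    unsigned t, cr, br, d, a;
    unsigned frames;            /* grouping size, i.e. GR + 1 */
    unsigned r;
} ipmr_payload_hdr;

/* Decoded 6-bit redundancy header. */
typedef struct {
    unsigned cl1, cl2;
} ipmr_red_hdr;

/* Read n bits starting at bit offset *pos and advance *pos.
 * ASSUMPTION: most significant bit first within each octet. */
unsigned ipmr_read_bits(const uint8_t *buf, size_t *pos, unsigned n)
{
    unsigned v = 0;
    while (n-- > 0) {
        v = (v << 1) | ((buf[*pos >> 3] >> (7u - (*pos & 7u))) & 1u);
        (*pos)++;
    }
    return v;
}

/* Parse the payload header at bit offset *pos.
 * Returns 0 on success, -1 if the buffer is too short or the packet
 * should be discarded (reserved CR value 6). */
int ipmr_parse_payload_hdr(const uint8_t *buf, size_t len, size_t *pos,
                           ipmr_payload_hdr *h)
{
    if (*pos + 12 > len * 8)
        return -1;
    h->t      = ipmr_read_bits(buf, pos, 1);
    h->cr     = ipmr_read_bits(buf, pos, 3);
    h->br     = ipmr_read_bits(buf, pos, 3);
    h->d      = ipmr_read_bits(buf, pos, 1);
    h->a      = ipmr_read_bits(buf, pos, 1);
    h->frames = ipmr_read_bits(buf, pos, 2) + 1;
    h->r      = ipmr_read_bits(buf, pos, 1);
    if (h->cr == 6)
        return -1;              /* reserved coding rate */
    if (h->br > h->cr)
        h->br = h->cr;          /* BR > CR is treated as BR = CR */
    return 0;
}

/* Parse the redundancy header (CL1, CL2) at bit offset *pos.
 * Returns 0 on success, -1 if the buffer is too short. */
int ipmr_parse_red_hdr(const uint8_t *buf, size_t len, size_t *pos,
                       ipmr_red_hdr *r)
{
    if (*pos + 6 > len * 8)
        return -1;
    r->cl1 = ipmr_read_bits(buf, pos, 3);
    r->cl2 = ipmr_read_bits(buf, pos, 3);
    return 0;
}
```

Under the same bit-order assumption, the first two octets of the example
in Section 4.2 would be 0x01 0xDA; parsing them yields CR=0, BR=0, D=1,
A=1, a grouping size of 3 and R=1, leaving the bit offset at 12, where
the speech TOC begins.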
These classes indicate the subjective importance of bits from the core
layer. Class A contains the bits most sensitive to errors; loss of these
bits results in a corrupted speech frame, which should not be decoded
without applying a packet loss concealment (PLC) procedure. Class B is
less sensitive than class A, and so on to class F. Together, bit classes
A to F make up the core layer.

Placing part of a previous frame (some classes of bits) into the current
packet makes it possible to partially decode that frame if it is lost.
The more information is delivered, the smaller the degradation in speech
quality. The CL1 and CL2 fields specify how many classes from previous
frames the current packet contains. For example, CL1=3 (class C) means
that the packet contains the bits from classes A, B and C of the
previous frame. If CL1=6 (class F), the whole core layer is included.

3.7. Redundancy Table of Contents

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Pkt1 Entries| Pkt2 Entries|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The redundancy TOC contains entries for redundancy frames from the
preceding and pre-preceding packets. Each entry takes 1 bit, like a
speech TOC entry (Section 3.4):

    0
   +-+
   |E|
   +-+

   o  E (1 bit): frame existence indicator. If set to 0, this indicates
      that the corresponding frame is absent.

   o  For each of the preceding and pre-preceding packets, the number
      of entries is equal to the grouping size of the current packet.
      Thus the maximum number of entries is 4*2 = 8.

   o  If the class specifier in the redundancy header is CL=0 (NONE),
      there are no entries for the corresponding packet's redundancy.

3.8. Redundancy Data

The redundancy data of a payload contains redundancy information for one
or more speech frames or comfort noise frames that may be lost during
transmission, as specified in the redundancy TOC of the payload. Each
redundancy frame carries the most important part of a preceding frame
representing 20 ms of speech.
This data MAY be used for partial reconstruction of lost frames. The
amount of available redundancy is specified by the CL field in the
redundancy header (Section 3.6). This field SHOULD be passed to the
decoder. The size of a redundancy frame is variable and can be obtained
using the service function specified in Appendix A.

4. Payload Examples

A few examples to highlight the payload format follow.

4.1. Payload Carrying a Single Frame

The following diagram shows a standard IP-MR payload carrying a single
speech frame without redundancy:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|CR=1 |BR=0 |0|0|0 0|0|1|sp(0)                                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      sp(193)|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

In this payload the speech frame is not damaged at the IP origin (E=1),
the coding rate is 9.8 kbps (CR=1), the base rate is 7.7 kbps (BR=0),
and the DTX mode is off. There is no byte alignment (A=0) and no
redundancy (R=0). The encoded speech bits, sp(0) to sp(193), are placed
immediately after the TOC. Finally, one zero bit is added at the end as
padding to make the payload byte aligned.

4.2. Payload Carrying Multiple Frames with Redundancy

The following diagram shows a payload that contains three frames, one of
them with no speech data. The coding rate is 7.7 kbps (CR=0), the base
rate is 7.7 kbps (BR=0), and the DTX mode is on.
The speech frames are byte aligned (A=1), so 1 zero bit is added at the
end of the header. Besides the speech frames, the payload contains six
redundancy frames (three for each delayed packet).

The first speech frame consists of bits sp1(0) to sp1(92). After that, 3
bits are added for byte alignment. The second frame does not contain any
speech information, which is indicated in the payload by its TOC entry.
The third frame consists of bits sp3(0) to sp3(171).

The redundancy header follows the speech data. The one-packet-delayed
redundancy contains class A+B bits (CL1=2), and the two-packet-delayed
redundancy contains class A bits (CL2=1). The one-packet-delayed
redundancy contains three frames with 20, 39 and 35 bits respectively.

The first frame of the two-packet-delayed redundancy is absent, which is
indicated by its TOC entry, and the two other frames have sizes of 15
and 19 bits.

Note that all speech frames are padded with zero bits for byte
alignment.

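The bit counts quoted in this example can be checked with a short
calculation. The following C sketch is illustrative only: the helper
names are hypothetical, and it assumes the aligned layout of this
example, i.e. zero padding to the next octet boundary after the payload
header plus speech TOC, after each speech frame, and once at the end of
the redundancy section.

```c
#include <stddef.h>

/* Round a bit count up to a whole number of octets (result in bits). */
size_t ipmr_pad_to_octet(size_t bits)
{
    return (bits + 7u) / 8u * 8u;
}

/* Total payload size in bits for the aligned layout of Section 4.2.
 *   speech[i]       size in bits of speech frame i (0 if absent)
 *   n_speech        number of speech TOC entries (grouping size)
 *   has_red         non-zero if the redundancy section is present (R=1)
 *   red_toc_entries number of redundancy TOC entries
 *   red[i], n_red   sizes in bits of the redundancy frames present   */
size_t ipmr_aligned_payload_bits(const size_t *speech, size_t n_speech,
                                 int has_red, size_t red_toc_entries,
                                 const size_t *red, size_t n_red)
{
    size_t i, total, red_bits;

    /* 12-bit payload header plus one TOC bit per frame, then padding */
    total = ipmr_pad_to_octet(12 + n_speech);

    /* each speech frame is padded separately */
    for (i = 0; i < n_speech; i++)
        total += ipmr_pad_to_octet(speech[i]);

    if (!has_red)
        return total;

    /* 6-bit redundancy header, 1-bit TOC entries, unpadded frames */
    red_bits = 6 + red_toc_entries;
    for (i = 0; i < n_red; i++)
        red_bits += red[i];

    return total + ipmr_pad_to_octet(red_bits);
}
```

With speech frames of 93, 0 and 172 bits, six redundancy TOC entries and
redundancy frames of 20, 39, 35, 15 and 19 bits, the function returns
432 bits (54 octets): 16 bits of header and TOC, 96 and 176 bits for the
two padded speech frames, and 144 bits for the padded redundancy
section.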
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  sp1(92)|P|P|P|sp3(0)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                               sp3(171)|P|P|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |CL1=2|CL2=1|1 1 1|0 1 1|red1_1(0)                    red1_1(19)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |red1_2(0)                                                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   red1_2(38)|red1_3(0)                                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         red1_3(34)|red2_2(0)          red2_2(14)|red2_3(0)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             red2_3(18)|P|P|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5. Media Type Registration

This section describes the media types and names associated with this
payload format.

5.1. Registration of media subtype audio/ip-mr_v2.5

Type name: audio

Subtype name: ip-mr_v2.5

Required parameters: none

Optional parameters:
   * ptime: Gives the length of time in milliseconds represented by the
     media in a packet. Allowed values are: 20, 40, 60 and 80.

Encoding considerations: This media type is framed binary data (see RFC
4288, Section 4.8).

Security considerations: See RFC 3550 [RFC 3550]

Interoperability considerations: none

Published specification: RFC XXXX

Applications that use this media type: Real-time audio applications like
voice over IP and teleconference, and multi-media streaming.

Additional information: none

Person & email address to contact for further information:
   Yury Morzeev
   morzeev@spiritdsp.com

Intended usage: COMMON

Restrictions on usage: This media type depends on RTP framing, and hence
is only defined for transfer via RTP [RFC 3550].

Authors:
   Sergey Ikonin

Change controller: IETF Audio/Video Transport working group delegated
from the IESG.

5.2. Mapping Media Type Parameters into SDP

The information carried in the media type specification has a specific
mapping to fields in the Session Description Protocol (SDP) [RFC 4566],
which is commonly used to describe RTP sessions. When SDP is used to
specify sessions employing the IP-MR codec, the mapping is as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype (payload format name) goes in SDP "a=rtpmap"
      as the encoding name. The RTP clock rate in "a=rtpmap" MUST be
      16000.

   o  The parameter "ptime" goes in the SDP "a=ptime" attribute.

Any remaining parameters go in the SDP "a=fmtp" attribute by copying
them directly from the media type parameter string as a semicolon-
separated list of parameter=value pairs.

Note that the payload format (encoding) names are commonly shown in
upper case. Media subtypes are commonly shown in lower case. These
names are case-insensitive in both places.

6. Security Considerations

RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [RFC 3550] and in any applicable RTP profile.
The main security considerations for the RTP packet carrying the RTP
payload format defined within this memo are confidentiality, integrity,
and source authenticity. Confidentiality is achieved by encryption of
the RTP payload. Integrity of the RTP packets is achieved through a
suitable cryptographic integrity protection mechanism. Such a
cryptographic system may also allow the authentication of the source of
the payload.

A suitable security mechanism for this RTP payload format should
provide confidentiality, integrity protection, and at least source
authentication capable of determining if an RTP packet is from a
member of the RTP session.

Note that the appropriate mechanism to provide security to RTP and
payloads following this memo may vary. It is dependent on the
application, the transport, and the signaling protocol employed.
Therefore, a single mechanism is not sufficient, although if suitable,
usage of the Secure Real-time Transport Protocol (SRTP) [RFC 3711] is
recommended. Other mechanisms that may be used are IPsec [RFC 4301]
and Transport Layer Security (TLS) [RFC 5246] (RTP over TCP); other
alternatives may exist.

This payload format does not exhibit any significant non-uniformity in
the receiver side computational complexity for packet processing, and
thus is unlikely to pose a denial-of-service threat due to the receipt
of pathological data.

7. Congestion Control

The general congestion control considerations for transporting RTP data
apply; see RTP [RFC 3550] and any applicable RTP profile like AVP
[RFC 3551]. However, the multi-rate capability of IP-MR speech coding
provides a mechanism that may help to control congestion, since the
bandwidth demand can be adjusted by selecting a different encoding mode.

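How much a change of encoding mode affects the bandwidth demand can be
estimated with a rough calculation. The following C sketch is an
illustration only, with hypothetical names. It assumes 40 octets of
IPv4, UDP and RTP headers per packet (20 + 8 + 12, with no IP options
and no CSRC entries), uses the average bit rates from the CR table in
Section 3.3, and ignores the few bits of payload header, TOC and
padding.

```c
/* Average codec bit rates in bit/s for CR = 0..5 (Section 3.3). */
static const long ipmr_rate_bps[6] = {
    7700, 9800, 14300, 20800, 27900, 34200
};

/* Approximate on-the-wire bit rate in bit/s for coding rate cr with
 * frames_per_packet 20 ms frames in each RTP packet.
 * ASSUMPTION: 40 octets (320 bits) of IPv4/UDP/RTP headers per packet.
 * Returns -1 for arguments outside the ranges allowed by this memo. */
long ipmr_wire_bps(int cr, int frames_per_packet)
{
    /* one frame per packet means 50 packets/s, so the header cost is
     * 320 bits * 50 = 16000 bit/s, shared by the grouped frames */
    const long hdr_bps_single = 320L * 50L;

    if (cr < 0 || cr > 5)
        return -1;
    if (frames_per_packet < 1 || frames_per_packet > 4)
        return -1;

    return ipmr_rate_bps[cr] + hdr_bps_single / frames_per_packet;
}
```

Under these assumptions, the lowest mode with one frame per packet needs
about 23.7 kbit/s on the wire, of which 16 kbit/s is header overhead,
while the same mode with four frames per packet needs about 11.7 kbit/s.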
The number of frames encapsulated in each RTP payload highly
influences the overall bandwidth of the RTP stream due to header
overhead constraints. Packetizing more frames in each RTP payload
can reduce the number of packets sent and hence the overhead from
IP/UDP/RTP headers, at the expense of increased delay.

If the in-band redundancy scheme is used to protect against packet loss,
the amount of introduced redundancy will need to be regulated so that
the use of redundancy itself does not cause a congestion problem. In
other words, a sender SHALL NOT increase the total bitrate when adding
redundancy in response to packet loss, and needs instead to adjust it
down in accordance with the congestion control algorithm being run.
Thus, when adding redundancy, the media bitrate will need to be reduced
to provide room for the redundancy.

8. IANA Considerations

One media type has been defined and needs registration in the media
types registry.

9. Normative References

[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and
           V. Jacobson, "RTP: A Transport Protocol for Real-Time
           Applications", STD 64, RFC 3550, July 2003.

[RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio
           and Video Conferences with Minimal Control", STD 65,
           RFC 3551, July 2003.

[RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
           Description Protocol", RFC 4566, July 2006.

[RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
           K. Norrman, "The Secure Real-Time Transport Protocol
           (SRTP)", RFC 3711, March 2004.

[RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer
           Security (TLS) Protocol Version 1.2", RFC 5246,
           August 2008.

[RFC 4301] Kent, S. and K. Seo, "Security Architecture for the
           Internet Protocol", RFC 4301, December 2005.

10. Author(s) Information:

Sergey Ikonin
email: info@spiritdsp.com

Russia 109004
Building 27, A. Solzhenitsyna street
Tel: +7 495 661-2178
Fax: +7 495 912-6786

11. Disclaimer

This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November 10,
2008. The person(s) controlling the copyright in some of this material
may not have granted the IETF Trust the right to allow modifications of
such material outside the IETF Standards Process. Without obtaining an
adequate license from the person(s) controlling the copyright in such
materials, this document may not be modified outside the IETF Standards
Process, and derivative works of it may not be created outside the IETF
Standards Process, except to format it for publication as an RFC or to
translate it into languages other than English.

12. Legal Terms

All IETF Documents and the information contained therein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION THEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

The IETF Trust takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in any
IETF Document or the extent to which any license under such rights might
or might not be available; nor does it represent that it has made any
independent effort to identify any such rights.

Copies of Intellectual Property disclosures made to the IETF Secretariat
and any assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this specification
can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary rights
that may cover technology that may be required to implement any standard
or specification contained in an IETF Document. Please address the
information to the IETF at ietf-ipr@ietf.org.

The definitive version of an IETF Document is that published by, or
under the auspices of, the IETF. Versions of IETF Documents that are
published by third parties, including those that are translated into
other languages, should not be considered to be definitive versions of
IETF Documents. The definitive version of these Legal Provisions is that
published by, or under the auspices of, the IETF. Versions of these
Legal Provisions that are published by third parties, including those
that are translated into other languages, should not be considered to be
definitive versions of these Legal Provisions.

For the avoidance of doubt, each Contributor to the IETF Standards
Process licenses each Contribution that he or she makes as part of the
IETF Standards Process to the IETF Trust pursuant to the provisions of
RFC 5378. No language to the contrary, or terms, conditions or rights
that differ from or are inconsistent with the rights and licenses
granted under RFC 5378, shall have any effect and shall be null and
void, whether published or posted by such Contributor, or included with
or in such Contribution.

APPENDIX A. RETRIEVING FRAME INFORMATION

This appendix contains C code implementing the frame parsing function.
The function extracts information about a coded frame, including the
frame size, the number of layers, the size of each layer, and the size
of each perceptual sensitivity class.

A.1. get_frame_info.c

/******************************************************************

  get_frame_info.c

  Retrieving frame information for IP-MR Speech Codec

******************************************************************/

#define RATES_NUM       6   // number of codec rates
#define SENSE_CLASSES   6   // number of sensitivity classes (A..F)

// frame types
#define FT_SPEECH       0   // active speech
#define FT_DTX_SID      1   // silence insertion descriptor

// get specified bit from coded data
int GetBit(unsigned char *data, int curBit)
{
    return ((data[curBit >> 3] >> (curBit % 8)) & 1);
}

// retrieve frame information
int GetFrameInfo(          // o: frame size in bits
    short rate,            // i: encoding rate (0..5)
    short base_rate,       // i: base (core) layer rate,
                           //    if base_rate > rate, then assumed
                           //    that base_rate = rate.
    unsigned char *pCoded, // i: coded bit frame
    short pLayerBits       // o: number of bits in layers
        [RATES_NUM],
    short pSenseBits       // o: number of bits in sensitivity classes
        [SENSE_CLASSES],
    short *nLayers         // o: number of layers
)
{
    static const short Bits_1[4] = {0, 9, 9, 15};
    static const short Bits_2[16] = { 43,50,36,31,46,48,40,44,47,43,44,
                                      45,43,44,47,36};

    static const short Bits_3[2][6] = {{13, 11, 23, 33, 36, 31},
                                       {25, 0, 23, 32, 36, 31},};

    int FrType;
    int i,nBits;

    if (rate < 0 || rate > 5) {
        return 0; // incorrect stream
    }

    for(i = 0; i < SENSE_CLASSES; i++) {
        pSenseBits[i] = 0;
    }

    nBits = 0;
    // extract frame type bit if required
    FrType = GetBit(pCoded, nBits++) ? FT_SPEECH : FT_DTX_SID;

    {
        int cw_0;
        int b[14];

        // extract meaning bits
        for(i = 0 ; i < 14; i++) {
            b[i] = GetBit(pCoded, nBits++);
        }

        // parse
        if(FrType == FT_DTX_SID) {
            cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3);
            rate = 0;
            pSenseBits[0] = 10 + Bits_2[cw_0];
        } else {

            int i, idx;
            // cw_1 and cw_2 start at zero so that the shifts below
            // never read an uninitialized value
            int nFlag_1, nFlag_2, cw_1 = 0, cw_2 = 0;

            nFlag_1 = b[0] + b[2] + b[4] + b[6];
            cw_1 = (cw_1 << 1) | b[0];
            cw_1 = (cw_1 << 1) | b[2];
            cw_1 = (cw_1 << 1) | b[4];
            cw_1 = (cw_1 << 1) | b[6];

            nFlag_2 = b[1] + b[3] + b[5] + b[7];
            cw_2 = (cw_2 << 1) | b[1];
            cw_2 = (cw_2 << 1) | b[3];
            cw_2 = (cw_2 << 1) | b[5];
            cw_2 = (cw_2 << 1) | b[7];

            cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3);
            if (base_rate < 0) base_rate = 0;
            if (base_rate > rate) base_rate = rate;
            idx = base_rate == 0 ? 0 : 1;

            pSenseBits[0] = 15+Bits_2[cw_0];
            pSenseBits[1] = Bits_1[(cw_1 >> 0)&0x3]
                          + Bits_1[(cw_1>>2)&0x3];
            pSenseBits[2] = nFlag_1*5;
            pSenseBits[3] = nFlag_2*30;
            pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]);

            for (i = 1; i < rate+1; i++) {
                pLayerBits[i] = 4*(Bits_3[idx][i]);
            }
        }

        pLayerBits[0] = 0;
        for (i = 0; i < SENSE_CLASSES; i++) {
            pLayerBits[0] += pSenseBits[i];
        }

        *nLayers = rate+1;
    }

    {
        // count total frame size
        int payloadBitCount = 0;
        for (i = 0; i < *nLayers; i++) {
            payloadBitCount += pLayerBits[i];
        }
        return payloadBitCount;
    }
}

Authors' Addresses

SPIRIT DSP
Building 27, A. Solzhenitsyna street
109004, Moscow, RUSSIA

Tel: +7 495 661-2178
Fax: +7 495 912-6786
EMail: info@spiritdsp.com
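
Illustrative note on bit addressing in A.1

The bit addressing used by GetBit in A.1 can be illustrated with a
short sketch. It is not part of the codec reference code: the names
get_bit and read_bits_lsb_first are hypothetical and are introduced
here only for illustration. The sketch assumes, as the A.1 code does,
that bit 0 of a frame is the least significant bit of the first byte,
and that a multi-bit codeword such as cw_0 is assembled with the first
bit read in the least significant position.

```c
/* Same addressing as GetBit in A.1: bit 0 is the least significant
   bit of byte 0, bit 8 is the least significant bit of byte 1. */
static int get_bit(const unsigned char *data, int cur_bit)
{
    return (data[cur_bit >> 3] >> (cur_bit % 8)) & 1;
}

/* Hypothetical helper (not defined in A.1): read `count` bits starting
   at `first_bit`; the first bit read becomes the least significant bit
   of the result, mirroring how A.1 builds cw_0. */
static unsigned read_bits_lsb_first(const unsigned char *data,
                                    int first_bit, int count)
{
    unsigned value = 0;
    int k;

    for (k = 0; k < count; k++) {
        value |= (unsigned)get_bit(data, first_bit + k) << k;
    }
    return value;
}
```

For example, given the two bytes 0xA5 0x03, bits 0..3 read as the value
5 and bits 6..9, which cross the byte boundary, read as the value 14.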