Network Working Group                                        T. le Grand
Internet-Draft                                                    Google
Intended status: Standards Track                                P. Jones
Expires: April 21, 2013                                         P. Huart
                                                           Cisco Systems
                                                           T. Shabestary
                                                      H. Alvestrand, Ed.
                                                                  Google
                                                        October 18, 2012


                  RTP Payload Format for the iSAC Codec
                       draft-ietf-avt-rtp-isac-02

Abstract

   iSAC is a proprietary wideband speech and audio codec developed by
   Global IP Solutions (now part of Google), suitable for use in Voice
   over IP applications.
   This document describes the payload format for iSAC-generated bit
   streams carried within a Real-time Transport Protocol (RTP) packet.
   Also included here are the necessary details for the use of iSAC
   with the Session Description Protocol (SDP).

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 21, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  iSAC Codec Description
   3.  RTP Payload Format
     3.1.  iSAC Wideband Payload Format
     3.2.  Payload Header
     3.3.  Encoded Speech Data
     3.4.  iSAC Superwideband Payload Format
     3.5.  Encoded Upper-band Speech Data
     3.6.  Padding
     3.7.  Multiple iSAC frames in an RTP packet
   4.  IANA Considerations
   5.  Mapping to SDP Parameters
     5.1.  Example Initial Target Bit Rate
     5.2.  Example Max Bit Rate
   6.  Security Considerations
   7.  Acknowledgments
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document gives a general description of the iSAC wideband and
   super-wideband speech and audio codec and specifies the iSAC payload
   format for use in RTP packets. Also included here are the necessary
   details for the use of iSAC with the Session Description Protocol
   (SDP).

2.  iSAC Codec Description

   The iSAC codec is an adaptive wideband/super-wideband speech and
   audio codec that operates with short delay, making it suitable for
   high-quality real-time communication. It is specifically designed to
   deliver wideband speech quality in both low and medium bit rate
   applications. It also handles non-speech audio well, such as music
   and background noise [5].
   In wideband mode, the iSAC codec compresses 16-bit input speech
   sampled at 16 kHz, in frames containing 30 or 60 ms of speech. It
   also has a super-wideband mode, which uses a 32 kHz sampling rate. In
   super-wideband mode, the input signal is split into a wideband
   (0-8 kHz) signal and an upper-band (8-16 kHz) signal. Each sub-band
   is encoded independently, and the two resulting payloads are
   concatenated (cf. Figure 4) to construct the overall iSAC
   super-wideband RTP payload. Note that the same encoder/decoder is
   used for the wideband part in both wideband and super-wideband modes.

   The codec runs in one of two modes: channel-adaptive mode or
   channel-independent mode. In both modes, iSAC aims at a target bit
   rate. The target bit rate is neither the average nor the maximum bit
   rate that iSAC will reach; instead, it corresponds to the average bit
   rate during peaks in speech activity. The bit rate will sometimes
   exceed the target bit rate, but most of the time it will be below it.
   On continuous speech, the average bit rate is roughly the target bit
   rate divided by 1.2; on speech with pauses, it is lower still.

   In channel-adaptive mode, the target bit rate is adapted to match the
   bandwidth available on the channel. The available bandwidth is
   continuously estimated by the receiving iSAC and signaled in-band in
   the iSAC bit stream. Even at dial-up modem data rates (including IP,
   UDP, and RTP overhead), iSAC delivers high quality by automatically
   adjusting its transmission rate to give the best possible listening
   experience over the available bandwidth. The default initial target
   bit rate in channel-adaptive mode is 20000 bits per second.

   In channel-independent mode, a target bit rate has to be provided to
   iSAC before encoding; the target bit rate can be changed during the
   call.
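   The relationship between the target and the average bit rate can be
   made concrete with a little arithmetic. The Python sketch below is
   illustrative only: the names are hypothetical, the 1.2 ratio is the
   approximate figure given above rather than a codec constant, and IP,
   UDP, and RTP overhead is ignored. With the default initial target of
   20000 bits per second, it gives an average of roughly 16700 bits per
   second on continuous speech, and about 75 bytes per 30 ms frame during
   peaks in speech activity.

```python
# Back-of-the-envelope arithmetic for the iSAC target bit rate.
# ASSUMPTION: the 1.2 ratio is the approximate rule of thumb for
# continuous speech, not a constant of the codec. IP/UDP/RTP overhead
# is not included.

CONTINUOUS_SPEECH_RATIO = 1.2       # target / average, continuous speech
DEFAULT_INITIAL_TARGET_BPS = 20000  # channel-adaptive default target


def approx_average_bps(target_bps: float) -> float:
    """Approximate average bit rate on continuous speech for a target."""
    return target_bps / CONTINUOUS_SPEECH_RATIO


def bytes_per_frame(bit_rate_bps: float, frame_ms: int) -> float:
    """Bytes carried by one frame of frame_ms milliseconds at a bit rate."""
    return bit_rate_bps * frame_ms / 1000 / 8


if __name__ == "__main__":
    target = DEFAULT_INITIAL_TARGET_BPS
    print(f"target {target} bps: about {approx_average_bps(target):.0f} bps "
          "on continuous speech")
    # The target matches the average rate during speech peaks, so this is
    # roughly the size of a 30 ms frame during such peaks.
    print(f"30 ms frame at the target rate: {bytes_per_frame(target, 30):.0f} bytes")
```

   The same helper shows why the octet ranges in Section 3.3 scale with
   the frame length: doubling the frame from 30 to 60 ms doubles the
   bytes per payload at a given rate.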
   After encoding the speech signal, the iSAC coder applies lossless
   coding to further reduce the size of each packet, and hence the total
   bit rate used.

   The adaptation and the lossless coding described above both cause the
   packet size to vary, depending on both the nature of the speech and
   the available bandwidth. As a result, in wideband mode the iSAC codec
   operates at transmission rates from about 10 kbps to about 32 kbps. In
   super-wideband mode, the transmission rate is in the range of 10 kbps
   to 56 kbps. When operating in super-wideband mode, the iSAC codec
   automatically adjusts the effective encoded audio bandwidth, as a
   function of the bit rate, for the best experience:

      Bit rate   | Effective bandwidth | Note
      [kbps]     | [kHz]               |
      -----------+---------------------+----------------------
      10 - 32    | 0 - 8               |
      32 - 38    | 0 - 8               | operating at 32 kbps
      38 - 45    | 0 - 12              |
      45 - 50    | 0 - 12              | operating at 45 kbps
      50 - 56    | 0 - 16              |

   The main characteristics can be summarized as follows:

   o  Wideband or super-wideband (16 kHz or 32 kHz sampling,
      respectively) speech and audio codec

   o  Variable bit rate, which depends on the input signal

   o  Adaptive rate with two modes: channel-adaptive or
      channel-independent mode

   o  Bit rate from around 10 kbps to 32 kbps when operating on wideband
      input. For input audio sampled at 32 kHz, the bit rate ranges from
      10 kbps to 56 kbps.

   o  Operates on 30 or 60 ms of speech for wideband input, and only on
      30 ms for super-wideband input.

   o  In super-wideband mode, the effective bandwidth is adjusted
      according to the target bit rate for the optimal experience.

3.  RTP Payload Format

   The iSAC codec in wideband mode uses a sampling rate clock of 16 kHz,
   so the RTP timestamp MUST be in units of 1/16000 of a second.
   In super-wideband mode, the iSAC codec uses a sampling rate clock of
   32 kHz, so the RTP timestamp MUST be in units of 1/32000 of a second.

   The RTP payload for iSAC has the format shown in Figure 1. No
   additional header fields specific to this payload format are
   required. For RTP-based transport of iSAC-encoded audio, the standard
   RTP header [2] is followed by one payload data block.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   +                      iSAC Payload Block                       +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 1: RTP packet format for iSAC

3.1.  iSAC Wideband Payload Format

   The iSAC payload block consists of a payload header and one or two
   encoded 30 ms speech frames. The iSAC payload is generated as
   follows:

   o  The encoder determines parameters representing one or two 30 ms
      frames of speech data. The parameters are quantized to generate
      encoded data corresponding to the one or two speech frames. The
      length of the encoded data is variable and depends on the signal
      characteristics and the target bit rate.

   o  The payload header (described in Section 3.2) is generated and
      added before the encoded parameter data for the speech frame(s).

   o  Lossless coding is applied to the complete iSAC payload block,
      including the payload header, to generate a compressed payload.
      Its length depends on the length of the data generated to
      represent the speech and on the effectiveness of the lossless
      coding.

   Because of the lossless coding, no part of the payload header or the
   encoded speech data can be retrieved without partly or fully decoding
   the packet.

   The following figure shows an iSAC payload block containing 60 ms of
   encoded speech data.
   +--------+-----------------------+-----------------------+
   |Payload |     30 ms Encoded     |     30 ms Encoded     |
   |Header  |      Speech Data      |      Speech Data      |
   +--------+-----------------------+-----------------------+

                  Figure 2: Payload format for iSAC

3.2.  Payload Header

   The payload header holds information for the receiver about the
   available bandwidth, in the form of a Bandwidth Estimation Index
   (BEI), and the length of the speech data in the current payload
   (frame length, FL). The header has the format defined in Figure 3.
   Note that the size of the header can vary due to the lossless coding
   described in Section 2 and Section 3.1. Also note that the BEI is
   always estimated and transmitted, even when iSAC runs in
   channel-independent mode.

   +-+-+-+-+-+-+
   | BEI | FL  |
   +-+-+-+-+-+-+

   Figure 3: Payload Header

   o  BEI: Bandwidth Estimation Index. The bandwidth estimate is
      quantized to one of 24 values. Valid values are 0 to 23.

   o  FL: The length of the speech data (Frame Length) present in the
      payload, given in number of speech samples. Valid frame lengths
      are 480 (30 ms) and 960 (60 ms) samples.

3.3.  Encoded Speech Data

   The iSAC encoded speech data consists of parameters representing one
   or two frames of 30 ms speech. The length of the speech data is
   signaled in the header (in number of samples), and the length may
   change at any time during a session. In channel-adaptive mode, the
   length is changed to best utilize the available bandwidth, and extra
   padding is added to some packets as a bandwidth probe.

   The iSAC payload is padded to whole octets, and has a variable length
   depending on the input source signal, the number of 30 ms speech
   frames, and the target bit rate.

   The number of octets used to describe one frame of 30 ms speech
   typically varies from around 50 to around 120 octets.
   For 60 ms of speech (two 30 ms speech frames), the number of octets
   varies from around 100 to around 240 octets. The absolute maximum
   allowed payload length is 400 octets. The user can choose a lower
   maximum allowed payload length; the smallest value that can be chosen
   is 100 octets. Alternatively, the user can choose a maximum bit rate
   (averaged over a frame) instead of a maximum payload length. The
   maximum payload length then depends on the length of the speech data
   represented in the payload (30 or 60 ms). Possible maximum rates are
   in the range of 32000 to 53400 bits per second.

   The sensitivity to bit errors is equal for all bits in the payload.

3.4.  iSAC Superwideband Payload Format

   In super-wideband mode, the payloads associated with each sub-band
   (wideband 0-8 kHz and upper-band 8-16 kHz) are constructed
   independently and concatenated as depicted in Figure 4. Note that in
   super-wideband mode only one 30 ms frame is encoded in each payload.

   +-------+------------------------+---+------------------------+-----+
   |Payload| 30 ms encoded wideband |LEN|30 ms encoded upper-band| CRC |
   |Header |      speech data       |   |      speech data       |check|
   +-------+------------------------+---+------------------------+-----+
                                        |<-- CRC checked data -->|

                Figure 4: Super-Wideband payload format

   Because the wideband and upper-band payloads are encoded
   independently, the encoder can simply concatenate the two payloads to
   construct one iSAC super-wideband payload. The RTP payload of the
   iSAC super-wideband codec starts with the payload of the wideband
   part, which is padded to whole octets. It is followed by one byte
   (LEN in Figure 4) giving the number of bytes from that byte to the
   end of the payload: the LEN byte itself, the upper-band payload, and
   the 4-byte CRC.

   If LEN_UB denotes the length of the upper-band payload in bytes, then
   LEN = 1 + LEN_UB + 4. Because LEN is carried in a single byte, this
   value cannot exceed 255; if it would, the upper-band payload is
   omitted.
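   The layout of Figure 4 and the LEN rule can be sketched in Python as
   follows. This is not taken from the iSAC reference implementation
   [5], and the names build_swb_payload and split_swb_payload are
   hypothetical. This section does not specify the CRC algorithm or the
   byte order of the 4 CRC bytes; the sketch assumes zlib.crc32 and
   network byte order purely for illustration. The receive-side helper,
   which mirrors the checks described next, takes the number of bytes
   consumed by the wideband decoder as an input, because in a real
   receiver only the decoder can determine where the wideband part ends.

```python
import struct
import zlib

CRC_LEN = 4    # the CRC occupies the last 4 bytes of the payload
MAX_LEN = 255  # LEN is carried in a single byte


def build_swb_payload(wideband: bytes, upper_band: bytes) -> bytes:
    """Concatenate wideband payload, LEN, upper-band payload and CRC.

    LEN = 1 + LEN_UB + 4 counts the LEN byte itself, the upper-band
    bytes and the CRC. If LEN would exceed 255, the upper band is omitted.
    """
    length = 1 + len(upper_band) + CRC_LEN
    if length > MAX_LEN:
        return wideband  # upper-band payload omitted
    crc = zlib.crc32(upper_band)  # ASSUMPTION: stand-in for the iSAC CRC
    return wideband + bytes([length]) + upper_band + struct.pack("!I", crc)


def split_swb_payload(payload: bytes, wb_len: int):
    """Split a payload into (wideband, upper_band_or_None).

    wb_len is the number of bytes the wideband decoder consumed
    (hypothetical input; a real receiver learns it by decoding).
    """
    wideband, rest = payload[:wb_len], payload[wb_len:]
    if not rest:
        return wideband, None  # wideband decoder used the whole payload
    length = rest[0]
    if length != len(rest):
        raise ValueError("LEN does not match the remaining payload size")
    if length < 1 + CRC_LEN:
        return wideband, None  # too short to carry a CRC: padding only
    body, crc_bytes = rest[1:-CRC_LEN], rest[-CRC_LEN:]
    if struct.unpack("!I", crc_bytes)[0] != zlib.crc32(body):
        return wideband, None  # CRC mismatch: treated as probing padding
    return wideband, body  # CRC-checked data for the upper-band decoder
```

   For example, a 3-byte wideband part and a 10-byte upper-band part
   give LEN = 15 and an 18-byte payload, while an upper-band part longer
   than 250 bytes cannot be carried and is dropped.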
   The CRC is added so that the receiver can distinguish the upper-band
   payload from random bit-stream padding that may have been added for
   probing the available network bandwidth.

   At the receiving side, a super-wideband payload is first given to the
   wideband decoder. The wideband decoder decodes as many parameters as
   are needed to uniquely reproduce the encoded wideband audio. The next
   byte in the payload should hold the value of LEN, which must match the
   number of remaining bytes; this provides a sanity check that the
   decoding process has not failed. Thereafter, the receiver computes a
   CRC over the upper-band payload and compares the result with the last
   4 bytes of the payload.

   If the computed CRC does not match the last four bytes of the
   payload, the remaining bits are assumed to have been added for
   probing the network. In that case, the upper-band signal is replaced
   by zeros and combined with the wideband signal to generate the
   super-wideband signal.

   If the two CRCs match, the upper-band payload is given to the
   upper-band decoder, and the output of the upper-band decoder is
   combined with the decoded wideband audio to generate the
   super-wideband signal.

   For a given packet, the wideband decoder may consume the entire
   payload. This can happen when a super-wideband encoder is operating
   at low rates and has reduced the effective bandwidth to wideband. In
   this case, the decoder inserts zeros as the reconstructed upper band
   and combines both bands to reproduce the super-wideband signal.

3.5.  Encoded Upper-band Speech Data

   The iSAC encoded upper-band speech data consists of parameters
   representing one frame of 30 ms speech. Depending on the target rate,
   the upper-band encoder might choose to encode only the 8 kHz to
   12 kHz sub-band. This is signaled in-band to the receiver.

3.6.  Padding

   Padding, which consists of randomly generated bits, may be added at
   the end of the payload in both wideband and super-wideband modes. The
   sender can use it for bandwidth probing, and the receiver always
   ignores it.

   In wideband mode, padding simply follows the payload, preceded by a
   length field.

   +----------+---+--------+
   | Wideband |LEN|Padding |
   | payload  |   |        |
   +----------+---+--------+

   Figure 5: Wideband payload format with padding

   LEN is the length of the padding in bytes plus one (for the LEN byte
   itself): LEN = LEN_PAD + 1.

   In super-wideband mode, a packet with padding has the following
   format.

   +----------+---+-------------+--+--------+-----+
   | Wideband |LEN|Upper-band   |L2|Padding | CRC |
   | payload  |   |speech data  |  |        |check|
   +----------+---+-------------+--+--------+-----+
                  |<-- CRC checked data --->|

   Figure 6: Super-Wideband payload format with padding

   LEN is 1 + LEN_UB + 1 + LEN_PAD + 4, where LEN_UB is the length of
   the upper-band speech data in bytes, and LEN_PAD is the length of the
   padding in bytes.

   L2 is LEN_PAD + 1.

   The CRC check runs over the upper-band speech data, L2, and the
   padding.

3.7.  Multiple iSAC frames in an RTP packet

   A sender MUST NOT include more than one iSAC payload block in an RTP
   packet.

   Further, iSAC payload blocks MUST NOT be split between RTP packets.

4.  IANA Considerations

   This document defines the iSAC media type, and requests IANA to
   register it.

   Media type name: audio

   Media subtype: isac

   Required parameters: None

   Optional parameters:

      * ibitrate: The parameter indicates the upper bound of the
        initial target bit rate the device would like to receive. For
        channel-adaptive mode, the target bit rate may vary with time;
        for channel-independent mode, the target bit rate will remain
        at that level unless instructed otherwise.
        An acceptable value for ibitrate is in the range of 20000 to
        32000 (bits per second).

      * maxbitrate: The parameter indicates the maximum bit rate the
        endpoint expects to receive. The recipient of this parameter
        SHOULD NOT transmit at a higher bit rate.

   Encoding considerations:
      This media format is framed and binary.

   Security considerations: See Section 6

   Interoperability considerations: None

   Published specification: RFC XXXX

   Applications which use this media type:
      This media type is suitable for use in numerous applications
      needing to transport encoded voice or other audio. Some examples
      include Voice over IP, Streaming Media, Voice Messaging, and
      Conferencing.

   Additional information: None

   Intended usage: COMMON

   Other Information/General Comment:
      iSAC is a proprietary speech and audio codec owned by Google. The
      codec operates on 30 or 60 ms speech frames at a sampling rate
      clock of 16 kHz, or on 30 ms speech frames at 32 kHz.

   Person to contact for further information:
      Tina le Grand [tlegrand@google.com]

   Restrictions on usage:
      This media type depends on RTP framing, and hence is only defined
      for transfer via RTP [2]. Transport within other framing
      protocols is not defined at this time.

   Change controller:
      IETF Audio/Video Transport working group delegated from the IESG.

   Note to the RFC Editor / IANA: Please replace "RFC XXXX" above with
   the number of this RFC when published, and remove this note.

5.  Mapping to SDP Parameters

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [3], which is commonly used to describe RTP sessions. When SDP is
   used to specify sessions employing the iSAC codec, the mapping is as
   follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.
   o  The media subtype (payload format name) goes in SDP "a=rtpmap" as
      the encoding name.

   o  Any remaining parameters go in the SDP "a=fmtp" attribute by
      copying them directly from the media type string as a
      semicolon-separated list of parameter=value pairs.

   The optional parameter ibitrate MUST NOT be higher than the parameter
   maxbitrate.

   The iSAC parameters in an SDP offer are completely independent from
   those in the SDP answer. For both ibitrate and maxbitrate, it is
   legal for the answer to contain a value that is different from the
   one provided in the offer. Either parameter may be present in the
   answer even if it is absent from the offer.

   When conveying information by SDP, the encoding name SHALL be "isac"
   (the same as the media subtype).

5.1.  Example Initial Target Bit Rate

   The offer indicates that it wishes to receive a wideband bitstream
   with an initial target rate of 20000 bits per second. The remote
   party MAY change its initial target rate to the requested value.

      m=audio 10000 RTP/AVP 98
      a=rtpmap:98 isac/16000
      a=fmtp:98 ibitrate=20000

5.2.  Example Max Bit Rate

   The offer indicates that it wishes to receive a super-wideband
   bitstream with an initial target rate of 20000 bits per second, and a
   maximum bit rate of 45000 bits per second. The remote party MAY
   change its initial target rate and SHOULD NOT transmit at a higher
   rate than 45000 bits per second.

      m=audio 10000 RTP/AVP 98
      a=rtpmap:98 isac/32000
      a=fmtp:98 ibitrate=20000;maxbitrate=45000

6.  Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the general security considerations discussed in
   Section 14 of RFC 3550 [2].

   As this format transports encoded speech, the main security issues
   include confidentiality and authentication of the speech itself. The
   payload format itself does not have any built-in security mechanisms.
   External mechanisms, such as SRTP [4], MAY be used.

7.  Acknowledgments

   This document was originally prepared using 2-Word-v2.0.template.dot.

   The present version is prepared using xml2rfc and xxe-xml2rfc.

8.  References

8.1.  Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [2]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
        RFC 3550, July 2003.

   [3]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
        Description Protocol", RFC 4566, July 2006.

   [4]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
        Norrman, "The Secure Real-time Transport Protocol (SRTP)",
        RFC 3711, March 2004.

8.2.  Informative References

   [5]  GIPS / Google, "iSAC reference implementation". Available at
        http://code.google.com/p/webrtc/source - directory
        src/modules/audio_coding/codecs/isac

Authors' Addresses

   Tina le Grand
   Google
   Kungsbron 2
   Stockholm, 11122
   Sweden

   Paul E. Jones
   Cisco Systems
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   USA

   Phone: +1 919 476 2048
   Email: paulej@packetizer.com

   Pascal Huart
   Cisco Systems
   400, Avenue Roumanille, Batiment T3
   Biot - Sophia Antipolis, 06410
   France

   Phone: +33 4 9723 2643
   Email: phuart@cisco.com

   Turaj Zakizadeh Shabestary
   Google
   1950 Charleston Road
   Mountain View, CA 94043
   USA

   Email: turajs@google.com

   Harald Alvestrand (editor)
   Google
   Kungsbron 2
   Stockholm, 11122
   Sweden

   Email: hta@google.com