idnits 2.17.1 draft-ietf-avt-rtp-g719-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 15. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 1239. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1250. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1257. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1263. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == Line 664 has weird spacing: '... Packet n: 1...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (Nov 17, 2008) is 5631 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Obsolete informational reference (is this intentional?): RFC 2326 (Obsoleted by RFC 7826) -- Obsolete informational reference (is this intentional?): RFC 4288 (Obsoleted by RFC 6838) -- Possible downref: Non-RFC (?) normative reference: ref. 'ITU-T-G719' ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866) Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 10 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group M. Westerlund 3 Internet-Draft I. Johansson 4 Intended status: Standards Track Ericsson AB 5 Expires: May 21, 2009 Nov 17, 2008 7 RTP Payload format for G.719 8 draft-ietf-avt-rtp-g719-04 10 Status of this Memo 12 By submitting this Internet-Draft, each author represents that any 13 applicable patent or other IPR claims of which he or she is aware 14 have been or will be disclosed, and any of which he or she becomes 15 aware will be disclosed, in accordance with Section 6 of BCP 79. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt. 
30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 This Internet-Draft will expire on May 21, 2009. 35 Abstract 37 This document specifies the payload format for packetization of the 38 G.719 full-band codec encoded audio signals into the Real-time 39 Transport Protocol (RTP). The payload format supports transmission 40 of multiple channels, multiple frames per payload, and interleaving. 42 Table of Contents 44 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 45 2. Definitions and Conventions . . . . . . . . . . . . . . . . . 3 46 3. G.719 Description . . . . . . . . . . . . . . . . . . . . . . 3 47 4. Payload format Capabilities . . . . . . . . . . . . . . . . . 4 48 4.1. Multi-rate Encoding and Rate Adaptation . . . . . . . . . 4 49 4.2. Support for Multi-Channel Sessions . . . . . . . . . . . . 5 50 4.3. Robustness against Packet Loss . . . . . . . . . . . . . . 5 51 4.3.1. Use of Forward Error Correction (FEC) . . . . . . . . 5 52 4.3.2. Use of Frame Interleaving . . . . . . . . . . . . . . 6 53 5. Payload format . . . . . . . . . . . . . . . . . . . . . . . . 7 54 5.1. RTP Header Usage . . . . . . . . . . . . . . . . . . . . . 8 55 5.2. Payload Structure . . . . . . . . . . . . . . . . . . . . 8 56 5.2.1. Basic ToC element . . . . . . . . . . . . . . . . . . 9 57 5.3. Basic mode . . . . . . . . . . . . . . . . . . . . . . . . 10 58 5.4. Interleaved mode . . . . . . . . . . . . . . . . . . . . . 10 59 5.5. Audio Data . . . . . . . . . . . . . . . . . . . . . . . . 11 60 5.6. Implementation Considerations . . . . . . . . . . . . . . 12 61 5.6.1. Receiving Redundant Frames . . . . . . . . . . . . . . 12 62 5.6.2. Interleaving . . . . . . . . . . . . . . . . . . . . . 12 63 5.6.3. Decoding Validation . . . . . . . . . . . . . . . . . 13 64 6. Payload Examples . . . . . . . . . . . . . . . . . . . . . . . 13 65 6.1. 3 mono frames with 2 different bitrates . . . . . . . . . 14 66 6.2. 
2 stereo frame-blocks of the same bitrate . . . . . . . . 14 67 6.3. 4 mono frames interleaved . . . . . . . . . . . . . . . . 15 68 7. Payload Format Parameters . . . . . . . . . . . . . . . . . . 16 69 7.1. Media Type Definition . . . . . . . . . . . . . . . . . . 16 70 7.2. Mapping to SDP . . . . . . . . . . . . . . . . . . . . . . 19 71 7.2.1. Offer/Answer Considerations . . . . . . . . . . . . . 20 72 7.2.2. Declarative SDP Considerations . . . . . . . . . . . . 23 73 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 74 9. Congestion Control . . . . . . . . . . . . . . . . . . . . . . 23 75 10. Security Considerations . . . . . . . . . . . . . . . . . . . 24 76 10.1. Confidentiality . . . . . . . . . . . . . . . . . . . . . 24 77 10.2. Authentication and Integrity . . . . . . . . . . . . . . . 25 78 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 25 79 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 80 12.1. Informative References . . . . . . . . . . . . . . . . . . 25 81 12.2. Normative References . . . . . . . . . . . . . . . . . . . 26 82 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27 83 Intellectual Property and Copyright Statements . . . . . . . . . . 28 85 1. Introduction 87 This document specifies the payload format for packetization of the 88 G.719 full-band (FB) codec encoded audio signals into the Real-time 89 Transport Protocol (RTP) [RFC3550]. The payload format supports 90 transmission of multiple channels, multiple frames per payload, 91 packet loss robustness methods using redundancy or interleaving. 93 This document starts with conventions, a brief description of the 94 codec, and the payload formats capabilities. The payload format is 95 specified in Section 5. Examples can be found in Section 6. The 96 media type and its mappings to SDP, usage in SDP offer/answer is then 97 specified. 
The document ends with considerations around congestion 98 control and security. 100 2. Definitions and Conventions 102 The term "frame-block" is used in this document to describe the time- 103 synchronized set of audio frames in a multi-channel audio session. 104 In particular, in an N-channel session, a frame-block will contain N 105 audio frames, one from each of the channels, and all N audio frames 106 represent exactly the same time period. 108 This document contains depictions of bit fields. The most 109 significant bit is always leftmost in the figure on each row and has 110 the lowest enumeration. For fields that are depicted over multiple 111 rows, the upper row is more significant than the next. 113 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 114 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 115 document are to be interpreted as described in RFC 2119 [RFC2119]. 117 3. G.719 Description 119 The ITU-T G.719 full-band codec is a transform coder based on the 120 Modulated Lapped Transform (MLT). G.719 is a low-complexity, full 121 bandwidth codec for conversational speech and audio coding. The 122 encoder input and decoder output are sampled at 48 kHz. The codec 123 enables full bandwidth, from 20 Hz to 20 kHz, encoding of speech, 124 music and general audio content at rates from 32 kbit/s up to 128 125 kbit/s. The codec operates on 20 ms frames and has an algorithmic 126 delay of 40 ms. 128 The codec provides excellent quality for speech, music and other 129 types of audio. Some of the applications for which this coder is 130 suitable are: 132 o Real-time communications such as video conferencing and telephony 134 o Streaming audio 136 o Archival and messaging 138 The encoding and decoding algorithm can change the bit rate at any 139 20 ms frame boundary. The encoder receives the audio sampled at 140 48 kHz. 
Other sampling rates can be supported by re- 141 sampling the input signal to the codec's sampling rate, i.e., 48 kHz; 142 however, this functionality is not part of the standard. 144 The encoding is performed on equally sized frames. For each frame, 145 the encoder decides between two encoding modes, a transient mode and 146 a stationary mode. The decision is based on statistics derived from 147 the input signal. The stationary mode uses a long MLT that leads to 148 a spectrum of 960 coefficients, while the transient encoding mode uses 149 a short MLT (higher time resolution transform) that results in 4 150 spectra (4 x 240 = 960 coefficients). The encoding of the spectrum 151 is done in two steps. First, the spectral envelope is computed, 152 quantized and Huffman encoded. The envelope is computed on a non- 153 uniform frequency subdivision. From the coded spectral envelope, a 154 weighted spectral envelope is derived and used for bit-allocation. 155 This process is repeated at the decoder, so only the spectral 156 envelope is transmitted. The output of the bit-allocation is used to 157 quantize the spectra. In addition, for stationary frames 158 the encoder estimates the noise level. The decoder applies 159 the reverse operation upon reception of the bit stream. The non- 160 coded coefficients (i.e., those with no bits allocated) are replaced by entries 161 of a noise codebook that is built from the decoded coefficients. 163 4. Payload format Capabilities 165 This payload format has a number of capabilities, and this section 166 discusses them in some detail. 168 4.1. Multi-rate Encoding and Rate Adaptation 170 G.719 supports multi-rate encoding, which enables variation of the encoding rate on a 171 per-frame basis. This enables support for 172 bit-rate adaptation and congestion control. The possibility to 173 aggregate multiple audio frames into a single RTP payload is another 174 dimension of adaptation. 
The RTP and payload format overhead can 175 thus be reduced by aggregation at the cost of increased delay and 176 reduced packet-loss robustness. 178 4.2. Support for Multi-Channel Sessions 180 The RTP payload format defined in this document supports multi- 181 channel audio content (e.g. stereophonic or surround audio sessions). 182 Although the G.719 codec itself does not support encoding of multi- 183 channel audio content into a single bit stream, it can be used to 184 separately encode and decode each of the individual channels. To 185 transport (or store) the separately encoded multi-channel content, 186 the audio frames for all channels that are framed and encoded for the 187 same 20 ms period are logically collected in a "frame-block". 189 At the session setup, out-of-band signaling must be used to indicate 190 the number of channels in the payload type. The order of the audio 191 frames within the frame-block depends on the number of channels 192 and follows the definition in Section 4.1 of the RTP/AVP Profile 193 [RFC3551]. When using SDP for signaling, the number of channels is 194 specified in the rtpmap attribute. 196 4.3. Robustness against Packet Loss 198 The payload format supports several means, including forward error 199 correction (FEC) and frame interleaving, to increase robustness 200 against packet loss. 202 4.3.1. Use of Forward Error Correction (FEC) 204 Generic forward error correction within RTP is defined, for example, 205 in RFC 5109 [RFC5109]. Audio redundancy coding is defined in RFC 206 2198 [RFC2198]. Either scheme can be used to add redundant 207 information to the RTP packet stream and make it more resilient to 208 packet losses, at the expense of a higher bit rate. Please see 209 either RFC for a discussion of the implications of the higher bit 210 rate for network congestion. 
212 In addition to these media-unaware mechanisms, this memo specifies an 213 optional G.719 specific form of audio redundancy coding, which may be 214 beneficial in terms of packetization overhead. Conceptually, 215 previously transmitted transport frames are aggregated together with 216 new ones. A sliding window can be used to group the frames to be 217 sent in each payload. However, irregular or non-consecutive patterns 218 are also possible by inserting NO_DATA frames between primary and 219 redundant transmissions. Figure 1 below shows an example. 221 --+--------+--------+--------+--------+--------+--------+--------+-- 222 | f(n-2) | f(n-1) | f(n) | f(n+1) | f(n+2) | f(n+3) | f(n+4) | 223 --+--------+--------+--------+--------+--------+--------+--------+-- 225 <---- p(n-1) ----> 226 <----- p(n) -----> 227 <---- p(n+1) ----> 228 <---- p(n+2) ----> 229 <---- p(n+3) ----> 230 <---- p(n+4) ----> 232 Figure 1: An example of redundant transmission 234 Here, each frame is retransmitted once in the following RTP payload 235 packet. f(n-2)...f(n+4) denote a sequence of audio frames, and p(n- 236 1)...p(n+4) a sequence of payload packets. 238 The mechanism described does not strictly require signaling at 239 session setup. However, signaling has been defined to allow the 240 sender to voluntarily bound the buffering and delay requirements. 241 If nothing is signaled, the use of this mechanism is allowed and 242 unbounded. For a certain timestamp, the receiver may receive 243 multiple copies of a frame containing encoded audio data, even at 244 different encoding rates. The cost of this scheme is bandwidth and 245 the receiver delay necessary to allow the redundant copy to arrive. 247 This redundancy scheme provides a functionality similar to the one 248 described in RFC 2198, but it works only if both original frames and 249 redundant representations are G.719 frames. When the use of other 250 media coding schemes is desirable, one has to resort to RFC 2198. 
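The sliding-window grouping of Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_redundant_payloads, the offset argument, and the use of None to stand for a NO_DATA slot are hypothetical and not defined by this memo.

```python
from typing import List, Optional, Tuple

Frame = Optional[bytes]  # None stands for a NO_DATA slot (assumption of this sketch)


def build_redundant_payloads(frames: List[bytes],
                             offset: int = 1) -> List[Tuple[int, List[Frame]]]:
    """Group frames so that each payload repeats the frame sent `offset` packets ago.

    Returns one (index_of_first_frame, slots) pair per packet. With offset=1
    this is the pattern of Figure 1: p(n) carries f(n-1) and f(n). With a
    larger offset, NO_DATA slots fill the gap between the redundant copy and
    the primary frame, because frames in a payload are consecutive in time.
    """
    packets = []
    for n, primary in enumerate(frames):
        if n >= offset:
            slots = [frames[n - offset]] + [None] * (offset - 1) + [primary]
            first = n - offset
        else:
            # No earlier frame exists yet, so only the primary frame is sent.
            slots = [primary]
            first = n
        packets.append((first, slots))
    return packets
```

With frames f0..f3 and offset=1 the packets carry [f0], [f0, f1], [f1, f2], [f2, f3]; with offset=2 the third packet carries [f0, NO_DATA, f2]. The index of the first frame, multiplied by 960, gives the sample offset that would feed the RTP timestamp of each packet.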
252 The sender is responsible for selecting an appropriate amount of 253 redundancy based on feedback about the channel conditions, e.g., in 254 the RTP Control Protocol (RTCP) [RFC3550] receiver reports. The 255 sender is also responsible for avoiding congestion, which may be 256 exacerbated by redundancy (see Section 9 for more details). 258 4.3.2. Use of Frame Interleaving 260 To decrease protocol overhead, the payload design allows several 261 audio transport frames to be encapsulated into a single RTP packet. 262 One of the drawbacks of such an approach is that in case of packet 263 loss several consecutive frames are lost. Consecutive frame loss 264 normally renders error concealment less efficient and usually causes 265 clearly audible and annoying distortions in the reconstructed audio. 266 Interleaving of transport frames can improve the audio quality in 267 such cases by distributing the consecutive losses into a number of 268 isolated frame losses, which are easier to conceal. However, 269 interleaving and bundling several frames per payload also increases 270 end-to-end delay and sets higher buffering requirements. Therefore, 271 interleaving is not appropriate for all use cases or devices. 272 Streaming applications should most likely be able to exploit 273 interleaving to improve audio quality in lossy transmission 274 conditions. 276 Note that this payload design supports the use of frame interleaving 277 as an option. The usage of this feature needs to be negotiated in 278 the session setup. 280 The interleaving supported by this format is rather flexible. For 281 example, a continuous pattern can be defined, as depicted in 282 Figure 2. 
284 --+--------+--------+--------+--------+--------+--------+--------+-- 285 | f(n-2) | f(n-1) | f(n) | f(n+1) | f(n+2) | f(n+3) | f(n+4) | 286 --+--------+--------+--------+--------+--------+--------+--------+-- 288 [ p(n) ] 289 [ p(n+1) ] [ p(n+1) ] 290 [ p(n+2) ] [ p(n+2) ] 291 [ p(n+3) ] 292 [ p(n+4) ] 294 Figure 2: An example of an interleaving pattern that has constant delay 296 In Figure 2 the consecutive frames, denoted f(n-2) to f(n+4), are 297 aggregated into packets p(n) to p(n+4), each packet carrying two 298 frames. This approach provides an interleaving pattern that allows 299 for constant delay in both the interleaving and de-interleaving 300 processes. The de-interleaving buffer needs to have room for at 301 least three frames, including the one that is ready to be consumed. 302 The storage space for three frames is needed, for example, when f(n) 303 is the next frame to be decoded: since frame f(n) was received in 304 packet p(n+2), which also carried frame f(n+3), both these frames are 305 stored in the buffer. Furthermore, frame f(n+1) received in the 306 previous packet, p(n+1), is also in the de-interleaving buffer. Note 307 also that in this example the buffer occupancy varies: when frame 308 f(n+1) is the next one to be decoded, there are only two frames, 309 f(n+1) and f(n+3), in the buffer. 311 5. Payload format 313 The main purpose of the payload design for G.719 is to exploit the 314 potential of the codec to the fullest with as little 315 overhead as possible. Both basic and interleaved modes 316 have been included in the design, as the codec is suitable both for conversational 317 and other low-delay applications and for streaming, where more 318 delay is acceptable. 320 The main structural difference between the basic and interleaved 321 modes is the extension of the table of contents entries with frame 322 displacement fields in the interleaved mode. 
The basic mode supports 323 aggregation of multiple consecutive frames in a payload. The 324 interleaved mode supports aggregation of multiple frames that are 325 non-consecutive in time. In both modes it is possible to have frames 326 encoded with different frame types in the same payload. 328 The payload format also supports the usage of G.719 for carrying 329 multi-channel content using one discrete encoder per channel, all 330 using the same bit-rate. In this case a complete frame-block with 331 data from all channels is included in the RTP payload. The data is 332 the concatenation of all the encoded audio frames in the order 333 specified for that number of included channels. Interleaving is also 334 done on complete frame-blocks rather than individual audio frames. 336 5.1. RTP Header Usage 338 The RTP timestamp corresponds to the sampling instant of the first 339 sample encoded for the first frame-block in the packet. The 340 timestamp clock frequency SHALL be 48000 Hz. The timestamp is also 341 used to recover the correct decoding order of the frame-blocks. 343 The RTP header marker bit (M) SHALL be set to 1 whenever the first 344 frame-block carried in the packet is the first frame-block in a 345 talkspurt (see definition of the talkspurt in section 4.1 of 346 [RFC3551]). For all other packets the marker bit SHALL be set to 347 zero (M=0). 349 The assignment of an RTP payload type for the format defined in this 350 memo is outside the scope of this document. The RTP profiles in use 351 currently mandate binding the payload type dynamically for this 352 payload format. This is necessary because the payload 353 type expresses the configuration of the payload itself, i.e. basic or 354 interleaved mode and the number of channels carried. 356 The remaining RTP header fields are used as specified in RFC 3550 357 [RFC3550]. 359 5.2. 
Payload Structure 361 The payload consists of one or more table of contents (ToC) entries 362 followed by the audio data corresponding to the ToC entries. The 363 following sections describe both the basic mode and the interleaved 364 mode. Each ToC entry MUST be padded to a byte boundary to ensure 365 octet alignment. The rules regarding maximum payload size given in 366 Section 3.2 of [I-D.ietf-tsvwg-udp-guidelines] SHOULD be followed. 368 5.2.1. Basic ToC element 370 All the different formats and modes in this draft use a common basic 371 ToC which may be extended in the different options described below. 373 0 1 2 3 4 5 6 7 374 +-+-+-+-+-+-+-+-+ 375 |F| L |R|R| 376 +-+-+-+-+-+-+-+-+ 378 Figure 3: Basic TOC element 380 F (1 bit): If set to 1, indicates that this ToC entry is followed by 381 another ToC entry; if set to 0, indicates that this ToC entry is 382 the last one in the ToC. 384 L (5 bits): A field that gives the frame length of each individual 385 frame within the frame-block. 387 L length(bytes) 388 ============================ 389 0 0 NO_DATA 390 1-7 N/A (reserved) 391 8-22 80+10*(L-8) 392 23-27 240+20*(L-23) 393 28-31 N/A (reserved) 395 Figure 4: How to map L values to frame lengths 397 L=0 (NO_DATA) is used to indicate an empty frame, this is useful 398 if frames are missing e.g at re-packetization or to insert gaps 399 when sending redundant frames together with primary frames in the 400 same payload. 401 The value range [1..7] and [28..31] inclusive is reserved for 402 future use in this draft version, if these values occur in a ToC 403 the entire packet SHOULD be treated as invalid and discarded. 404 A few examples are given below where the frame size and the 405 corresponding codec bitrate is computed based on the value L. 
407 L Bytes Codec Bitrate(kbps) 408 =================================== 409 8 80 32 410 9 90 36 411 10 100 40 412 12 120 48 413 16 160 64 414 22 220 88 415 23 240 96 416 25 280 112 417 27 320 128 419 Figure 5: Examples of L values and corresponding frame lengths 421 This encoding yields a granularity of 4kbps between 32 and 88kbps 422 and a granularity of 8kbps between 88 and 128kbps with a defined 423 range of 32-128kbps for the codec data. 425 R (2bits): Reserved bits. SHALL be set to 0 on sending and SHALL be 426 ignored on reception. 428 5.3. Basic mode 430 The basic ToC element Figure 3 is followed by a one octet field for 431 the number of frame-blocks (#frames) to form the ToC entry. The 432 frame-blocks field tells how many frame-blocks of the same length the 433 ToC entry relates to. 435 0 1 2 3 4 5 6 7 436 +-+-+-+-+-+-+-+-+ 437 | #frames | 438 +-+-+-+-+-+-+-+-+ 440 Figure 6: Number of frame-blocks field 442 5.4. Interleaved mode 444 The basic ToC is followed by a one octet field for the number of 445 frame-blocks (#frames) and then the DIS fields to form a ToC entry in 446 interleaved mode. The frame-blocks field tells how many frame-blocks 447 of the same length the ToC relates to. The DIS fields, one for each 448 frame-block indicated by the #frames field, express the interleaving 449 distance between audio frames carried in the payload. If necessary 450 to achieve octet alignment, a 4-bit padding is added. 452 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 453 | #frames | DIS1 | ... | DISi | ... | DISn | Padd | 454 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 456 Figure 7: Number of frame-block + interleave fields 458 DIS1...DISn (4 bits): A list of n (n=#frames) displacement fields 459 indicating the displacement of the i:th (i=1..n) audio frame-block 460 relative to the preceding frame-block in the payload, in units of 461 20 ms long audio frame-blocks). 
The four-bit unsigned integer 462 displacement values may be between 0 and 15 indicating the number 463 of audio frame-blocks in decoding order between the (i-1):th and 464 the i:th frame in the payload. Note that for the first ToC entry 465 of the payload the value of DIS1 is meaningless. It SHALL be set 466 to zero by a sender, and SHALL be ignored by a receiver. This 467 frame-block's location in the decoding order is uniquely defined 468 by the RTP timestamp. Note that for subsequent ToC entries DIS1 469 indicates the number of frames between the last frame of the 470 previous group and the first frame of this group. 472 Padd (4 bits): To ensure octet alignment, four padding bits SHALL be 473 included at the end of the ToC entry in case there is an odd 474 number of frame-blocks in the group referenced by this ToC entry. 475 These bits SHALL be set to zero and SHALL be ignored by the 476 receiver. If a group containing an even number of frames is 477 referenced by this ToC entry, these padding bits SHALL NOT be 478 included in the payload. 480 5.5. Audio Data 482 The audio data part follows the table of contents. All the octets 483 comprising an audio frame SHALL be appended to the payload as a unit. 484 For each frame-block the audio frames are concatenated in order 485 indicated by table in Section 4.1 of [RFC3551] for the number of 486 channels configured for the payload type in use. So the first 487 channel (left most) indicated comes first followed by the next 488 channel. The audio frame-blocks are packetized in increasing 489 timestamp order within each group of frame-blocks (per ToC entry), 490 i.e. oldest frame-block first. The groups of frame-blocks are 491 packetized in the same order as their corresponding ToC entries. 493 The audio frames are specified in ITU recommendation [ITU-T-G719]. 
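The ToC layout of Sections 5.2.1 to 5.4 can be sketched as a small Python parser. This is a hedged illustration rather than a reference implementation: the names frame_length, bitrate_kbps and parse_toc are hypothetical, the payload is assumed to start at the first ToC octet, and no bounds checking is done.

```python
from typing import List, Tuple


def frame_length(l_value: int) -> int:
    """Map the 5-bit L field to a frame length in bytes (Figure 4)."""
    if l_value == 0:
        return 0  # NO_DATA
    if 8 <= l_value <= 22:
        return 80 + 10 * (l_value - 8)
    if 23 <= l_value <= 27:
        return 240 + 20 * (l_value - 23)
    raise ValueError(f"reserved L value {l_value}: packet should be discarded")


def bitrate_kbps(l_value: int) -> int:
    """Codec bitrate: bytes * 8 bits per 20 ms frame equals kbit/s."""
    return frame_length(l_value) * 8 // 20


def parse_toc(payload: bytes,
              interleaved: bool = False) -> Tuple[List[Tuple[int, List[int]]], int]:
    """Return (entries, audio_offset).

    Each entry is (frame_bytes, positions), where positions are counted in
    20 ms frame-blocks relative to the block identified by the RTP timestamp.
    """
    entries = []
    pos = 0
    position = 0
    first_block = True
    more = True
    while more:
        more = bool(payload[pos] & 0x80)        # F: another ToC entry follows
        l_value = (payload[pos] >> 2) & 0x1F    # L: 5 bits after F
        count = payload[pos + 1]                # #frames octet
        pos += 2
        positions = []
        for i in range(count):
            dis = 0                             # basic mode: consecutive blocks
            if interleaved:
                octet = payload[pos + i // 2]
                dis = (octet >> 4) if i % 2 == 0 else (octet & 0x0F)
            if first_block:
                position = 0                    # DIS1 of the first entry is ignored
                first_block = False
            else:
                position += dis + 1             # DIS blocks lie in between
            positions.append(position)
        if interleaved:
            pos += (count + 1) // 2             # DIS nibbles, padded to an octet
        entries.append((frame_length(l_value), positions))
    return entries, pos
```

For the interleaved example of Section 6.3 (octets 0x20 0x04 0x04 0x44) the sketch yields one entry of 80-byte frames at positions 0, 5, 10 and 15, matching frame-blocks 13, 18, 23 and 28. Multiplying a position by 960 gives the offset in RTP timestamp units at the 48000 Hz clock.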
495 The G.719 bit stream is split into a sequence of octets and 496 transmitted in order from the leftmost (most significant, MSB) bit to 497 the rightmost (least significant, LSB) bit. 499 5.6. Implementation Considerations 501 An application implementing this payload format MUST understand all 502 the payload parameters specified in this specification. Any mapping 503 of the parameters to a signaling protocol MUST support all 504 parameters. So an implementation of this payload format in an 505 application using SDP is required to understand all the payload 506 parameters in their SDP-mapped form. This requirement ensures that 507 an implementation can always decide whether it is capable of 508 communicating when the communicating entities support this version of 509 the specification. 511 Basic mode SHALL be implemented and the interleaved mode SHOULD be 512 implemented. The implementation burden of both is rather small, and 513 supporting both ensures interoperability. However, interleaving is 514 not mandated as it has limited applicability for conversational 515 applications that require tight delay bounds. 517 5.6.1. Receiving Redundant Frames 519 The reception of redundant audio frames, i.e. more than one audio 520 frame from the same source for the same time slot, MUST be supported 521 by the implementation. If the receiver gets multiple 522 audio frames at different bit-rates for the same time slot, it is 523 RECOMMENDED that the receiver keep the one with the highest bit- 524 rate. 526 5.6.2. Interleaving 528 The use of interleaving requires further considerations. As 529 presented in the example in Section 4.3.2, a given interleaving 530 pattern requires a certain amount of de-interleaving buffer space. 531 This buffer space, expressed in a number of transport frame slots, is 532 indicated by the "interleaving" media type parameter. 
The number of 533 frame slots needed can be converted into actual memory requirements 534 by considering the 320 bytes per frame used by the highest bit-rate 535 of G.719. 537 The information about the frame buffer size is not always sufficient 538 to determine when it is appropriate to start consuming frames from 539 the interleaving buffer. Additional information is needed when the 540 interleaving pattern changes. The "int-delay" media type parameter 541 is defined to convey this information. It allows a sender to 542 indicate the minimal media time that needs to be present in the 543 buffer before the decoder can start consuming frames from the buffer. 544 Because the sender has full control over the interleaving pattern, it 545 can calculate this value. In certain cases (for example, if joining 546 a multicast session with interleaving mid-session), a receiver may 547 initially receive only part of the packets in the interleaving 548 pattern. This initial partial reception (in frame sequence order) of 549 frames can yield too few frames for acceptable quality from the audio 550 decoding. This problem also arises when encryption is used for access 551 control and the receiver does not have the previous key. Although 552 G.719 is robust and thus tolerant to a high random frame erasure 553 rate, it would have difficulties handling consecutive frame losses at 554 startup. Thus, some special implementation considerations are 555 described. 557 In order to handle this type of startup efficiently, decoding can 558 start provided that: 560 1. There are at least two consecutive frames available. 562 2. At least half of the frames are available in the time 563 period between the point where decoding was planned to start and the most 564 forward frame received. 566 After receiving a number of packets, in the worst case as many 567 packets as the interleaving pattern covers, the previously described 568 effects disappear and normal decoding is resumed. 
Similar issues 569 arise when a receiver leaves a session or has lost access to the 570 stream. If the receiver leaves the session, this would be a minor 571 issue since playout is normally stopped. The sender can avoid this 572 type of problem in many sessions by starting and ending interleaving 573 patterns correctly when risks of losses occur. One such example is a 574 key-change done for access control to encrypted streams. If only 575 some keys are provided to clients and there is a risk of them 576 receiving content for which they do not have the key, it is 577 recommended that interleaving patterns do not overlap key changes. 579 5.6.3. Decoding Validation 581 If the receiver finds a mismatch between the size of a received 582 payload and the size indicated by the ToC of the payload, the 583 receiver SHOULD discard the packet. This is recommended because 584 decoding a frame parsed from a payload based on erroneous ToC data 585 could severely degrade the audio quality. 587 6. Payload Examples 589 The following examples illustrate the payload format. 591 6.1. 3 mono frames with 2 different bitrates 593 The first example is a payload consisting of 3 mono frames where the 594 first 2 frames correspond to a bitrate of 32 kbps (80 bytes/frame) and 595 the last to 48 kbps (120 bytes/frame). 597 The first 32 bits are ToC fields. 598 Bit 0 is '1' as another ToC field follows. 599 Bits 1..5 are 01000 = 80 bytes/frame 600 Bits 8..15 are 00000010 = 2 frame-blocks with 80 bytes/frame 601 Bit 16 is '0', no more ToC follows 602 Bits 17..21 are 01100 = 120 bytes/frame 603 Bits 24..31 are 00000001 = 1 frame-block with 120 bytes/frame 605 0 1 2 3 606 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 607 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 608 |1|0 1 0 0 0|0 0|0 0 0 0 0 0 1 0|0|0 1 1 0 0|0 0|0 0 0 0 0 0 0 1| 609 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 610 |d(0) frame 1 | 611 . . 
612    |                                                         d(639)|
613    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
614    |d(0)                       frame 2                             |
615    .                                                               .
616    |                                                         d(639)|
617    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
618    |d(0)                       frame 3                             |
619    .                                                               .
620    |                                                         d(959)|
621    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

623 6.2. 2 stereo frame-blocks of the same bitrate

625    A payload consisting of 2 stereo frames corresponding to a bitrate of
626    32 kbps (80 bytes/frame) per channel. The receiver calculates the
627    number of frames in the audio block by multiplying the value of the
628    channels parameter (2) by the #frames field value (2) to derive
629    that there are 4 audio frames in the payload.

631    The first 16 bits are the ToC field.
632    Bit 0 is '0' as no ToC field follows.
633    Bits 1..5 are 01000 = 80 bytes/frame
634    Bits 8..15 are 00000010 = 2 frame-blocks with 80 bytes/frame
635     0                   1                   2                   3
636     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
637    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
638    |0|0 1 0 0 0|0 0|0 0 0 0 0 0 1 0|  d(0)    frame 1 left ch.     |
639    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
640    .                                                               .
641    |                         d(639)|  d(0)    frame 1 right ch.    |
642    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
643    .                                                               .
644    |                         d(639)|  d(0)    frame 2 left ch.     |
645    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
646    .                                                               .
647    |                         d(639)|  d(0)    frame 2 right ch.    |
648    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
649    |                         d(639)|
650    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

652 6.3. 4 mono frames interleaved

654    A payload consisting of 4 mono frames corresponding to a bitrate of
655    32 kbps (80 bytes/frame), interleaved. A pattern of interleaving for
656    constant delay when aggregating 4 frames is used in the example
657    below. The actual packet illustrated is packet n, while the
658    previous and following packets' frame-block content is shown to
659    illustrate the pattern.
661    Packet n-3:  1,  6, 11, 16
662    Packet n-2:  5, 10, 15, 20
663    Packet n-1:  9, 14, 19, 24
664    Packet n:   13, 18, 23, 28
665    Packet n+1: 17, 22, 27, 32
666    Packet n+2: 21, 26, 31, 36

668    The first 16 bits are the ToC field.
669    Bit 0 is '0' as no ToC field follows.
670    Bits 1..5 are 01000 = 80 bytes/frame
671    Bits 8..15 are 00000100 = 4 frame-blocks with 80 bytes/frame
672    Bits 16..19 are 0000 = DIS1 (0)
673    Bits 20..23 are 0100 = DIS2 (4)
674    Bits 24..27 are 0100 = DIS3 (4)
675    Bits 28..31 are 0100 = DIS4 (4)
676     0                   1                   2                   3
677     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
678    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
679    |0|0 1 0 0 0|0 0|0 0 0 0 0 1 0 0|0 0 0 0|0 1 0 0|0 1 0 0|0 1 0 0|
680    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
681    |  d(0)                     frame 13                            |
682    .                                                               .
683    |                                                         d(639)|
684    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
685    |  d(0)                     frame 18                            |
686    .                                                               .
687    |                                                         d(639)|
688    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
689    |  d(0)                     frame 23                            |
690    .                                                               .
691    |                                                         d(639)|
692    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
693    |  d(0)                     frame 28                            |
694    .                                                               .
695    |                                                         d(639)|
696    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

698 7. Payload Format Parameters

700    This RTP payload format is identified using the media type
701    audio/g719, which is registered in accordance with [RFC4855] using
702    the template of [RFC4288].

704 7.1. Media Type Definition

706    The media type for the G.719 codec is allocated from the IETF tree
707    since G.719 has the potential to become a widely used audio
708    codec in general VoIP, teleconferencing, and streaming applications.
709    This media type registration covers real-time transfer via RTP.

711    Note, any unspecified parameter MUST be ignored by the receiver to
712    ensure that additional parameters can be added in any future revision
713    of this specification.
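
   As an illustration of the rule that unspecified parameters are
   ignored, the following is a minimal sketch in Python. It assumes the
   parameters arrive as a semicolon-separated "name=value" string (as in
   an SDP "a=fmtp" line) and that names are compared without regard to
   case. The function name parse_params and the constant KNOWN_PARAMS
   are hypothetical and not part of this specification.

```python
# Parameter names defined for audio/g719 in this section (lowercased).
KNOWN_PARAMS = {"interleaving", "int-delay", "max-red", "channels",
                "cbr", "ptime", "maxptime"}


def parse_params(param_string):
    """Split 'name=value; name=value' and keep only known parameters."""
    params = {}
    for item in param_string.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty items such as a trailing ';'
        name, _, value = item.partition("=")
        name = name.strip().lower()
        if name in KNOWN_PARAMS:
            params[name] = value.strip()
        # Unknown names are skipped silently rather than rejected.
    return params
```

   For example, parse_params("interleaving=4; foo=bar; max-red=0")
   keeps "interleaving" and "max-red" and drops the unknown "foo".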
715    Type name: audio

717    Subtype name: g719

719    Required parameters: none

721    Optional parameters:

723    interleaving: Indicates that interleaved mode SHALL be used for the
724       payload. The parameter specifies the number of frame-block slots
725       available in a de-interleaving buffer (including the frame that is
726       ready to be consumed). Its value is equal to one plus the maximum
727       number of frames that can precede any frame in transmission order
728       and follow the frame in RTP timestamp order. The value MUST be
729       greater than zero. If this parameter is not present, interleaved
730       mode SHALL NOT be used.

732    int-delay: The minimum media time delay in milliseconds that is
733       needed to avoid underrun in the de-interleaving buffer before
734       starting decoding, i.e., the difference in RTP timestamp ticks
735       between the earliest and latest audio frame present in the de-
736       interleaving buffer expressed in milliseconds. The value is a
737       stream property and provided per source. The allowed values are 0
738       to the largest value expressible by an unsigned 16-bit integer
739       (65535). Note that, in practice, the largest value that can
740       be used is equal to the declared size of the interleaving buffer
741       of the receiver. If the value for some reason is larger than the
742       receiver buffer declared by or for the receiver, this value
743       defaults to the size of the receiver buffer. For sources for
744       which this value hasn't been provided, the value defaults to the
745       size of the receiver buffer. The format is a comma-separated list
746       of SSRC ":" delay in ms pairs, which in ABNF [RFC5234] is expressed
747       as:

749       int-delay = "int-delay:" source-delay *("," source-delay)

751       source-delay = SSRC ":" delay-value

753       SSRC = 1*8HEXDIG ; The 32-bit SSRC encoded in hex format

755       delay-value = 1*5DIGIT ; The delay value in milliseconds

757       Example: int-delay=ABCD1234:1000,4321DCB:640

759       NOTE: No white space is allowed in the parameter before the end
760       of all the value pairs.

762    max-red: The maximum duration in milliseconds that elapses between
763       the primary (first) transmission of a frame and any redundant
764       transmission that the sender will use. This parameter allows a
765       receiver to have a bounded delay when redundancy is used. Allowed
766       values are between 0 (no redundancy will be used) and 65535. If
767       the parameter is omitted, no limitation on the use of redundancy
768       is present.

770    channels: The number of audio channels. The possible values (1-6)
771       and their respective channel order are specified in Section 4.1 of
772       [RFC3551]. If omitted, it has the default value of 1.

774    CBR: Constant Bit Rate (CBR), indicates the exact codec bitrate in
775       bits per second (not including the overhead from packetization,
776       RTP header, or lower layers) that the codec MUST use. CBR is to be
777       used when dynamic rate cannot be supported (for example, in a
778       gateway to H.320). CBR is mostly used for gateways to circuit-
779       switched networks. Therefore, the CBR rate is the rate not including
780       any FEC as specified in Section 4.3.1. If FEC is to be used, the
781       b= parameter MUST be used to allow the extra bit rate needed to
782       send the redundant information. It is RECOMMENDED that this
783       parameter is only used when necessary to establish a working
784       communication. The usage of this parameter has implications for
785       congestion control that need to be considered; see Section 9.

787    ptime: see [RFC4566].

789    maxptime: see [RFC4566].
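
   The "int-delay" value syntax and its defaulting rules can be
   illustrated with a short sketch in Python. It parses only the list of
   SSRC ":" delay pairs (the part after the parameter name, as in the
   example above) and is not a complete SDP parser. The function names
   parse_int_delay and effective_delay are hypothetical, and the
   receiver buffer size is assumed to be supplied by the caller already
   expressed in milliseconds.

```python
import re

# One source-delay pair: 1*8HEXDIG ":" 1*5DIGIT, with no white space.
_PAIR = re.compile(r"[0-9A-Fa-f]{1,8}:[0-9]{1,5}")


def parse_int_delay(value):
    """Parse 'SSRC:delay,SSRC:delay' into a dict {ssrc: delay_ms}."""
    delays = {}
    for pair in value.split(","):
        if not _PAIR.fullmatch(pair):
            raise ValueError("malformed source-delay pair: %r" % pair)
        ssrc_hex, delay_text = pair.split(":")
        delay_ms = int(delay_text)
        if delay_ms > 65535:  # upper bound given in the definition
            raise ValueError("delay out of range: %d" % delay_ms)
        delays[int(ssrc_hex, 16)] = delay_ms
    return delays


def effective_delay(delays, ssrc, buffer_ms):
    """Apply the defaulting rules: a missing or too-large value
    falls back to the size of the receiver buffer."""
    return min(delays.get(ssrc, buffer_ms), buffer_ms)
```

   With the example value above, parse_int_delay returns a delay of
   1000 ms for SSRC 0xABCD1234 and 640 ms for SSRC 0x4321DCB. A value
   containing white space is rejected, in line with the NOTE in the
   definition.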
791    Encoding considerations:

793       This media type is framed and binary; see Section 4.8 of
794       [RFC4288].

796    Security considerations:

798       See Section 10 of RFC XXXX.

800    Interoperability considerations:

802       The support of the interleaving mode is not mandatory and needs to
803       be negotiated. See Section 7.2 for how to do that for SDP-based
804       protocols.

806    Published specification:

808       RFC XXXX

810    Applications that use this media type:

812       Real-time audio applications like voice over IP and
813       teleconference, and multimedia streaming.

815    Additional information: none

817    Person & email address to contact for further information:

819       Payload format: Ingemar Johansson
820

822    Intended usage: COMMON

824    Restrictions on usage:

826       This media type depends on RTP framing, and hence is only defined
827       for transfer via RTP [RFC3550]. Transport within other framing
828       protocols is not defined at this time.

830    Author:

832       Ingemar Johansson

834       Magnus Westerlund

836    Change controller:

838       IETF Audio/Video Transport working group delegated from the IESG.

840    Additional Information:

842       File storage of G.719 encoded audio in ISO base media file format
843       is specified in Annex A of [ITU-T-G719]. Thus, media file formats
844       such as MP4 (audio/mp4 or video/mp4) [RFC4337] and 3GP (audio/3GPP
845       and video/3GPP) [RFC3839] can contain G.719 encoded audio.

847 7.2. Mapping to SDP

849    The information carried in the media type specification has a
850    specific mapping to fields in the Session Description Protocol (SDP)
851    [RFC4566], which is commonly used to describe RTP sessions. When SDP
852    is used to specify sessions employing the G.719 codec, the mapping is
853    as follows:

855    o  The media type ("audio") goes in SDP "m=" as the media name.

857    o  The media subtype (payload format name) goes in SDP "a=rtpmap" as
858       the encoding name. The RTP clock rate in "a=rtpmap" MUST be
859       48000, and the encoding parameter "channels" (Section 7.1) MUST
860       either be explicitly set to N or omitted, implying a default value
861       of 1. The values of N that are allowed are specified in Section
862       4.1 of [RFC3551].

864    o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
865       "a=maxptime" attributes, respectively.

867    o  Any remaining parameters go in the SDP "a=fmtp" attribute by
868       copying them directly from the media type parameter string as a
869       semicolon-separated list of parameter=value pairs.

871 7.2.1. Offer/Answer Considerations

873    The following considerations apply when using SDP Offer-Answer
874    procedures to negotiate the use of G.719 payload in RTP:

876    o  Each combination of the RTP payload transport format configuration
877       parameters (interleaving and channels) is unique in its bit-
878       pattern and not compatible with any other combination. When
879       creating an offer in an application desiring to use the more
880       advanced features (interleaving, or more than one channel), the
881       offerer is RECOMMENDED to also offer a payload type containing
882       only the configuration with a single channel. If multiple
883       configurations are of interest to the application, they may all be
884       offered; however, care should be taken not to offer too many
885       payload types. An SDP answerer MUST include, in the SDP answer
886       for a payload type, the following parameters unmodified from the
887       SDP offer (unless it removes the payload type): "interleaving"
888       and "channels". However, the value of the "interleaving" parameter
889       MAY be changed. The SDP offerer and answerer MUST generate G.719
890       packets as described by these parameters.

892    o  The values of the "interleaving" and "int-delay" parameters have a
893       specific relationship that needs to be considered. This
894       relationship also depends on the directionality of the streams and
895       their delivery method.
At a high level, the definitions imply
896       that the value of "interleaving" declares the
897       size of the receiver buffer, while "int-delay" is a stream property
898       provided by the sender to indicate how much buffer space it
899       actually uses for the stream it sends.

901       *  For media streams that are sent over multicast, the value of
902          "interleaving" SHALL NOT be changed by the answerer. It shall
903          either be accepted or the payload type deleted. The value of
904          the "int-delay" parameter is a stream property and provided by
905          the offer/answer agent that intends to send media with this
906          payload type, and for each stream coming from that agent (one
907          or more). The value MUST be between 0 and what corresponds to
908          the buffer size declared by the value of the "interleaving"
909          parameter.

911       *  For unicast streams that the offerer declares as sendonly, the
912          value of the "interleaving" parameter is the size that the
913          answerer is RECOMMENDED by the offerer to use. The answerer
914          MAY change it to any allowed value. The int-delay parameter
915          value will be the one the offerer intends to use unless the
916          answerer reduces the value of the interleaving parameter below
917          what is needed for that int-delay value. If the interleaving
918          value in the answer is smaller than the offer's int-delay, the
919          int-delay value is by default reduced to correspond to
920          the interleaving value. If the offerer is not satisfied with
921          this, it will need to perform another round of offer/answer. As
922          the answerer will not send any media, it doesn't include any
923          int-delay in the answer.

925       *  For unicast streams that the offerer declares as recvonly, the
926          value of interleaving in the offer will be the offerer's size
927          of the interleaving buffer. The answerer indicates its
928          preferred size of the interleaving buffer for any future round
929          of offer/answer. The offerer will not provide any int-delay
930          parameter as it is not sending any media. The answerer is
931          recommended to include an int-delay parameter in its answer to
932          declare what the property is for the stream it is going to
933          send. As it already knows the receiver's interleaving buffer
934          size, there should be no issue with providing a value that is
935          between 0 and corresponding to a full de-interleaving buffer.

937       *  For unicast streams that the offerer declares as sendrecv,
938          the value of the interleaving parameter in the offer
939          will be the offerer's size of the interleaving buffer. The
940          answerer will in the answer indicate the size of its actual
941          interleaving buffer. It is recommended that this value is at
942          least as big as the offer's. The offerer is recommended to
943          include an int-delay parameter that is selected on the assumption
944          that the answerer has at least as much interleaving space as the
945          offerer, unless something else is known. As the answerer's
946          interleaving buffer size is not yet known, this may fail, in
947          which case the default rule is to downgrade the value of the
948          int-delay to correspond to the full size of the answerer's
949          interleaving buffer. If the offerer isn't satisfied with this,
950          it will need to initiate another round of offer/answer. The
951          answerer is recommended to include an int-delay parameter in its
952          answer to declare what the property is for the stream(s) it
953          is going to send. As it already knows the receiver's
954          interleaving buffer size, there should be no issue with
955          providing a value that is between 0 and corresponding to a full
956          de-interleaving buffer.

958    o  In most cases, the parameters "maxptime" and "ptime" will not
959       affect interoperability; however, the setting of the parameters
960       can affect the performance of the application. The SDP offer-
961       answer handling of the "ptime" parameter is described in
962       [RFC3264]. The "maxptime" parameter MUST be handled in the same
963       way.
965    o  The parameter "max-red" is a stream property parameter. For
966       sendonly or sendrecv unicast media streams, the parameter declares
967       the limitation on redundancy that the stream sender will use. For
968       recvonly streams, it indicates the desired value for the stream
969       sent to the receiver. The answerer MAY change the value, but is
970       RECOMMENDED to use the same limitation as the offer declares. In
971       the case of multicast, the offerer MAY declare a limitation; this
972       SHALL be answered using the same value. A media sender using this
973       payload format is RECOMMENDED to always include the "max-red"
974       parameter. This information is likely to simplify the media
975       stream handling in the receiver. This is especially true if no
976       redundancy will be used, in which case "max-red" is set to 0.

978    o  Any unknown parameter in an offer SHALL be removed in the answer.

980    o  The b= SDP parameter SHOULD be used to negotiate the maximum
981       bandwidth to be used for the audio stream. The offerer may offer
982       a maximum rate and the answer may contain a lower rate. If no b=
983       parameter is present in the offer or answer, it implies a rate up
984       to 128 kbps.

986    o  The parameter "CBR" is a receiver capability, i.e., only receivers
987       that really require a constant bit rate should use it. Usage of
988       this parameter has a negative impact on the possibility to perform
989       congestion control; see Section 9. For recvonly and sendrecv
990       streams, it indicates the desired constant bit rate that the
991       receiver wants to accept. A sender MUST be able to send a constant
992       bit rate stream since it is a subset of the variable bit rate
993       capability. If the offer includes this parameter, the answerer
994       MUST send G.719 audio at the constant bit rate if it is within the
995       allowed session bit rate (b= parameter). If the answerer cannot
996       support the stated CBR, this payload type must be refused in the
997       answer. The answerer SHOULD include this parameter only if it
998       itself requires to receive at a constant bit rate; this applies
999       even if the offer did not include the CBR parameter. In this case,
1000       the offerer SHALL send at the constant bit rate but SHALL be able
1001       to accept media at variable bit rate. An answerer is RECOMMENDED
1002       to use the same CBR rate as in the offer, as symmetric usage is
1003       more likely to work. If both sides require a particular CBR rate,
1004       there is the possibility of communication failure when one or both
1005       sides can't transmit the requested rate. In this case, the agent
1006       detecting this issue will have to perform a second round of offer/
1007       answer to try to find another working configuration or end the
1008       established session. In case the offer contained a CBR parameter
1009       but the answer does not, then the offerer is free to transmit at
1010       any rate to the answerer, but the answerer is restricted to the
1011       declared rate.

1013 7.2.2. Declarative SDP Considerations

1015    In declarative usage, like SDP in RTSP [RFC2326] or SAP [RFC2974],
1016    the parameters SHALL be interpreted as follows:

1018    o  The payload format configuration parameters (interleaving and
1019       channels) are all declarative, and a participant MUST use the
1020       configuration(s) that is provided for the session. More than one
1021       configuration may be provided if necessary by declaring multiple
1022       RTP payload types; however, the number of types should be kept
1023       small.

1025    o  It might not be possible to know the SSRC values that are going to
1026       be used by the sources at the time of sending the SDP. This is
1027       not a major issue, as the size of the interleaving buffer can be
1028       tailored towards the values actually going to be used. This
1029       ensures that the default values for int-delay do not result in
1030       too much extra buffering.
1032    o  Any "maxptime" and "ptime" values should be selected with care to
1033       ensure that the session's participants can achieve reasonable
1034       performance.

1036    o  The parameter "CBR", if included, applies to all RTP streams using
1037       that payload type for which a particular CBR rate is declared.
1038       Usage of this parameter has a negative impact on the possibility to
1039       perform congestion control; see Section 9.

1041 8. IANA Considerations

1043    One media type (audio/g719) has been defined and needs registration
1044    in the media types registry; see Section 7.1.

1046 9. Congestion Control

1048    The general congestion control considerations for transporting RTP
1049    data apply; see RTP [RFC3550] and any applicable RTP profile like AVP
1050    [RFC3551]. However, the multi-rate capability of G.719 audio coding
1051    provides a mechanism that may help to control congestion, since the
1052    bandwidth demand can be adjusted (within the limits of the codec) by
1053    selecting a different encoding bit rate.

1055    The number of frames encapsulated in each RTP payload highly
1056    influences the overall bandwidth of the RTP stream due to header
1057    overhead constraints. Packetizing more frames in each RTP payload
1058    can reduce the number of packets sent and hence the header overhead,
1059    at the expense of increased delay and reduced error robustness. If
1060    forward error correction (FEC) is used, the amount of FEC-induced
1061    redundancy needs to be regulated such that the use of FEC itself does
1062    not cause a congestion problem. In other words, a sender SHALL NOT
1063    increase the total bit rate when adding redundancy in response to
1064    packet loss, and needs instead to adjust it down in accordance with
1065    the congestion control algorithm being run. Thus, when adding
1066    redundancy, the media bit rate will generally need to be reduced to
1067    free up the bit rate that is used for redundancy.
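
   The effect of packetization on the overall bandwidth can be
   illustrated with a short calculation, sketched here in Python. The
   sketch rests on assumptions that are not stated in this section: a
   20 ms frame duration (implied by 80 bytes/frame at 32 kbps), 40 bytes
   of IPv4, UDP, and RTP headers without options or extensions, and a
   single 16-bit ToC entry per payload as in the examples of Section 6.
   The function name stream_bitrate is hypothetical.

```python
FRAME_MS = 20                # assumption: implied by 80 bytes/frame at 32 kbps
HEADER_BYTES = 20 + 8 + 12   # assumption: IPv4 + UDP + RTP, no options
TOC_BYTES = 2                # one 16-bit ToC entry, as in the Section 6 examples


def stream_bitrate(codec_bps, frames_per_packet):
    """Return the total bit rate in bits per second, headers included."""
    frame_bytes = codec_bps * FRAME_MS // 8000
    packet_bytes = HEADER_BYTES + TOC_BYTES + frames_per_packet * frame_bytes
    packets_per_second = 1000 / (FRAME_MS * frames_per_packet)
    return packet_bytes * 8 * packets_per_second
```

   Under these assumptions, a 32 kbps stream occupies 48800 bits per
   second with one frame per packet and 36200 bits per second with four
   frames per packet, at the cost of 60 ms of additional packetization
   delay.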
1069    The CBR signalling parameter allows a receiver to lock down an RTP
1070    payload type to use a single encoding rate. As this prevents the
1071    codec rate from being lowered when congestion is experienced, the
1072    sender is constrained to either change the packetization or abort the
1073    transmission. Since these responses to congestion are severely
1074    limited, implementations SHOULD NOT use the CBR parameter unless they
1075    are interacting with a device that cannot support variable bit rate
1076    (e.g., a gateway to H.320 systems). When using CBR mode, a receiver
1077    MUST monitor the packet loss rate to ensure congestion is not caused,
1078    following the guidelines in Section 2 of RFC 3551.

1080 10. Security Considerations

1082    RTP packets using the payload format defined in this specification
1083    are subject to the general security considerations discussed in RTP
1084    [RFC3550] and any applicable profile such as AVP [RFC3551] or SAVP
1085    [RFC3711]. As this format transports encoded audio, the main
1086    security issues include confidentiality, integrity protection, and
1087    data origin authentication of the audio itself. The payload format
1088    itself does not have any built-in security mechanisms. Any suitable
1089    external mechanisms, such as SRTP [RFC3711], MAY be used.

1091    This payload format and the G.719 decoder do not exhibit any
1092    significant non-uniformity in the receiver-side computational
1093    complexity for packet processing, and thus are unlikely to pose a
1094    denial-of-service threat due to the receipt of pathological data.
1095    Neither the payload format nor the codec data contains any type of
1096    active content such as scripts.

1098 10.1. Confidentiality

1100    In order to ensure confidentiality of the encoded audio, all audio
1101    data bits MUST be encrypted. There is less need to encrypt the
1102    payload header or the table of contents since they only carry
1103    information about the frame type. This information could also be
1104    useful to a third party, for example, for quality monitoring.
1105    However, as no mechanism currently exists that supports
1106    differential protection, this behavior isn't expected to be supported,
1107    and the requirements of the audio data will govern the protection
1108    of the RTP payload.

1110    The use of interleaving in conjunction with encryption can have a
1111    negative impact on confidentiality for a short period of time.
1112    Consider the following packets (in brackets) containing frame numbers
1113    as indicated: {10, 14, 18}, {13, 17, 21}, {16, 20, 24} (a popular
1114    continuous diagonal interleaving pattern). The originator wishes to
1115    deny some participants the ability to hear material starting at time
1116    16. Simply changing the key on the packet with the timestamp at or
1117    after 16, and denying that new key to those participants, does not
1118    achieve this; frames 17, 18, and 21 have been supplied in prior
1119    packets under the prior key, and error concealment may make the audio
1120    intelligible at least as far as frame 18 or 19, and possibly further.

1122 10.2. Authentication and Integrity

1124    To authenticate the sender of the audio stream, an external mechanism
1125    MUST be used. It is RECOMMENDED that such a mechanism protects both
1126    the complete RTP header and the payload (audio and data bits). Data
1127    tampering by a man-in-the-middle attacker could replace audio content
1128    and also result in erroneous depacketization/decoding that could
1129    lower the audio quality.

1131 11. Acknowledgements

1133    The authors would like to thank Roni Even and Anisse Taleb for their
1134    help with this draft. We would also like to thank the people who
1135    have provided feedback: Colin Perkins, Mark Baker, and Stephen Botzko.

1137 12. References

1139 12.1. Informative References

1141    [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
1142              Handley, M., Bolot, J., Vega-Garcia, A., and S.
Fosse- 1143 Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, 1144 September 1997. 1146 [RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time 1147 Streaming Protocol (RTSP)", RFC 2326, April 1998. 1149 [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session 1150 Announcement Protocol", RFC 2974, October 2000. 1152 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 1153 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 1154 RFC 3711, March 2004. 1156 [RFC3839] Castagno, R. and D. Singer, "MIME Type Registrations for 1157 3rd Generation Partnership Project (3GPP) Multimedia 1158 files", RFC 3839, July 2004. 1160 [RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and 1161 Registration Procedures", BCP 13, RFC 4288, December 2005. 1163 [RFC4337] Y Lim and D. Singer, "MIME Type Registration for MPEG-4", 1164 RFC 4337, March 2006. 1166 [RFC4855] Casner, S., "Media Type Registration of RTP Payload 1167 Formats", RFC 4855, February 2007. 1169 [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error 1170 Correction", RFC 5109, December 2007. 1172 12.2. Normative References 1174 [I-D.ietf-tsvwg-udp-guidelines] 1175 Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines 1176 for Application Designers", 1177 draft-ietf-tsvwg-udp-guidelines-11 (work in progress), 1178 October 2008. 1180 [ITU-T-G719] 1181 ITU-T, "Specification : ITU-T G.719 extension for 20 kHz 1182 fullband audio", April 2008. 1184 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1185 Requirement Levels", BCP 14, RFC 2119, March 1997. 1187 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 1188 with Session Description Protocol (SDP)", RFC 3264, 1189 June 2002. 1191 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1192 Jacobson, "RTP: A Transport Protocol for Real-Time 1193 Applications", STD 64, RFC 3550, July 2003. 1195 [RFC3551] Schulzrinne, H. and S. 
Casner, "RTP Profile for Audio and 1196 Video Conferences with Minimal Control", STD 65, RFC 3551, 1197 July 2003. 1199 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 1200 Description Protocol", RFC 4566, July 2006. 1202 [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax 1203 Specifications: ABNF", STD 68, RFC 5234, January 2008. 1205 Authors' Addresses 1207 Magnus Westerlund 1208 Ericsson AB 1209 Torshamnsgatan 21-23 1210 SE-164 83 Stockholm 1211 SWEDEN 1213 Phone: +46 8 7190000 1214 Email: magnus.westerlund@ericsson.com 1216 Ingemar Johansson 1217 Ericsson AB 1218 Laboratoriegrand 11 1219 SE-971 28 Lulea 1220 SWEDEN 1222 Phone: +46 73 0783289 1223 Email: ingemar.s.johansson@ericsson.com 1225 Full Copyright Statement 1227 Copyright (C) The IETF Trust (2008). 1229 This document is subject to the rights, licenses and restrictions 1230 contained in BCP 78, and except as set forth therein, the authors 1231 retain all their rights. 1233 This document and the information contained herein are provided on an 1234 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1235 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 1236 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 1237 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 1238 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1239 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1241 Intellectual Property 1243 The IETF takes no position regarding the validity or scope of any 1244 Intellectual Property Rights or other rights that might be claimed to 1245 pertain to the implementation or use of the technology described in 1246 this document or the extent to which any license under such rights 1247 might or might not be available; nor does it represent that it has 1248 made any independent effort to identify any such rights. 
Information 1249 on the procedures with respect to rights in RFC documents can be 1250 found in BCP 78 and BCP 79. 1252 Copies of IPR disclosures made to the IETF Secretariat and any 1253 assurances of licenses to be made available, or the result of an 1254 attempt made to obtain a general license or permission for the use of 1255 such proprietary rights by implementers or users of this 1256 specification can be obtained from the IETF on-line IPR repository at 1257 http://www.ietf.org/ipr. 1259 The IETF invites any interested party to bring to its attention any 1260 copyrights, patents or patent applications, or other proprietary 1261 rights that may cover technology that may be required to implement 1262 this standard. Please address the information to the IETF at 1263 ietf-ipr@ietf.org.