Network Working Group                                    M. Ramalho, Ed.
Internet-Draft                                                  P. Jones
Intended status: Standards Track                           Cisco Systems
Expires: September 26, 2015                                    N. Harada
                                                                     NTT
                                                              M. Perumal
                                                                Ericsson
                                                                 L. Miao
                                                     Huawei Technologies
                                                          March 25, 2015


                     RTP Payload Format for G.711.0
                      draft-ietf-payload-g7110-05

Abstract

   This document specifies the Real-Time Transport Protocol (RTP)
   payload format for ITU-T Recommendation G.711.0. ITU-T Rec. G.711.0
   defines a lossless and stateless compression for G.711 packet
   payloads typically used in IP networks. This document also defines a
   storage mode format for G.711.0 and a media type registration for the
   G.711.0 RTP payload format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 26, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  G.711.0 Codec Background
     3.1.  General Information and Use of the ITU-T G.711.0 Codec
     3.2.  Key Properties of G.711.0 Design
     3.3.  G.711 Input Frames to G.711.0 Output Frames
       3.3.1.  Multiple G.711.0 Output Frames per RTP Payload
               Considerations
   4.  RTP Header and Payload
     4.1.  G.711.0 RTP Header
     4.2.  G.711.0 RTP Payload
       4.2.1.  Single G.711.0 Frame per RTP Payload Example
       4.2.2.  G.711.0 RTP Payload Definition
         4.2.2.1.  G.711.0 RTP Payload Encoding Process
       4.2.3.  G.711.0 RTP Payload Decoding Process
       4.2.4.  G.711.0 RTP Payload for Multiple Channels
   5.  Payload Format Parameters
     5.1.  Media Type Registration
     5.2.  Mapping to SDP Parameters
     5.3.  Offer/Answer Considerations
     5.4.  SDP Examples
       5.4.1.  SDP Example 1
       5.4.2.  SDP Example 2
   6.  G.711.0 Storage Mode Conventions and Definition
     6.1.  G.711.0 PLC Frame
     6.2.  G.711.0 Erasure Frame
     6.3.  G.711.0 Storage Mode Definition
   7.  Acknowledgements
   8.  Contributors
   9.  IANA Considerations
   10. Security Considerations
   11. Congestion Control
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   The International Telecommunication Union (ITU-T) Recommendation
   G.711.0 [G.711.0] specifies a stateless and lossless compression for
   G.711 packet payloads typically used in Voice over IP (VoIP)
   networks. This document specifies the Real-Time Transport Protocol
   (RTP) RFC 3550 [RFC3550] payload format and storage modes for this
   compression.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  G.711.0 Codec Background

   ITU-T Recommendation G.711.0 [G.711.0] is a lossless and stateless
   compression mechanism for ITU-T Recommendation G.711 [G.711] and thus
   is not a "codec" in the sense of "lossy" codecs typically carried by
   RTP. When negotiated end-to-end, ITU-T Rec. G.711.0 is negotiated as
   if it were a codec, with the understanding that ITU-T Rec. G.711.0
   losslessly encodes the underlying (lossy) G.711 pulse code modulation
   (PCM) sample representation of an audio signal. For this reason,
   ITU-T Rec. G.711.0 will be interchangeably referred to in this
   document as a "lossless data compression algorithm" or a "codec",
   depending on context. Within this document, individual G.711 PCM
   samples will be referred to as "G.711 symbols" or just "symbols" for
   brevity.

   This section describes the ITU-T Recommendation G.711 [G.711] codec,
   its properties, typical use cases and its key design properties.

3.1.  General Information and Use of the ITU-T G.711.0 Codec

   ITU-T Recommendation G.711 is the benchmark standard for narrowband
   telephony. It has been successful for many decades because of its
   proven voice quality, ubiquity and utility. A new ITU-T
   recommendation, G.711.0, has been established for defining a
   stateless and lossless compression for G.711 packet payloads
   typically used in VoIP networks. ITU-T Rec. G.711.0 is also known as
   ITU-T Rec. G.711 Annex A [G.711-A1], as ITU-T Rec. G.711 Annex A is
   effectively a pointer to ITU-T Rec. G.711.0. Henceforth in this
   document, ITU-T Rec. G.711.0 will simply be referred to as "G.711.0"
   and ITU-T Rec. G.711 simply as "G.711".

   G.711.0 may be employed end-to-end, in which case the RTP payload
   format specification and use is nearly identical to the G.711 RTP
   specification found in RFC 3551 [RFC3551]. The only significant
   differences for G.711.0 are the required use of a dynamic payload
   type (the static PT of 0 or 8 is presently almost always used with
   G.711 even though dynamic assignment of other payload types is
   allowed) and the recommendation not to use Voice Activity Detection
   (see Section 4.1).

   G.711.0, being both lossless and stateless, may also be employed as a
   lossless compression mechanism for G.711 payloads anywhere between
   end systems which have negotiated use of G.711. Because the only
   significant difference between the G.711 RTP payload format header
   and the G.711.0 payload format header defined in this document is the
   payload type, a G.711 RTP packet can be losslessly converted to a
   G.711.0 RTP packet simply by compressing the G.711 payload (thus
   creating a G.711.0 payload), changing the payload type to the dynamic
   value desired and copying all the remaining G.711 RTP header fields
   into the corresponding G.711.0 RTP header.
   In a similar manner, the corresponding decompression of the G.711.0
   RTP packet thus created back to the original source G.711 RTP packet
   can be accomplished by losslessly decompressing the G.711.0 payload
   back to the original source G.711 payload, changing the payload type
   back to the payload type of the original G.711 RTP packet and copying
   all the remaining G.711.0 RTP header fields into the corresponding
   G.711 RTP header. Negotiation specifics for this lossless G.711
   payload compression for RTP use case are not in scope for this
   document.

   It is worth noting that G.711.0, being both lossless and stateless,
   can be employed multiple times on a given flow (e.g., on multiple
   individual hops or series of hops) with no degradation of quality
   relative to end-to-end G.711. Stated another way, multiple "lossless
   transcodes" from/to G.711.0/G.711 do not affect voice quality as
   typically occurs with lossy transcodes to/from dissimilar codecs.

   Lastly, it is expected that G.711.0 will be used as an archival
   format for recorded G.711 streams. Therefore, a G.711.0 Storage Mode
   Format is also included in this document.

3.2.  Key Properties of G.711.0 Design

   The fundamental design of G.711.0 resulted from the desire to
   losslessly encode and compress frames of G.711 symbols independent of
   what types of signals those G.711 frames contained. The primary
   G.711.0 use case is for G.711 encoded, zero-mean, acoustic signals
   (such as speech and music).

   The G.711.0 attributes are listed below:

   A1  Compression for zero-mean acoustic signals: G.711.0 was designed
       primarily for the compression of G.711 payloads that contain
       "speech" or other zero-mean acoustic signals. G.711.0 obtains
       greater than 50% average compression in service provider
       environments [ICASSP].

   A2  Lossless for any G.711 payload: G.711.0 was designed to be
       lossless for any valid G.711 payload - even if the payload
       consisted of apparently random G.711 symbols (e.g., a modem or
       FAX payload). G.711.0 could be used for "aggregate 64 kbps
       G.711 channels" carried over IP without explicit concern if a
       subset of these channels happened to be carrying something
       other than voice or general audio. To the extent that a
       particular channel carried something other than voice or
       general audio, G.711.0 ensured that it was carried losslessly,
       if not significantly compressed.

   A3  Stateless: Compression of a frame of G.711 symbols was only to be
       dependent on that frame and not on any prior frame. Although
       greater compression is usually available by observing a longer
       history of past G.711 symbols, it was decided that the
       compression design would be stateless to completely eliminate
       error propagation common in many lossy codec designs (e.g.,
       ITU-T Rec. G.729 [G.729], ITU-T Rec. G.722 [G.722]). That is,
       the decoding process need not be concerned about lost prior
       packets because the decompression of a given G.711.0 frame is
       not dependent on potentially lost prior G.711.0 frames. Owing
       to this stateless property, the frames input to the G.711.0
       encoder may be changed "on-the-fly" (a 5 ms encoding could be
       followed by a 20 ms encoding).

   A4  Self-describing: This property is defined as the ability to
       determine how many source G.711 samples are contained within
       the G.711.0 frame solely by information contained within the
       G.711.0 frame. Generally, the number of source G.711 symbols
       can be determined by decoding the initial octets of the
       compressed G.711.0 frame (these octets are called "prefix
       codes" in the standard). A G.711.0 decoder need not know how
       many symbols are contained in the original G.711 frame (e.g.,
       parameter ptime in Session Description Protocol, SDP,
       [RFC4566]), as it is able to decompress the G.711.0 frame
       presented to it without signaling knowledge.

   A5  Accommodate G.711 payload sizes typically used in IP: G.711 input
       frames of length typically found in VoIP applications represent
       SDP ptime values of 5 ms, 10 ms, 20 ms, 30 ms or 40 ms. Since
       the dominant sampling frequency for G.711 is 8000 samples per
       second, G.711.0 was designed to compress G.711 input frames of
       40, 80, 160, 240 or 320 samples.

   A6  Bounded expansion: Since attribute A2 above requires G.711.0 to
       be lossless for any payload (which could consist of any
       combination of octets with each octet spanning the entire space
       of 2^8 values), by definition there exists at least one
       potential G.711 payload which must be "uncompressible". Since
       the quantum of compression is an octet, the minimum expansion
       of such an uncompressible payload was designed to be the
       minimum possible of one octet. Thus G.711.0 "compressed"
       frames can be of length one octet to X+1 octets, where X is the
       size of the input G.711 frame in octets. G.711.0 can therefore
       be viewed as a Variable Bit Rate (VBR) encoding in which the
       size of the G.711.0 output frame is a function of the G.711
       symbols input to it.

   A7  Algorithmic delay: G.711.0 was designed to have the algorithmic
       delay equal to the time represented by the number of samples in
       the G.711 input frame (i.e., no "look-ahead").

   A8  Low Complexity: Less than 1.0 Weighted Million Operations Per
       Second (WMOPS) average and low memory footprint (~5k octets
       RAM, ~5.7k octets ROM and ~3.6 basic operations) [ICASSP]
       [G.711.0].

   A9  Both A-law and mu-law supported: G.711 has two operating laws,
       A-law and mu-law.
       These two laws are also known as PCMA and PCMU in RTP
       applications RFC 3551 [RFC3551].

   These attributes generally make it trivial to compress a G.711 input
   frame consisting of 40, 80, 160, 240 or 320 samples. After the input
   frame is presented to a G.711.0 encoder, a G.711.0 "self-describing"
   output frame is produced. The number of samples contained within
   this frame is easily determined at the G.711.0 decoder by virtue of
   attribute A4. The G.711.0 decoder can decode the G.711.0 frame back
   to a G.711 frame by using only data within the G.711.0 frame.

   Lastly, we note that losing a G.711.0 encoded packet is identical in
   effect to losing a G.711 packet (when using RTP); this is because a
   G.711.0 payload, like the corresponding G.711 payload, is stateless.
   Thus, it is anticipated that existing G.711 PLC mechanisms will be
   employed when a G.711.0 packet is lost and an identical MOS
   degradation relative to G.711 loss will be achieved.

3.3.  G.711 Input Frames to G.711.0 Output Frames

   G.711.0 is a lossless and stateless compression of G.711 frames. The
   following figure depicts this where "A" is the process of G.711.0
   encoding and "B" is the process of G.711.0 decoding.

      1:1 Mapping from G.711 Input Frame to G.711.0 Output Frame

   |--------------------------|  A   |------------------------------|
   |    G.711 Input Frame     |----->|     G.711.0 Output Frame     |
   |       of X Octets        |      |  containing 1 to X+1 Octets  |
   | (where X MUST be 40, 80, |      | (precise value dependent on  |
   | 160, 240 or 320 octets)  |<-----| G.711.0 ability to compress) |
   |__________________________|  B   |______________________________|

                               Figure 1

   Note that the mapping is 1:1 (lossless) in both directions, subject
   to two constraints. The first constraint is that the input frame
   provided to the G.711.0 encoder (process "A") has a specific number
   of input G.711 symbols consistent with attribute A5 (40, 80, 160, 240
   or 320 octets). The second constraint is that the companding law
   used to create the G.711 input frame (A-law or mu-law) must be known,
   consistent with attribute A9.

   Subject to these two constraints, the input G.711 frame is processed
   by the G.711.0 encoder (process "A") and produces a "self-describing"
   G.711.0 output frame, consistent with attribute A4. Depending on the
   source G.711 symbols, the G.711.0 output frame can contain anywhere
   from 1 to X+1 octets, where X is the number of input G.711 symbols.
   Compression results for virtually every zero-mean acoustic signal
   encoded by G.711.0.

   Since the G.711.0 output frame is "self-describing", a G.711.0
   decoder (process "B") can losslessly reproduce the original G.711
   input frame with only the knowledge of which companding law was used
   (A-law or mu-law). The first octet of a G.711.0 frame is called the
   "Prefix Code" octet; the information within this octet conveys how
   many G.711 symbols the decoder is to create from a given G.711.0
   input frame (i.e., 0, 40, 80, 160, 240 or 320). The Prefix Code
   value of 0x00 is used to denote zero G.711 source symbols, which
   allows the use of 0x00 as a payload padding octet (to be described
   later in Section 3.3.1).

   Since G.711.0 was designed with typical G.711 payload lengths as a
   design constraint (attribute A5), this lossless encoding can be
   performed only with knowledge of the companding law being used. This
   information is anticipated to be signaled in SDP and will be
   described later in this document.
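
   The frame-length constraint (attribute A5) and the bounded-expansion
   property (attribute A6) described above can be summarized in a short
   sketch. The Python below is illustrative only: it assumes a sampling
   rate of 8000 samples per second, and the helper names
   samples_for_ptime and output_size_bounds are hypothetical, not part
   of G.711.0 or of this payload format.

```python
from typing import Tuple

# Illustrative sketch only; all names here are hypothetical helpers.
SAMPLE_RATE_HZ = 8000  # assumption: the dominant G.711 sampling rate
VALID_FRAME_SAMPLES = (40, 80, 160, 240, 320)  # attribute A5


def samples_for_ptime(ptime_ms: int) -> int:
    """Return the number of G.711 symbols in one frame of ptime_ms ms."""
    samples = SAMPLE_RATE_HZ * ptime_ms // 1000
    if samples not in VALID_FRAME_SAMPLES:
        raise ValueError(f"{ptime_ms} ms does not map to a single frame size")
    return samples


def output_size_bounds(input_octets: int) -> Tuple[int, int]:
    """Return (min, max) G.711.0 output frame size in octets (A6)."""
    if input_octets not in VALID_FRAME_SAMPLES:
        raise ValueError("input frame must be 40, 80, 160, 240 or 320 octets")
    # One octet at best; one octet of expansion at worst.
    return (1, input_octets + 1)
```

   For instance, a 20 ms frame holds 160 symbols, so its G.711.0 output
   frame is between 1 and 161 octets long.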

   If the original inputs were known to be from a zero-mean acoustic
   signal coded by G.711, an intelligent G.711.0 encoder could infer the
   G.711 companding law in use (via G.711 input signal amplitude
   histogram statistics). Likewise, an intelligent G.711.0 decoder
   producing G.711 from the G.711.0 frames could also infer which
   companding law is in use. Thus G.711.0 could be designed for use in
   applications that have limited stream signaling between the G.711
   endpoints (i.e., they only know "G.711 at 8k sampling is being used",
   but nothing more). Such usage is not further described in this
   document. Additionally, if the original inputs were known to come
   from zero-mean acoustic signals, an intelligent G.711.0 encoder could
   tell if the G.711.0 payload had been encrypted - as the symbols would
   not have the distribution expected in either companding law and would
   appear random. Such determination is also not further discussed in
   this document.

   It is easily seen that this process is 1:1 and that G.711.0 based
   lossless compression can be employed multiple times, as the original
   G.711 input symbols are always reproduced with 100% fidelity.

3.3.1.  Multiple G.711.0 Output Frames per RTP Payload Considerations

   As a general rule, G.711.0 frames containing more source G.711
   symbols (from a given channel) will typically result in higher
   compression, but there are exceptions to this rule. A G.711.0
   encoder may choose to encode 20 ms of input G.711 symbols as: 1) a
   single 20 ms G.711.0 frame, or 2) as two 10 ms G.711.0 frames, or 3)
   any other combination of 5 ms or 10 ms G.711.0 frames - depending on
   which encoding resulted in fewer bits. As an example, an intelligent
   encoder might encode 20 ms of G.711 symbols as two 10 ms G.711.0
   frames if the first 10 ms was "silence" and two G.711.0 frames took
   fewer bits than any other possible encoding combination of G.711.0
   frame sizes.

   During the process of G.711.0 standardization it was recognized that
   although it is sometimes advantageous to encode integer multiples of
   40 G.711 symbols in whatever input symbol format resulted in the most
   compression (as per above), the simplest choice is to encode the
   entire ptime's worth of input G.711 symbols into one G.711.0 frame
   (if the ptime supported it). This is especially so since the larger
   number of source G.711 symbols typically resulted in the highest
   compression anyway and there is added complexity in searching for
   other possibilities (involving more G.711.0 frames) which were
   unlikely to produce a more bit efficient result.

   The design of ITU-T Rec. G.711.0 [G.711.0] foresaw the possibility of
   multiple G.711.0 input frames in that the decoder was defined to
   decode what it refers to as an incoming "bit stream". For this
   specification, the bit stream is the G.711.0 RTP payload itself.
   Thus, the decoder will take the G.711.0 RTP payload and will produce
   an output frame containing the original G.711 symbols independent of
   how many G.711.0 frames were present in it. Additionally, any number
   of 0x00 padding octets placed between the G.711.0 frames will be
   silently (and safely) ignored by the G.711.0 decoding process (see
   Section 4.2.3).

   To recap, a G.711.0 encoder may choose to encode incoming G.711
   symbols into one or more than one G.711.0 frames and put the
   resultant frame(s) into the G.711.0 RTP payload. Zero or more 0x00
   padding octets may also be included in the G.711.0 RTP payload.
   The G.711.0 decoder, being insensitive to the number of G.711.0
   encoded frames contained within the payload, will decode the G.711.0
   RTP payload into the source G.711 symbols. Although examples of
   single or multiple G.711.0 frame cases will be illustrated in
   Section 4.2, the multiple G.711.0 frame cases MUST be supported and
   no negotiation (SDP or otherwise) is required for it.

4.  RTP Header and Payload

   In this section we describe the precise format for G.711.0 frames
   carried via RTP. We begin with RTP header description relative to
   G.711, then provide two G.711.0 payload examples.

4.1.  G.711.0 RTP Header

   Relative to G.711 RTP headers, the utilization of G.711.0 does not
   create any special requirements with respect to the contents of the
   RTP packet header. The only significant difference is that the
   payload type (PT) RTP header field MUST have a value corresponding to
   the dynamic payload type assigned to the flow. This is in contrast
   to most current uses of G.711 which typically use the static payload
   assignment of PT = 0 (PCMU) or PT = 8 (PCMA) [RFC3551] even though
   the negotiation and use of dynamic payload types is allowed for
   G.711. With the exception of rare PT exhaustion cases, the existing
   G.711 PT values of 0 and 8 MUST NOT be used for G.711.0 (helping to
   avoid possible payload confusion with G.711 payloads).

   Voice Activity Detection (VAD) SHOULD NOT be used when G.711.0 is
   negotiated because G.711.0 obtains high compression during "VAD
   silence intervals" and one of the advantages of G.711.0 over G.711
   with VAD is the lack of any VAD-inducing artifacts in the received
   signal. However, if VAD is employed, the Marker bit (M) MUST be set
   in the first packet of a talkspurt (the first packet after a silence
   period in which packets have not been transmitted contiguously as per
   rules specified in [RFC3551] for G.711 payloads). This definition,
   being consistent with the G.711 RTP VAD use, further allows lossless
   transcoding between G.711 RTP packets and G.711.0 RTP packets as
   described in Section 3.1.

   With this introduction, the RTP packet header fields are defined as
   follows:

   V - As per [RFC3550]

   P - As per [RFC3550]

   X - As per [RFC3550]

   CC - As per [RFC3550]

   M - As per [RFC3550] and [RFC3551]

   PT - The assignment of an RTP payload type for the format defined
      in this memo is outside the scope of this document. The RTP
      profiles in use currently mandate binding the payload type
      dynamically for this payload format.

   SN - As per [RFC3550]

   timestamp - As per [RFC3550]

   SSRC - As per [RFC3550]

   CSRC - As per [RFC3550]

   Where V (version bits), P (padding bit), X (extension bit), CC (CSRC
   count), M (marker bit), PT (payload type), SN (sequence number),
   timestamp, SSRC (synchronizing source) and CSRC (contributing
   sources) are as defined in [RFC3550] and as typically used with
   G.711. PT (payload type) is as defined in [RFC3551].

4.2.  G.711.0 RTP Payload

   This section defines the G.711.0 RTP payload and illustrates it by
   means of two examples.

   The first example, in Section 4.2.1, depicts the case when it is
   desired to carry only one G.711.0 frame in the RTP payload. This
   case is expected to be the dominant use case and is shown separately
   for the purposes of clarity.

   The second example, in Section 4.2.2, depicts the general case when
   it is desired to carry one or more G.711.0 frames in the RTP payload.
   This is the actual definition of the G.711.0 RTP payload.

4.2.1.  Single G.711.0 Frame per RTP Payload Example

   This example depicts a single G.711.0 frame in the RTP payload. This
   is expected to be the dominant RTP payload case for G.711.0, as the
   G.711.0 encoding process supports the SDP packet times (ptime and
   maxptime, see [RFC4566]) commonly used when G.711 is transported in
   RTP. Additionally, as mentioned previously, larger G.711.0 frames
   generally compress more effectively than a multiplicity of smaller
   G.711.0 frames.

   The following figure illustrates the single G.711.0 frame per RTP
   payload case.

               Single G.711.0 Frame in RTP Payload Case

               |-------------------|-------------------|
               | One G.711.0 Frame | Zero or more 0x00 |
               |                   |   Padding Octets  |
               |___________________|___________________|

                               Figure 2

   Encoding Process: A single G.711.0 frame is inserted into the RTP
   payload. The amount of time represented by the G.711 symbols
   compressed in the G.711.0 frame MUST correspond to the ptime signaled
   for applications using SDP. Although generally not desired, padding
   in the RTP payload after the G.711.0 frame MAY be created by placing
   one or more 0x00 octets after the G.711.0 frame. Such padding may be
   desired based on security considerations (see Section 10).

   Decoding Process: Passing the entire RTP payload to the G.711.0
   decoder is sufficient for the G.711.0 decoder to create the source
   G.711 symbols. Any padding inserted after the G.711.0 frame (i.e.,
   the 0x00 octets) present in the RTP payload is silently ignored by
   the G.711.0 decoding process. The decoding process is fully
   described in Section 4.2.3 below.

4.2.2.  G.711.0 RTP Payload Definition

   This section defines the G.711.0 RTP payload and illustrates the case
   of when one or more G.711.0 frames are to be placed in the payload.
   All G.711.0 RTP decoders MUST support the general case described in
   this section (rationale presented previously in Section 3.3.1).
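
   The payload layout of Figure 2, and its generalization to several
   frames defined in this section, amounts to concatenation followed by
   optional 0x00 padding. The Python sketch below illustrates this
   under stated assumptions: each G.711.0 frame is supplied as an
   opaque byte string by some G.711.0 encoder (not shown), and the name
   build_payload and its pad_to parameter are hypothetical.

```python
# Illustrative sketch only; build_payload and pad_to are hypothetical.
MAX_FRAME_OCTETS = 321  # 320 input octets plus one octet of expansion
PADDING_OCTET = b"\x00"


def build_payload(frames, pad_to=0):
    """Concatenate opaque G.711.0 frames, then append 0x00 padding.

    frames: iterable of bytes, each an already-encoded G.711.0 frame
    pad_to: target payload length in octets (0 means no padding)
    """
    payload = bytearray()
    for frame in frames:
        if not 1 <= len(frame) <= MAX_FRAME_OCTETS:
            raise ValueError("G.711.0 frame must be 1 to 321 octets")
        if frame[0] == 0x00:
            # 0x00 is the padding value and cannot start a valid frame.
            raise ValueError("frame must not begin with 0x00")
        payload += frame
    if not payload:
        raise ValueError("at least one G.711.0 frame is required")
    if pad_to > len(payload):
        payload += PADDING_OCTET * (pad_to - len(payload))
    return bytes(payload)
```

   Note that this sketch never removes trailing 0x00 octets on the
   receive side; locating frame boundaries and skipping padding is left
   to the decoding process of Section 4.2.3.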

   Note that since each G.711.0 frame is self-describing (see Attribute
   A4 in Section 3.2), the individual G.711.0 frames in the RTP payload
   need not represent the same duration of time (i.e., a 5 ms G.711.0
   frame could be followed by a 20 ms G.711.0 frame). Owing to this,
   the amount of time represented in the RTP payload MAY be any integer
   multiple of 5 ms (as 5 ms is the smallest interval of time that can
   be represented in a G.711.0 frame).

   The following figure illustrates the one or more G.711.0 frames per
   RTP payload case where the number of G.711.0 frames placed in the RTP
   payload is N. We note that when N is equal to 1, this case is
   identical to the previous example.

            One or More G.711.0 Frames in RTP Payload Case

     |----------|---------|----------|---------|----------------|
     |  First   | Second  |          |   Nth   |  Zero or more  |
     | G.711.0  | G.711.0 |   ...    | G.711.0 |      0x00      |
     |  Frame   |  Frame  |          |  Frame  | Padding Octets |
     |__________|_________|__________|_________|________________|

                               Figure 3

   Note that when multiple G.711.0 frames are present, the individual
   frames can be, and generally are, of different lengths. The decoding
   process described in Section 4.2.3 is used to determine the frame
   boundaries.

   Encoding Process: One or more G.711.0 frames are placed in the RTP
   payload simply by concatenating the G.711.0 frames together. The
   amount of time represented by the G.711 symbols compressed in all the
   G.711.0 frames in the RTP payload MUST correspond to the ptime
   signaled for applications using SDP. Although not generally desired,
   padding in the RTP payload SHOULD be placed after the last G.711.0
   frame in the payload and MAY be created by placing one or more 0x00
   octets after the last G.711.0 frame. Such padding may be desired
   based on security considerations (see Section 10). Additional
   encoding process details and considerations are specified later in
   Section 4.2.2.1.

   Decoding Process: As G.711.0 frames can be of varying length, the
   payload decoding process described in Section 4.2.3 is used to
   determine where the individual G.711.0 frame boundaries are. Any
   padding octets inserted before or after any G.711.0 frame in the RTP
   payload are silently (and safely) ignored by the G.711.0 decoding
   process specified in Section 4.2.3.

4.2.2.1.  G.711.0 RTP Payload Encoding Process

   ITU-T G.711.0 supports five possible input frame lengths: 40, 80,
   160, 240, and 320 samples per frame. The rationale for choosing
   those lengths was given in the description of property A5 in
   Section 3.2. Assuming 8000 samples per second, these lengths
   correspond to input frames representing 5 ms, 10 ms, 20 ms, 30 ms or
   40 ms. So while the standard assumed the input "bit stream"
   consisted of G.711 symbols of some integer multiple of 5 ms in
   length, it did not specify exactly what frame lengths to use as input
   to the G.711.0 encoder itself. The intent of this section is to
   provide some guidance for the selection.

   Consider a typical IETF use case of 20 ms (160 octets) of G.711 input
   samples represented in a G.711.0 payload and signaled by using the
   SDP parameter ptime. As described in Section 3.3.1, the simplest way
   to encode these 160 octets is to pass the entire 160 octets to the
   G.711.0 encoder, resulting in precisely one G.711.0 compressed frame,
   and put that single frame into the G.711.0 RTP payload. However,
   neither the ITU-T G.711.0 standard nor this IETF payload format
   mandates this.
In fact 20 ms of input G.711 symbols can be encoded 592 as 1, 2, 3 or 4 G.711.0 frames in any one of six combinations (i.e., 593 {20ms}, {10ms:10ms}, {10ms:5ms:5ms}, {5ms:10ms:5ms}, {5ms:5ms:10ms}, 594 {5ms:5ms:5ms:5ms}) and any of these combinations would decompress 595 into the same source 160 G.711 octets. As an aside, we note that the 596 first octet of any G.711.0 frame will be the prefix code octet and 597 information in this octet determines how many G.711 symbols are 598 represented in the G.711.0 frame. 600 Notwithstanding the above, we expect one of two encodings to be used 601 by implementers: the simplest possible (one 160 byte input to the 602 G.711.0 encoder which usually results in the highest compression) or 603 the combination of possible input frames to a G.711.0 encoder that 604 resulted in the highest compression for the payload. The explicit 605 mention of this issue in this IETF document was deemed important 606 because the ITU-T G.711.0 standard is silent on this issue and there 607 is a desire for this issue to be documented in a formal Standards 608 Developing Organization (SDO) document (i.e., here). 610 4.2.3. G.711.0 RTP Payload Decoding Process 612 The G.711.0 decoding process is a standard part of G.711.0 bit stream 613 decoding and is implemented in the ITU-T Rec. G.711.0 reference code. 614 The decoding process algorithm described in this section is a slight 615 enhancement of the ITU-T reference code to explicitly accommodate RTP 616 padding (as described above). 618 Before describing the decoding, we note here that the largest 619 possible G.711.0 frame is created whenever the largest number of 620 G.711 symbols is encoded (320 from Section 3.2, property A5) and 621 these 320 symbols are "uncompressible" by the G.711.0 encoder. In 622 this case (via property A6 in Section 3.2) the G.711.0 output frame 623 will be 321 octets long. 
We also note that the value 0x00 chosen for 624 the optional padding cannot be the first octet of a valid ITU-T Rec. 625 G.711.0 frame (see [G.711.0]). We also note that whenever more than 626 one G.711.0 frame is contained in the RTP payload, the decoding of 627 an individual G.711.0 frame will occur multiple times (once per frame). 629 For the decoding algorithm below, let N be the number of octets in 630 the RTP payload (i.e., excluding any RTP padding, but including any 631 RTP payload padding), let P equal the number of RTP payload octets 632 processed by the G.711.0 decoding process, let K be the number of 633 G.711 symbols presently in the output buffer, let Q be the number of 634 octets contained in the G.711.0 frame being processed and let "!=" 635 represent not equal to. The keyword "STOP" is used below to indicate 636 the end of the processing of G.711.0 frames in the RTP payload. The 637 algorithm below assumes an output buffer for the decoded G.711 source 638 symbols of length sufficient to accommodate the expected number of 639 G.711 symbols and an internal buffer of length 321 octets. 641 G.711.0 RTP Payload Decoding Heuristic: 643 H1 Initialization of counters: Initialize P, the number of processed 644 octets counter, to zero. Initialize K, the counter for how 645 many G.711 symbols are in the output buffer, to zero. 646 Initialize N to the number of octets in the RTP payload 647 (including any RTP payload padding). Go to H2. 649 H2 Read internal buffer: Read min{320+1, (N-P)} octets into the 650 internal buffer starting at the (P+1)th octet of the RTP payload. We 651 note that at this point, N-P octets have yet to be processed and 652 that 320+1 octets is the largest possible G.711.0 frame. Also 653 note that in the common case of zero-based array indexing of a 654 uint8 array of octets, this operation will read octets 655 from index P through index [P + min{320+1, (N-P)} - 1] of the RTP 656 payload. Go to H3.
658 H3 Analyze the first octet in the internal buffer: If this octet is 659 0x00 (a padding octet) go to H4, otherwise go to H5 (process a 660 G.711.0 frame). 662 H4 Process padding octet (no G.711 symbols generated): Increment the 663 processed octets counter by one (set P = P + 1). If 664 this increment results in P >= N then STOP (as all 665 RTP Payload octets have been processed), otherwise go to H2. 667 H5 Process an individual G.711.0 frame (produce G.711 samples in the 668 output buffer): Pass the internal buffer to the G.711.0 decoder. 669 The G.711.0 decoder will read the first octet (called the 670 "prefix code" octet in ITU-T Rec. G.711.0 [G.711.0]) to 671 determine the number of source G.711 samples, M, contained in 672 this G.711.0 frame. The G.711.0 decoder will produce exactly M 673 G.711 source symbols (M can only have values of 0, 40, 80, 160, 674 240 or 320). If K = 0, these M symbols will be the first in 675 the output buffer and are placed at the beginning of the output 676 buffer. If K != 0, concatenate these M symbols with the prior 677 symbols in the output buffer (there are K prior symbols in the 678 buffer). Set K = K + M (as there are now this many G.711 679 source symbols in the output buffer). The G.711.0 decoder will 680 have consumed some number of octets, Q, in the internal buffer 681 to produce the M G.711 symbols. Increment the number of 682 payload octets processed counter by this quantity (set P = P + 683 Q). If this increment results in P >= N then 684 STOP (as all RTP Payload octets have been processed), otherwise 685 go to H2. 687 At this point, the output buffer will contain precisely K G.711 688 source symbols which should correspond to the ptime signaled if SDP 689 was used and the encoding process was without error.
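The control flow of steps H1 through H5 can be sketched as follows. This is a minimal illustration, not the ITU-T reference code: the actual G.711.0 frame decoder is represented by a hypothetical callable, decode_frame, which is assumed to take the internal buffer and return the decoded G.711 symbols together with the number of octets Q it consumed. The names decode_payload, decode_frame, and FrameDecoder are inventions of this sketch, and the sanity check on Q is an added safeguard that the heuristic itself does not describe.

```python
from typing import Callable, Tuple

MAX_FRAME_OCTETS = 320 + 1   # largest possible G.711.0 frame, per the text
PADDING_OCTET = 0x00         # cannot be the first octet of a valid frame

# Hypothetical decoder interface (an assumption of this sketch): given the
# internal buffer, return (decoded G.711 symbols, octets consumed Q).
FrameDecoder = Callable[[bytes], Tuple[bytes, int]]


def decode_payload(payload: bytes, decode_frame: FrameDecoder) -> bytes:
    """Walk one RTP payload following steps H1 through H5."""
    n = len(payload)       # H1: N, octets in the RTP payload
    p = 0                  # H1: P, octets processed so far
    output = bytearray()   # H1: output buffer; K is len(output)

    while p < n:           # leaving the loop corresponds to STOP
        # H2: read at most 321 octets, and never past the payload end.
        internal = payload[p:p + min(MAX_FRAME_OCTETS, n - p)]

        # H3/H4: a leading 0x00 is padding; skip exactly one octet.
        if internal[0] == PADDING_OCTET:
            p += 1
            continue

        # H5: decode one frame, append its M symbols, advance P by Q.
        symbols, q = decode_frame(internal)
        if not 1 <= q <= len(internal):
            # Added safeguard (not part of the heuristic): guarantees
            # forward progress and no read beyond the internal buffer.
            raise ValueError("frame decoder consumed an impossible octet count")
        output += symbols
        p += q

    return bytes(output)
```

A caller would typically compare the length of the returned buffer with the number of symbols implied by the signaled ptime before handing it to playout.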
If ptime was 690 signaled via SDP and the number of G.711 symbols in the output buffer 691 is other than what corresponds to ptime, the packet MUST be discarded 692 unless other system design knowledge allows for otherwise (e.g., 693 occasional 5 ms clock slips causing one more or one less G.711.0 694 frame than nominal to be in the payload). Lastly, due to the buffer 695 reads in H2 being bounded (to 321 octets or less), N being bounded to 696 the size of the G.711.0 RTP payload, and M being bounded to the 697 number of source G.711 symbols, there is no buffer overrun risk. 699 We also note, as an aside, that the algorithm above (and the ITU-T 700 G.711.0 reference code) accommodates padding octets (0x00) placed 701 anywhere between G.711.0 frames in the RTP payload as well as prior 702 to or after any or all G.711.0 frames. The ITU-T G.711.0 reference 703 code does not have Step H3 and H4 as separate steps (i.e., Step H5 704 immediately follows H2) at the added computational cost of some 705 additional buffer passing to/from the G.711.0 frame decoder 706 functions. That is the G.711.0 decoder in the reference code 707 "silently ignores" 0x00 padding octets at the beginning of what it 708 believes to be a G.711.0 encoded frame boundary. Thus Step H3 and 709 Step H4 above are an optimization over the reference code shown for 710 clarity. 712 If the decoder is at a playout endpoint location, this G.711 buffer 713 SHOULD be used in the same manner as a received G.711 RTP payload 714 would have been used (passed to a playout buffer, to a PLC 715 implementation, etc.). 717 4.2.4. G.711.0 RTP Payload for Multiple Channels 719 In this section we describe the use of multiple "channels" of G.711 720 data encoded by G.711.0 compression. 722 The dominant use of G.711 in RTP transport has been for single 723 channel use cases. For this case, the above G.711.0 encoding and 724 decoding process is used. 
However, the multiple channel case for 725 G.711.0 (a frame-based compression) is different from G.711 (a 726 sample-based encoding) and is described separately here. 728 RFC 3551 [RFC3551] provides guidelines for encoding audio channels 729 (Section 4) and for the ordering of the channels within the RTP 730 payload (Section 4.1). The ordering guidelines in RFC 3551, 731 Section 4.1 SHOULD be used unless an application-specific channel 732 ordering is more appropriate. 734 An implicit assumption in RFC 3551 is that all the channel data 735 multiplexed into a RTP payload MUST represent the same physical time 736 span. The case for G.711.0 is no different; the underlying G.711 737 data for all channels in a G.711.0 RTP payload MUST span the same 738 interval in time (e.g., the same "ptime" for a SDP-specified codec 739 negotiation). 741 RFC 3551 provides guidelines for sample-based encodings such as G.711 742 in Section 4.2. This guidance is tantamount to interleaving the 743 individual samples in that they SHOULD be packed in consecutive 744 octets. 746 RFC 3551 provides guidelines for frame-based encodings in which the 747 frames are interleaved. However, this guidance stems from the 748 assumption that "the frame size for frame-oriented codecs is a 749 given". However, this assumption is not valid for G.711.0 in that 750 individual consecutive G.711.0 frames (as per Section 4.2.2) can: 752 1) represent different time spans (e.g., two 5 ms G.711.0 frames 753 in lieu of one 10 ms G.711.0 frame), and 754 2) be of different lengths in octets (and typically are). 756 Therefore a different, but also simple, concatenation-based approach 757 is specified in this RFC. 759 For the multiple channel G.711.0 case, each G.711 channel is 760 independently encoded into one or more G.711.0 frames defined here as 761 a "G.711.0 channel superframe". 
Each one of these superframes is 762 identical to the multiple G.711.0 frame case illustrated in Figure 3 763 of Section 4.2.2 in which each superframe can have one or more 764 individual G.711.0 frames within it. Then each G.711.0 channel 765 superframe is concatenated - in channel order - into a G.711.0 RTP 766 payload. Then, if optional G.711.0 padding octets (0x00) are 767 desired, it is RECOMMENDED that these octets are placed after the 768 last G.711.0 channel superframe. As per above, such padding may be 769 desired based on security considerations (see Section 10). This is 770 depicted in the following Figure 4 below. 772 Multiple G.711.0 Channel Superframes in RTP Payload 774 |----------|---------|----------|---------|---------| 775 | First | Second | | Nth | Zero | 776 | G.711.0 | G.711.0 | ... | G.711.0 | or more | 777 | Channel | Channel | | Channel | 0x00 | 778 | Super- | Super- | | Super | Padding | 779 | Frame | Frame | | Frame | Octets | 780 |__________|_________|__________|_________|_________| 782 Figure 4 784 We note that although the individual superframes can be of different 785 lengths in octets (and usually are), that the number of G.711 source 786 symbols represented - in compressed form - in each channel superframe 787 is identical (since all the channels represent the identically same 788 time interval). 790 The G.711.0 decoder at the receiving end simply decodes the entire 791 G.711.0 (multiple channel) payload into individual G.711 symbols. If 792 M such G.711 symbols result and there were N channels, then the first 793 M/N G.711 samples would be from the first channel, the second M/N 794 G.711 samples would be from the second channel, and so on until the 795 Nth set of G.711 samples are found. 
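The channel separation just described can be sketched as follows, assuming the whole payload has already been decoded into one buffer of M G.711 symbols. The function name split_channels is hypothetical, and raising an exception is only one possible way of signaling that the packet should be discarded when M is not an integer multiple of the number of channels.

```python
from typing import List


def split_channels(symbols: bytes, channels: int) -> List[bytes]:
    """Split M decoded G.711 symbols into N equal per-channel runs.

    The first M/N symbols belong to the first channel, the next M/N
    symbols to the second channel, and so on.
    """
    if channels < 1:
        raise ValueError("the number of channels must be a positive integer")
    m = len(symbols)
    if m % channels != 0:
        # M is not an integer multiple of N: the caller should discard
        # the packet rather than guess at the channel boundaries.
        raise ValueError("symbol count is not a multiple of the channel count")
    per_channel = m // channels
    return [symbols[i * per_channel:(i + 1) * per_channel]
            for i in range(channels)]
```

For example, a two-channel 20 ms payload at 8000 samples per second decodes to 320 symbols, and each returned run then holds 160 symbols.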
Similarly, if the number of 796 channels was not known, but the payload "ptime" was known, one could 797 infer (knowing the sampling rate) how many G.711 symbols each channel 798 contained; then with this knowledge determine how many channels of 799 data were contained in the payload. When SDP is used, the number of 800 channels is known because the optional parameter is a MUST when there 801 is more than one channel negotiated (see Section 5.1). Additionally, 802 when SDP is used the parameter ptime is a RECOMMENDED optional 803 parameter. We note that if both parameters channels and ptime are 804 known that one could provide a check for the other and the converse. 805 Whichever algorithm is used to determine the number of channels, if 806 the length of the source G.711 symbols in the payload (M) is not an 807 integer multiple of the number of channels (N), then the packet 808 SHOULD be discarded. 810 Lastly we note that although any padding for the multiple channel 811 G.711.0 payload is RECOMMENDED to be placed at the end of the 812 payload, the G.711.0 decoding algorithm described in Section 4.2.3 813 will successfully decode the payload in Figure 4 if the 0x00 padding 814 octet is placed anywhere before or after any individual G.711.0 frame 815 in the RTP payload. The number of padding octets introduced at any 816 G.711.0 frame boundary therefore does not affect the number M of the 817 source G.711 symbols produced. Thus the decision for padding MAY be 818 made on a per-superframe basis. 820 5. Payload Format Parameters 822 This section defines the parameters that may be used to configure 823 optional features in the G.711.0 RTP transmission. 825 The parameters defined here are a part of the media subtype 826 registration for the G.711.0 codec. Mapping of the parameters into 827 Session Description Protocol (SDP) RFC 4566 [RFC4566] is also 828 provided for those applications that use SDP. 830 5.1. 
Media Type Registration 832 Type name: audio 834 Subtype name: G711-0 836 Required parameters: 838 clock rate: The RTP timestamp clock rate, which is equal to the 839 sampling rate. The typical rate used with G.711 encoding is 8000, 840 but other rates may be specified. The default rate is 8000. 842 complaw: This format specific parameter, specified on the "a=fmtp" 843 line, indicates the companding law (A-law or mu-law) employed. 844 This format specific parameter, as per RFC 4566 [RFC4566], is 845 given unchanged to the media tool using this format. The 846 case-insensitive values "complaw=al" and "complaw=mu" are used for 847 A-law and mu-law, respectively. 849 Optional parameters: 851 channels: See RFC 4566 [RFC4566] for definition. Specifies how 852 many audio streams are represented in the G.711.0 payload and MUST 853 be present if the number of channels is greater than one. This 854 parameter defaults to 1 if not present (as per RFC 4566) and is 855 typically a small positive integer. It is 856 expected that implementations that specify multiple channels will 857 also define a mechanism to map the channels appropriately within 858 their system design; otherwise, the channel order specified in RFC 859 3551 [RFC3551] Section 4.1 will be assumed (e.g., left, right, 860 center, ... ). Similar to the usual interpretation in RFC 3551 861 [RFC3551], the number of channels SHALL be a positive 862 integer. 864 maxptime: See RFC 4566 [RFC4566] for definition. 866 ptime: See RFC 4566 [RFC4566] for definition. The inclusion of 867 "ptime" is RECOMMENDED and SHOULD be in the SDP unless there is an 868 application specific reason not to include it (e.g., an 869 application that has a variable ptime on a packet-by-packet 870 basis). For constant ptime applications, it is considered good 871 form to include "ptime" in the SDP for session diagnostic 872 purposes.
For the constant ptime multiple channel case described 873 in Section 4.2.4, the inclusion of "ptime" can provide a desirable 874 payload check. 876 Encoding considerations: 878 This media type is framed binary data (see Section 4.8 in RFC 6838 879 [RFC6838]) compressed as per ITU-T Rec. G.711.0. 881 Security considerations: 883 See Section 10. 885 Interoperability considerations: none 887 Published specification: 889 ITU-T Rec. G.711.0 and RFC XXXX. 891 [ RFC Editor: please replace XXXX with a reference to this RFC ] 893 Applications that use this media type: 895 Although initially conceived for VoIP, G.711.0, like 896 G.711 before it, may find use within audio and video streaming 897 and/or conferencing applications for the audio portion of those 898 applications. 900 Additional information: 902 The following applies to stored-file transfer methods: 904 Magic numbers: #!G7110A\n or #!G7110M\n (for A-law or mu-law 905 encodings respectively, see Section 6). 907 File Extensions: None 909 Macintosh file type code: None 911 Object identifier or OID: None 913 Person & email address to contact for further information: 915 Michael A. Ramalho or 917 Intended usage: COMMON 919 Restrictions on usage: 921 This media type depends on RTP framing, and hence is only defined 922 for transfer via RTP [RFC3550]. Transport within other framing 923 protocols is not defined at this time. 925 Author: Michael A. Ramalho 927 Change controller: 929 IETF Payload working group delegated from the IESG. 931 5.2. Mapping to SDP Parameters 933 The information carried in the media type specification has a 934 specific mapping to fields in the Session Description Protocol (SDP), 935 which is commonly used to describe an RTP session. When SDP is used 936 to specify sessions employing G.711.0, the mapping is as follows: 938 o The media type ("audio") goes in SDP "m=" as the media name. 940 o The media subtype ("G711-0") goes in SDP "a=rtpmap" as the 941 encoding name.
943 o The required parameter "rate" also goes in "a=rtpmap" as the clock 944 rate. 946 o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and 947 "a=maxptime" attributes, respectively. 949 o Remaining parameters go in the SDP "a=fmtp" attribute by copying 950 them directly from the media type string as a semicolon-separated 951 list of parameter=value pairs. 953 5.3. Offer/Answer Considerations 955 The following considerations apply when using the SDP offer/answer 956 RFC 3264 [RFC3264] mechanism to negotiate the "channels" attribute. 958 o If the offering endpoint specifies a value for the optional 959 channels parameter greater than one and the answering endpoint 960 both understands the parameter and cannot support that value 961 requested, the answer MUST contain the optional channels parameter 962 with the highest value it can support. 964 o If the offering endpoint specifies a value for the optional 965 channels parameter the answer MUST contain the optional channels 966 parameter unless the only value the answering endpoint can support 967 is one, in which case the answer MAY contain the optional channels 968 parameter with value of 1. 970 o If the offering endpoint specifies a value for the ptime parameter 971 that the answering endpoint cannot support, the answer MUST 972 contain the optional ptime parameter. 974 o If the offering endpoint specifies a value for the maxptime 975 parameter that the answering endpoint cannot support, the answer 976 MUST contain the optional maxptime parameter. 978 5.4. SDP Examples 980 The following examples illustrate how to signal G.711.0 via SDP. 982 5.4.1. SDP Example 1 984 m=audio RTP/AVP 98 985 a=rtpmap:98 G711-0/8000 986 a=fmtp:98 complaw=mu 988 In the above example the dynamic payload type 98 is mapped to G.711.0 989 via the "a=rtpmap" parameter. The mandatory "complaw" is on the 990 "a=fmtp" parameter line. 
Note that neither of the optional parameters 991 "ptime" or "channels" is present, although it is generally good form 992 to include "ptime" in the SDP for diagnostic purposes if the session 993 is a constant ptime session. 995 5.4.2. SDP Example 2 997 The following example illustrates an offering endpoint requesting 2 998 channels, but the answering endpoint can only support (or render) one 999 channel. 1001 Offer: 1003 m=audio RTP/AVP 98 1004 a=rtpmap:98 G711-0/8000/2 1005 a=ptime:20 1006 a=fmtp:98 complaw=al 1008 Answer: 1010 m=audio RTP/AVP 98 1011 a=rtpmap:98 G711-0/8000/1 1012 a=ptime:20 1013 a=fmtp:98 complaw=al 1015 In this example the offer had an optional channels parameter. The 1016 answer must also have the optional channels parameter unless the 1017 value in the answer is one. Shown here is the case in which the answer explicitly 1018 contains the channels parameter (it could have been omitted, in which case it would be 1019 interpreted as one channel). As mentioned previously, it is 1020 considered good form to include "ptime" in the SDP for session 1021 diagnostic purposes if the session is a constant ptime session. 1023 6. G.711.0 Storage Mode Conventions and Definition 1025 The G.711.0 storage mode definition in this section is similar to 1026 those of many other IETF codecs (e.g., iLBC, EVRC-NW) and is essentially a 1027 concatenation of individual G.711.0 frames. 1029 We note that something must be stored for any G.711.0 frames that are 1030 not received at the receiving endpoint, no matter what the cause. In 1031 this section we describe two mechanisms, a "G.711.0 PLC Frame" and a 1032 "G.711.0 Erasure Frame". These G.711.0 PLC and G.711.0 Erasure 1033 Frames are described prior to the G.711.0 storage mode definition for 1034 clarity. 1036 6.1.
G.711.0 PLC Frame 1038 When G.711 RTP payloads are not received by a rendering endpoint, a Packet 1039 Loss Concealment (PLC) mechanism is typically employed to "fill in" 1040 the missing G.711 symbols with something that is auditorially 1041 pleasing, so that the loss may not be noticed by a listener. Such a 1042 PLC mechanism for G.711 is specified in ITU-T Rec. G.711 - Appendix 1 1043 [G.711-AP1]. 1045 A natural extension when creating G.711.0 frames for storage 1046 environments is to employ such a PLC mechanism to create G.711 1047 symbols for the span of time in which G.711.0 payloads were not 1048 received - and then to compress the resulting "G.711 PLC symbols" via 1049 G.711.0 compression. The G.711.0 frame(s) created by such a process 1050 are called "G.711.0 PLC Frames". 1052 Since PLC mechanisms are designed to render missing audio data with 1053 the best fidelity and intelligibility, G.711.0 frames created via 1054 such processing are likely best for most recording situations (such as 1055 voicemail storage) unless there is a requirement not to fabricate 1056 (audio) data not actually received. 1058 After such PLC G.711 symbols have been generated and then encoded by 1059 a G.711.0 encoder, the resulting frames may be stored in G.711.0 1060 frame format. As a result, there is nothing to specify here - the 1061 G.711.0 PLC Frames are stored as if they were received by the 1062 receiving endpoint. In other words, PLC-generated G.711.0 frames 1063 appear as "normal" or "ordinary" G.711.0 frames in the storage mode 1064 file. 1066 6.2. G.711.0 Erasure Frame 1068 "Erasure Frames", or equivalently "Null Frames", have been designed 1069 for many frame-based codecs since G.711 was standardized. These 1070 null/erasure frames explicitly represent data from incoming audio 1071 that were either not received by the receiving system or represent 1072 data that a transmitting system decided not to send.
Transmitting 1073 systems may choose not to send data for a variety of reasons (e.g., 1074 not enough wireless link capacity in radio-based systems) and can 1075 choose to send a "null frame" in lieu of the actual audio. It is 1076 also envisioned that erasure frames would be used in storage mode 1077 applications for specific archival purposes where there is a 1078 requirement not to fabricate audio data that was not actually 1079 received. 1081 Thus, a G.711.0 erasure frame is a representation of the amount of 1082 time in G.711.0 frames that were not received or not encoded by the 1083 transmitting system. 1085 Prior to defining a G.711.0 erasure frame it is beneficial to note 1086 what many G.711 RTP systems send when the endpoint is "muted". When 1087 muted, many of these systems will send an entire G.711 payload of 1088 either 0+ or 0- (i.e., one of the two levels closest to "analog zero" 1089 in either G.711 companding law). Next we note that a desirable 1090 property for a G.711.0 erasure frame is for "non G.711.0 Erasure 1091 Frame aware" endpoints to be able to playback a G.711.0 erasure frame 1092 with the existing G.711.0 ITU-T reference code. 1094 A G.711.0 Erasure Frame is defined as any G.711.0 frame for which the 1095 corresponding G.711 sample values are either the value 0++ or the 1096 value 0-- for the entirety of the G.711.0 frame. The levels of 0++ 1097 and 0-- are defined to be the two levels above or below analog zero, 1098 respectively. An entire frame of value 0++ or 0-- is expected to be 1099 extraordinarily rare when the frame was in fact generated by a 1100 natural signal, as analog inputs such as speech and music are zero- 1101 mean and are typically acoustically coupled to digital sampling 1102 systems. Note that the playback of a G.711.0 frame characterized as 1103 an erasure frame is auditorially equivalent to a muted signal (a very 1104 low value constant). 
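A receiver or archival tool that wants to recognize erasure frames after decoding could apply a check along the following lines. This is only a sketch under stated assumptions: it reads the definition above as requiring the decoded frame to be constant at a single one of the two levels, the function name is_erasure_frame is hypothetical, and the mu-law octet codes shown for 0++ and 0-- (0xFE and 0x7E) are this sketch's assumption and should be verified against ITU-T Rec. G.711 before use. The corresponding A-law codes are deliberately left to the caller.

```python
from typing import FrozenSet

# ASSUMED mu-law octet codes for the 0++ and 0-- levels; verify against
# ITU-T Rec. G.711 before relying on them.
ASSUMED_MU_LAW_ERASURE_LEVELS: FrozenSet[int] = frozenset({0xFE, 0x7E})


def is_erasure_frame(symbols: bytes, erasure_levels: FrozenSet[int]) -> bool:
    """Return True if the decoded G.711 symbols of one G.711.0 frame are
    all equal to a single value and that value is 0++ or 0--."""
    if not symbols:
        return False
    first = symbols[0]
    if first not in erasure_levels:
        return False
    # Every sample in the frame must repeat the same level.
    return symbols.count(first) == len(symbols)
```

Because the check operates on decoded G.711 symbols, an endpoint that is unaware of erasure frames is unaffected: it simply plays the same symbols back as a very low constant value.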
1106 These G.711.0 erasure frames can be reasonably characterized as null 1107 or erasure frames while meeting the desired playback goal of being 1108 decoded by the G.711.0 ITU-T reference code. Thus, similarly to 1109 G.711.0 PLC Frames, the G.711.0 erasure frames appear as "normal" or 1110 "ordinary" G.711.0 frames in the storage mode format. 1112 6.3. G.711.0 Storage Mode Definition 1114 The storage format is used for storing G.711.0 encoded frames. The 1115 format for the G.711.0 storage mode file defined by this RFC is shown 1116 below. 1118 G.711.0 Storage Mode Format 1120 |---------------------------|----------|--------------| 1121 | Magic Number | | | 1122 | | Version | Concatenated | 1123 | "#!G7110A\n" (for A-law) | Octet | G.711.0 | 1124 | or | | Frames | 1125 | "#!G7110M\n" (for mu-law) | "0x00" | | 1126 |___________________________|__________|______________| 1128 Figure 5 1130 The storage mode file consists of a magic number and a version octet 1131 followed by the individual G.711.0 frames concatenated together. 1133 The magic number for G.711.0 A-law corresponds to the ASCII character 1134 string "#!G7110A\n", i.e., "0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x41 1135 0x0A". Likewise, the magic number for G.711.0 mu-law corresponds to 1136 the ASCII character string "#!G7110M\n", i.e., "0x23 0x21 0x47 0x37 1137 0x31 0x31 0x30 0x4D 0x0A". 1139 The version number octet allows for the future specification of other 1140 G.711.0 storage mode formats. The specification of other storage 1141 mode formats may be desirable as G.711.0 frames are of variable 1142 length and a future format may include an indexing methodology that 1143 would enable playout far into a long G.711.0 recording without the 1144 necessity of decoding all the G.711.0 frames since the beginning of 1145 the recording. Other future format specifications may include support 1146 for multiple channels, metadata and the like.
For these reasons it 1147 was determined that a versioning strategy was desirable for the 1148 G.711.0 storage mode definition specified by this RFC. This RFC only 1149 specifies Version 0 and thus the value of "0x00" MUST be used for the 1150 storage mode defined by this RFC. 1152 The G.711.0 codec data frames, including any necessary erasure or PLC 1153 frames, are stored in consecutive order concatenated together as 1154 shown in Section 4.2.2. As the Version 0 storage mode only supports 1155 a single channel, the RTP payload format supporting multiple channels 1156 defined in Section 4.2.4 is not supported in this storage mode 1157 definition. 1159 To decode the individual G.711.0 frames, the algorithm presented in 1160 Section 4.2.3 may be used. 1161 If the version octet is determined not to be zero, the remainder of 1162 the file MUST NOT be passed to the G.711.0 decoder, as the ITU-T 1163 G.711.0 reference decoder can only decode concatenated G.711.0 frames 1164 and has not been designed to decode elements in yet to be specified 1165 future storage mode formats. 1167 7. Acknowledgements 1169 Many people have contributed to G.711.0 in the course of 1170 its development. The people listed here deserve special mention: 1171 Takehiro Moriya, Claude Lamblin, Herve Taddei, Simao Campos, Yusuke 1172 Hiwasaki, Jacek Stachurski, Lorin Netsch, Paul Coverdale, Patrick 1173 Luthi, Paul Barrett, Jari Hagqvist, Pengjun (Jeff) Huang, John Gibbs, 1174 Yutaka Kamamoto, and Csaba Kos. The review and oversight by the IETF 1175 Payload Working Group chairs Ali Begen and Roni Even during the 1176 development of this RFC is appreciated. Additionally, the careful 1177 review by Richard Barnes and extensive review by David Black and the 1178 rest of the IESG are likewise very much appreciated. 1180 8. Contributors 1182 The authors thank everyone who has contributed to this document.
1183 The people listed here deserve special mention: Ali Begen, Roni Even, 1184 and Hadriel Kaplan. 1186 9. IANA Considerations 1188 One media type (audio/G711-0) has been defined and requires IANA 1189 registration in the media types registry. See Section 5.1 for 1190 details. 1192 10. Security Considerations 1194 RTP packets using the payload format defined in this specification 1195 are subject to the security considerations discussed in the RTP 1196 specification [RFC3550], and in any appropriate RTP profile (for 1197 example RFC 3551 [RFC3551] or [RFC4585]). This implies that 1198 confidentiality of the media streams is achieved by encryption; for 1199 example, through the application of SRTP [RFC3711]. Because the data 1200 compression used with this payload format is applied end-to-end, any 1201 encryption needs to be performed after compression. 1203 Note that the appropriate mechanism to ensure confidentiality and 1204 integrity of RTP packets and their payloads is very dependent on the 1205 application and on the transport and signaling protocols employed. 1206 Thus, although SRTP is given as an example above, other possible 1207 choices exist. 1209 Note that end-to-end security with either authentication, integrity 1210 or confidentiality protection will prevent a network element not 1211 within the security context from performing media-aware operations 1212 other than discarding complete packets. To allow any (media-aware) 1213 intermediate network element to perform its operations, it is 1214 required to be a trusted entity which is included in the security 1215 context establishment. 1217 G.711.0 has no known denial-of-service attacks due to decoding, as 1218 data posing as a desired G.711.0 payload will be decoded into 1219 something (as per the decoding algorithm) with a finite amount of 1220 computation. This is due to the decompression algorithm having a 1221 finite worst-case processing path (no infinite computational loops 1222 are possible).
We also note that the data read by the G.711.0 1223 decoder is controlled by the length of the individual encoded G.711.0 1224 frame(s) contained in the RTP payload. The decoding algorithm 1225 specified in Section 4.2.3 above ensures that the G.711.0 decoder 1226 will not read beyond the length of the internal buffer specified 1227 (which is in turn specified to be no greater than the largest 1228 possible G.711.0 frame of 321 octets). Therefore a G.711.0 payload 1229 does not carry "active content" that could impose malicious side- 1230 effects upon the receiver. 1232 G.711.0 is a variable bit rate (VBR) audio codec. There have been 1233 recent concerns with VBR speech codecs where a passive observer can 1234 identify phrases from a standard speech corpus by means of the 1235 lengths produced by the encoder even when the payload is encrypted 1236 [IEEE]. In this paper, it was determined that some code excited 1237 linear prediction (CELP) codecs would produce discrete packet lengths 1238 for some phonemes. And furthermore with the use of appropriately 1239 designed Hidden Markov Models (HMMs) that such a system could predict 1240 phrases with unexpected accuracy. One CELP codec studied, SPEEX, had 1241 the property that it produced 21 different packet lengths in its 1242 wideband mode and that these packet lengths probabilistically mapped 1243 to phonemes that a HMM system could be trained on. In this paper it 1244 was determined that a mitigation technique would be to pad the output 1245 of the encoder with random padding lengths to the effect: 1) that 1246 more discrete payload sizes would result, and 2) that the 1247 probabilistic mapping to phonemes would become less clear. As G.711 1248 is not a speech model based codec, neither is G.711.0. A G.711.0 1249 encoding, during talking periods, produces frames of varying frame 1250 lengths which are not likely to have a strong mapping to phonemes. 1251 Thus G.711.0 is not expected to have this same vulnerability. 
   It should be noted that "silence" (only one value of G.711 in the
   entire G.711 input frame) or "near silence" (only a few G.711
   values) is easily detectable as G.711.0 frame lengths of one or a
   few octets.  If one desires to mitigate silence/non-silence
   detection, statistically variable padding should be added to very
   small G.711.0 frames (those less than about 20% of the number of
   symbols in the corresponding G.711 input frame).  Methods of
   introducing padding in the G.711.0 payloads are provided in the
   G.711.0 RTP payload definition in Section 4.2.2.

11.  Congestion Control

   The G.711 codec is a Constant Bit Rate (CBR) codec that does not
   have a means to regulate the bitrate.  The G.711.0 lossless
   compression algorithm typically compresses the G.711 CBR stream into
   a lower bandwidth VBR stream.  However, being lossless, it has no
   means of further reducing the bitrate beyond the G.711.0-based
   compression result.  The G.711.0 RTP payloads can be made
   arbitrarily large by adding optional padding bytes (subject only to
   MTU limitations).

   Therefore, there are no explicit ways to regulate the bitrate of the
   transmissions outlined in this RTP payload format except by
   modulating the number of optional padding bytes in the RTP payload.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13,
              RFC 6838, January 2013.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65,
              RFC 3551, July 2003.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J.
              Rey, "Extended RTP Profile for Real-time Transport
              Control Protocol (RTCP)-Based Feedback (RTP/AVPF)",
              RFC 4585, July 2006.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol
              (SRTP)", RFC 3711, March 2004.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264, June
              2002.

   [G.711.0]  ITU-T G.711.0, "Recommendation ITU-T G.711.0 - Lossless
              Compression of G.711 Pulse Code Modulation", September
              2009.

   [G.711]    ITU-T G.711, "Recommendation ITU-T G.711: Pulse Code
              Modulation (PCM) of Voice Frequencies", November 1988.

   [G.711-AP1]
              ITU-T G.711 Appendix 1, "Recommendation G.711 Appendix 1:
              A high quality low-complexity algorithm for packet loss
              concealment with G.711", September 1999.

   [G.711-A1]
              ITU-T G.711 Amendment 1, "Recommendation ITU-T G.711
              Amendment 1: New Annex A on Lossless Encoding of PCM
              Frames", September 2009.

12.2.  Informative References

   [G.729]    ITU-T G.729, "Recommendation ITU-T G.729 - Coding of
              speech at 8 kbit/s using conjugate-structure algebraic-
              code-excited linear prediction (CS-ACELP)", January 2007.

   [G.722]    ITU-T G.722, "Recommendation ITU-T G.722 - 7 kHz audio-
              coding within 64 kbit/s", November 1988.

   [ICASSP]   N. Harada, Y. Yamamoto, T. Moriya, Y. Hiwasaki, M. A.
              Ramalho, L. Netsch, Y. Stachurski, Miao Lei, H. Taddei,
              and Q.
              Fengyan, "Emerging ITU-T Standard G.711.0 - Lossless
              Compression of G.711 Pulse Code Modulation",
              International Conference on Acoustics Speech and Signal
              Processing (ICASSP), 2010, ISBN 978-1-4244-4295-9, March
              2010.

   [IEEE]     C.V. Wright, L. Ballard, S.E. Coull, F. Monrose, and G.M.
              Masson, "Spot Me if You Can: Uncovering Spoken Phrases in
              Encrypted VoIP Conversations", IEEE Symposium on Security
              and Privacy, 2008, ISBN 978-0-7695-3168-7, May 2008.

Authors' Addresses

   Michael A. Ramalho (editor)
   Cisco Systems, Inc.
   6310 Watercrest Way Unit 203
   Lakewood Ranch, FL 34202
   USA

   Phone: +1 919 476 2038
   Email: mramalho@cisco.com

   Paul E. Jones
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   USA

   Phone: +1 919 476 2048
   Email: paulej@packetizer.com

   Noboru Harada
   NTT Communications Science Labs.
   3-1 Morinosato-Wakamiya
   Atsugi, Kanagawa 243-0198
   JAPAN

   Phone: +81 46 240 3676
   Email: harada.noboru@lab.ntt.co.jp

   Muthu Arul Mozhi Perumal
   Ericsson
   Ferns Icon
   Doddanekundi, Mahadevapura
   Bangalore, Karnataka 560037
   India

   Phone: +91 9449288768
   Email: muthu.arul@gmail.com

   Lei Miao
   Huawei Technologies Co. Ltd
   Q22-2-A15R, Environment Protection Park
   No. 156 Beiqing Road
   HaiDian District
   Beijing 100095
   China

   Phone: +86 1059728300
   Email: lei.miao@huawei.com