Network Working Group                                    M. Ramalho, Ed.
Internet-Draft                                                  P. Jones
Intended status: Standards Track                           Cisco Systems
Expires: February 23, 2015                                     N. Harada
                                                                     NTT
                                                              M. Perumal
                                                                Ericsson
                                                                 L. Miao
                                                     Huawei Technologies
                                                         August 22, 2014

                     RTP Payload Format for G.711.0
                      draft-ietf-payload-g7110-03

Abstract

   This document specifies the Real-Time Transport Protocol (RTP)
   payload format for ITU-T Recommendation G.711.0. ITU-T Rec. G.711.0
   defines a lossless and stateless compression for G.711 packet
   payloads typically used in IP networks. This document also defines a
   storage mode format for G.711.0 and a media type registration for the
   G.711.0 RTP payload format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 23, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Requirements Language
   3. G.711.0 Codec Background
     3.1. General Information and Use of the ITU-T G.711.0 Codec
     3.2. Key Properties of G.711.0 Design
     3.3. G.711 Input Frames to G.711.0 Output Frames
       3.3.1. Multiple G.711.0 Output Frames per RTP Payload
              Considerations
   4. RTP Header and Payload
     4.1. G.711.0 RTP Header
     4.2. G.711.0 RTP Payload
       4.2.1. Single G.711.0 Frame per RTP Payload Example
       4.2.2. G.711.0 RTP Payload Definition
       4.2.3. G.711.0 RTP Payload Decoding Process
       4.2.4. G.711.0 RTP Payload for Multiple Channels
   5. Payload Format Parameters
     5.1. Media Type Registration
     5.2. Mapping to SDP Parameters
     5.3. Offer/Answer Considerations
     5.4. SDP Examples
       5.4.1. SDP Example 1
       5.4.2. SDP Example 2
   6. G.711.0 Storage Mode Conventions and Definition
     6.1. G.711.0 PLC Frame
     6.2. G.711.0 Erasure Frame
     6.3. G.711.0 Storage Mode Definition
   7. Acknowledgements
   8. Contributors
   9. IANA Considerations
   10. Security Considerations
   11. Congestion Control
   12. References
     12.1. Normative References
     12.2. Informative References
   Authors' Addresses

1. Introduction

   The International Telecommunication Union (ITU-T) Recommendation
   G.711.0 [G.711.0] specifies a stateless and lossless compression for
   G.711 packet payloads typically used in Voice over IP (VoIP)
   networks. This document specifies the Real-Time Transport Protocol
   (RTP) RFC 3550 [RFC3550] payload format and storage modes for this
   compression.

2. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. G.711.0 Codec Background

   ITU-T Recommendation G.711.0 [G.711.0] is a lossless and stateless
   compression mechanism for ITU-T Recommendation G.711 [G.711] and thus
   is not a "codec" in the sense of "lossy" codecs typically carried by
   RTP. When negotiated end-to-end, ITU-T Rec. G.711.0 is negotiated as
   if it were a codec, with the understanding that ITU-T Rec. G.711.0
   losslessly encodes the underlying (lossy) G.711 pulse code modulation
   (PCM) sample representation of an audio signal. For this reason
   ITU-T Rec. G.711.0 will be interchangeably referred to in this
   document as a "lossless data compression algorithm" or a "codec",
   depending on context. Within this document, individual G.711 PCM
   samples will be referred to as "G.711 symbols" or just "symbols" for
   brevity.

   This section describes the ITU-T Recommendation G.711 [G.711] codec,
   its properties, typical use cases and its key design properties.

3.1. General Information and Use of the ITU-T G.711.0 Codec

   ITU-T Recommendation G.711 is the benchmark standard for narrowband
   telephony. It has been successful for many decades because of its
   proven voice quality, ubiquity and utility. A new ITU-T
   recommendation, G.711.0, has been established for defining a
   stateless and lossless compression for G.711 packet payloads
   typically used in VoIP networks. ITU-T Rec. G.711.0 is also known as
   ITU-T Rec. G.711 Annex A [G.711-A1], as ITU-T Rec. G.711 Annex A is
   effectively a pointer to ITU-T Rec. G.711.0. Henceforth in this
   document, ITU-T Rec. G.711.0 will simply be referred to as "G.711.0"
   and ITU-T Rec. G.711 simply as "G.711".

   G.711.0 may be employed end-to-end, in which case the RTP payload
   format specification and use is nearly identical to the G.711 RTP
   specification found in RFC 3551 [RFC3551]. The only significant
   difference for G.711.0 is the required use of a dynamic payload type
   (the static PT of 0 or 8 is presently almost always used with G.711
   even though dynamic assignment of other payload types is allowed) and
   the recommendation not to use Voice Activity Detection (see
   Section 4.1).

   G.711.0, being both lossless and stateless, may also be employed as a
   lossless compression mechanism anywhere between end systems which
   have negotiated use of G.711. Because the only significant
   difference between the G.711 RTP payload format header and the
   G.711.0 payload format header is the payload type, a G.711 RTP packet
   can be losslessly converted to a G.711.0 RTP packet simply by
   compressing the G.711 payload (thus creating a G.711.0 payload),
   changing the payload type to the dynamic value desired and copying
   all the remaining G.711 RTP header fields into the corresponding
   G.711.0 RTP header.
   Conversely, the corresponding decompression of a G.711.0 RTP packet
   back to the original source G.711 RTP packet can be accomplished by
   losslessly decompressing the G.711.0 payload back to the original
   source G.711 payload, changing the payload type back to the payload
   type of the original G.711 RTP packet and copying all the remaining
   G.711.0 RTP header fields into the corresponding G.711 RTP header.

   It is worth noting that G.711.0, being both lossless and stateless,
   can be employed multiple times (e.g., on multiple, individual hops or
   series of hops) of a given flow with no degradation of quality
   relative to end-to-end G.711. Stated another way, multiple "lossless
   transcodes" from/to G.711.0/G.711 do not affect voice quality as
   typically occurs with lossy transcodes to/from dissimilar codecs.

   Lastly, it is expected that G.711.0 will be used as an archival
   format for recorded G.711 streams. Therefore, a G.711.0 Storage Mode
   Format is also included in this document.

3.2. Key Properties of G.711.0 Design

   The fundamental design of G.711.0 resulted from the desire to
   losslessly encode and compress frames of G.711 symbols independent of
   what types of signals those G.711 frames contained. The primary
   G.711.0 use case is for G.711 encoded, zero-mean, acoustic signals
   (such as speech and music).

   G.711.0 attributes are below:

   A1  Compression for zero-mean acoustic signals: The primary use case
       G.711.0 was designed for is the compression of G.711 payloads
       that contain "speech" or other zero-mean acoustic signals.
       G.711.0 obtains greater than 50% average compression in service
       provider environments [ICASSP].

   A2  Lossless for any G.711 payload: G.711.0 was designed to be
       lossless for any valid G.711 payload - even if the payload
       consisted of apparently random G.711 symbols (e.g., a modem or
       FAX payload).
       G.711.0 could be used for "aggregate 64 kbps G.711 channels"
       carried over IP without explicit concern if a subset of these
       channels happened to be carrying something other than voice or
       general audio. To the extent that a particular channel carried
       something other than voice or general audio, G.711.0 ensured
       that it was carried losslessly, if not significantly compressed.

   A3  Stateless: Compression of a frame of G.711 symbols was only to be
       dependent on that frame and not on any prior frame. Although
       greater compression is usually available by observing a longer
       history of past G.711 symbols, it was decided that the
       compression design would be stateless to completely eliminate
       error propagation common in many lossy codec designs (e.g.,
       ITU-T Rec. G.729 [G.729], ITU-T Rec. G.722 [G.722]). That is,
       the decoding process need not be concerned about lost prior
       packets because the decompression of a given G.711.0 frame is
       not dependent on potentially lost prior G.711.0 frames. Owing
       to this stateless property, the frames input to the G.711.0
       encoder may be changed "on-the-fly" (a 5 ms encoding could be
       followed by a 20 ms encoding).

   A4  Self-describing: This property is defined as the ability to
       determine how many source G.711 samples are contained within
       the G.711.0 frame solely by information contained within the
       G.711.0 frame. Generally, the number of source G.711 symbols
       can be determined by decoding the initial octets of the
       compressed G.711.0 frame (these octets are called "prefix
       codes" in the standard) [ICASSP]. A G.711.0 decoder need not
       know what ptime is, as it is able to decompress the G.711.0
       frame presented to it without signaling knowledge.

   A5  Accommodate G.711 payload sizes typically used in IP: G.711 input
       frames of length typically found in VoIP applications represent
       SDP ptimes (see RFC 4566 [RFC4566]) of 5 ms, 10 ms, 20 ms, 30
       ms or 40 ms. Since the dominant sampling frequency for G.711
       is 8000 samples per second, G.711.0 was designed to compress
       G.711 input frames of 40, 80, 160, 240 or 320 samples.

   A6  Bounded expansion: Since attribute A2 above requires G.711.0 to
       be lossless for any payload, by definition there exists at
       least one potential G.711 payload which must be
       "uncompressible". Since the quantum of compression is an
       octet, the minimum expansion of such an uncompressible payload
       was designed to be the minimum possible of one octet. Thus
       G.711.0 "compressed" frames can be of length one octet to X+1
       octets, where X is the size of the input G.711 frame in octets.
       G.711.0 can therefore be viewed as a Variable Bit Rate (VBR)
       encoding in which the size of the G.711.0 output frame is a
       function of the G.711 symbols input to it.

   A7  Algorithmic delay: G.711.0 was designed to have the algorithmic
       delay equal to the time represented by the number of samples in
       the G.711 input frame (i.e., no "look-ahead").

   A8  Low Complexity: Less than 1.0 WMOPS average and low memory
       footprint (~5k octets RAM, ~5.7k octets ROM and ~3.6 basic
       operations) [ICASSP] [G.711.0].

   A9  Both A-law and mu-law supported: G.711 has two operating laws,
       A-law and mu-law. These two laws are also known as PCMA and
       PCMU in RTP applications RFC 3551 [RFC3551].

   These attributes generally make it trivial to compress a G.711 input
   frame consisting of 40, 80, 160, 240 or 320 samples. After the input
   frame is presented to a G.711.0 encoder, a G.711.0 "self-describing"
   output frame is produced. The number of samples contained within
   this frame is easily determined at the G.711.0 decoder by virtue of
   attribute A4.
   The G.711.0 decoder can decode the G.711.0 frame back to a G.711
   frame by using only data within the G.711.0 frame.

   Lastly we note that losing a G.711.0 encoded packet is identical in
   effect to losing a G.711 packet (when using RTP); this is because a
   G.711.0 payload, like the corresponding G.711 payload, is stateless.
   Thus, it is anticipated that existing G.711 PLC mechanisms will be
   employed when a G.711.0 packet is lost and an identical MOS
   degradation relative to G.711 loss will be achieved.

3.3. G.711 Input Frames to G.711.0 Output Frames

   G.711.0 is a lossless and stateless compression of G.711 frames. The
   following figure depicts this where "A" is the process of G.711.0
   encoding and "B" is the process of G.711.0 decoding.

      1:1 Mapping from G.711 Input Frame to G.711.0 Output Frame

   |--------------------------|  A   |------------------------------|
   |    G.711 Input Frame     |----->|     G.711.0 Output Frame     |
   |       of X Octets        |      |  containing 1 to X+1 Octets  |
   | (where X MUST be 40, 80, |      | (precise value dependent on  |
   |  160, 240 or 320 octets) |<-----| G.711.0 ability to compress) |
   |__________________________|  B   |______________________________|

                               Figure 1

   Note that the mapping is 1:1 (lossless) in both directions, subject
   to two constraints. The first constraint is that the input frame
   provided to the G.711.0 encoder (process "A") has a specific number
   of input G.711 symbols consistent with attribute A5 (40, 80, 160, 240
   or 320 octets). The second constraint is that the compression law
   used to create the G.711 input frame (A-law or mu-law) must be known,
   consistent with attribute A9.

   Subject to these two constraints, the input G.711 frame is processed
   by the G.711.0 encoder ("A") and produces a "self-describing" G.711.0
   output frame, consistent with attribute A4.
   Depending on the source G.711 symbols, the G.711.0 output frame can
   contain anywhere from 1 to X+1 octets, where X is the number of input
   G.711 symbols. Compression results for virtually every zero-mean
   acoustic signal encoded by G.711.0.

   Since the G.711.0 output frame is "self-describing", a G.711.0
   decoder (process "B") can losslessly reproduce the original G.711
   input frame with only the knowledge of which companding law was used
   (A-law or mu-law). The G.711.0 frame, being "self-describing",
   allows for the G.711.0 decoder ("B") to know precisely how many G.711
   symbols to create.

   Since G.711.0 was designed with typical G.711 payload lengths as a
   design constraint (attribute A5), this lossless encoding can be
   performed only with knowledge of the companding law being used. This
   information is anticipated to be signaled in SDP and will be
   described later in this document.

   If the original inputs were known to be from a zero-mean acoustic
   signal coded by G.711, an intelligent G.711.0 encoder could infer the
   G.711 companding law in use (via G.711 input signal amplitude
   histogram statistics). Likewise, an intelligent G.711.0 decoder
   producing G.711 from the G.711.0 frames could also infer which
   encoding law is in use. Thus G.711.0 could be designed for use in
   applications that have limited stream signaling between the G.711
   endpoints (i.e., they only know "G.711 at 8k sampling is being used",
   but nothing more). Such usage is not further described in this
   document. Additionally, if the original inputs were known to come
   from zero-mean acoustic signals, an intelligent G.711.0 encoder could
   tell if the G.711.0 payload had been encrypted - as the symbols would
   not have the distribution expected in either companding law and would
   appear random. Such determination is also not further discussed in
   this document.
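
   As an informal illustration of the two constraints on process "A"
   and of the 1 to X+1 octet bound on the output frame, the following
   Python sketch checks a candidate G.711 input frame. The function
   and constant names are hypothetical and are not taken from ITU-T
   Rec. G.711.0; the 8000 samples per second rate is the dominant rate
   noted in attribute A5 and is an assumption of the sketch.

```python
# Illustrative only: names are hypothetical, not part of G.711.0 itself.
VALID_FRAME_SYMBOLS = (40, 80, 160, 240, 320)  # attribute A5
SAMPLE_RATE_HZ = 8000                          # assumed (dominant G.711 rate)
COMPANDING_LAWS = ("A-law", "mu-law")          # attribute A9


def frame_bounds(num_symbols, law):
    """Return (ptime_ms, min_octets, max_octets) for one G.711 input frame."""
    if num_symbols not in VALID_FRAME_SYMBOLS:
        raise ValueError("input frame must hold 40, 80, 160, 240 or 320 symbols")
    if law not in COMPANDING_LAWS:
        raise ValueError("the companding law (A-law or mu-law) must be known")
    ptime_ms = num_symbols * 1000 // SAMPLE_RATE_HZ
    # Attribute A6: the output frame holds between 1 and X+1 octets.
    return ptime_ms, 1, num_symbols + 1
```

   For example, frame_bounds(160, "mu-law") describes a 20 ms frame
   whose G.711.0 encoding occupies between 1 and 161 octets.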

   It is easily seen that this process is 1:1 and that G.711.0 based
   lossless compression can be employed multiple times, as the original
   G.711 input symbols are always reproduced with 100% fidelity.

3.3.1. Multiple G.711.0 Output Frames per RTP Payload Considerations

   As a general rule, G.711.0 frames containing more source G.711
   symbols (from a given channel) will typically result in higher
   compression, but there are exceptions to this rule. A G.711.0
   encoder may choose to encode 20 ms of input G.711 symbols as: 1) a
   single 20 ms G.711.0 frame, or 2) as two 10 ms G.711.0 frames, or 3)
   any other combination of 5 ms or 10 ms G.711.0 frames - depending on
   which encoding resulted in fewer bits. As an example, an intelligent
   encoder might encode 20 ms of G.711 symbols as two 10 ms G.711.0
   frames if the first 10 ms was "silence" and two G.711.0 frames took
   fewer bits than any other possible encoding combination of G.711.0
   frame sizes.

   During the process of G.711.0 standardization it was recognized that
   although it is sometimes advantageous to encode integer multiples of
   40 G.711 symbols in whatever input symbol format resulted in the most
   compression (as per above), the simplest choice is to encode the
   entire ptime's worth of input G.711 symbols into one G.711.0 frame
   (if the ptime supported it). This is especially so since the larger
   number of source G.711 symbols typically resulted in the highest
   compression anyway and there is added complexity in searching for
   other possibilities (involving more G.711.0 frames) which were
   unlikely to produce a more bit efficient result.

   The design of ITU-T Rec. G.711.0 [G.711.0] foresaw the possibility of
   multiple G.711.0 input frames in that the decoder was defined to
   decode what it refers to as an incoming "bit stream". For this
   specification, the bit stream is the G.711.0 RTP payload itself.
   Thus, the decoder will take the G.711.0 RTP payload and will produce
   an output frame containing the original G.711 symbols independent of
   how many G.711.0 frames were present in it. Additionally, any number
   of 0x00 padding octets placed between the G.711.0 frames will be
   silently (and safely) ignored by the G.711.0 decoding process (see
   Section 4.2.3).

   To recap, a G.711.0 encoder may choose to encode incoming G.711
   symbols into one or more G.711.0 frames and put the resultant
   frame(s) into the G.711.0 RTP payload. Zero or more 0x00 padding
   octets may also be included in the G.711.0 RTP payload. The G.711.0
   decoder, being insensitive to the number of G.711.0 encoded frames
   that are contained within it, will decode the G.711.0 RTP payload
   into the source G.711 symbols. Although examples of the single and
   multiple G.711.0 frame cases will be illustrated in Section 4.2, the
   multiple G.711.0 frame cases MUST be supported and no negotiation
   (SDP or otherwise) is required for it.

4. RTP Header and Payload

   In this section we describe the precise format for G.711.0 frames
   carried via RTP. We begin with an RTP header description relative to
   G.711, then provide two G.711.0 payload examples.

4.1. G.711.0 RTP Header

   Relative to G.711 RTP headers, the utilization of G.711.0 does not
   create any special requirements with respect to the contents of the
   RTP packet header. The only significant difference is that the
   payload type (PT) RTP header field will have a value corresponding to
   the dynamic payload type assigned to the flow. This is in contrast
   to most current uses of G.711 which typically use the static payload
   assignment of PT = 0 (PCMU) or PT = 8 (PCMA) [RFC3551] even though
   the negotiation and use of dynamic payload types is allowed for
   G.711.
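
   Because only the payload type and the payload differ, the lossless
   conversion between G.711 and G.711.0 RTP packets described in
   Section 3.1 can be sketched in Python as shown below. This is an
   illustration under stated assumptions, not a reference
   implementation: the function name is hypothetical, `transform`
   stands in for a G.711.0 compressor or decompressor supplied by the
   caller (none is implemented here), and packets with the RTP padding
   bit set are rejected to keep the sketch short.

```python
def swap_payload(packet: bytes, new_pt: int, transform) -> bytes:
    """Copy the RTP header, replace PT, and transform the payload.

    `transform` is a caller-supplied callable (hypothetical), e.g. a
    G.711.0 compressor when converting G.711 -> G.711.0.
    """
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    if packet[0] & 0x20:
        raise ValueError("RTP padding (P bit) is not handled in this sketch")
    if not 0 <= new_pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    header_len = 12 + 4 * (packet[0] & 0x0F)     # fixed header + CSRC list
    if packet[0] & 0x10:                         # X bit: header extension
        ext_words = int.from_bytes(packet[header_len + 2:header_len + 4], "big")
        header_len += 4 + 4 * ext_words
    header = bytearray(packet[:header_len])
    header[1] = (header[1] & 0x80) | new_pt      # keep marker bit, set new PT
    return bytes(header) + transform(packet[header_len:])
```

   All other header fields (sequence number, timestamp, SSRC, CSRC
   list and marker bit) are copied unchanged, which is what allows the
   reverse conversion to restore the original G.711 RTP packet.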

   Voice Activity Detection (VAD) SHOULD NOT be used when G.711.0 is
   negotiated because G.711.0 obtains high compression during "VAD
   silence intervals" and one of the advantages of G.711.0 over G.711
   with VAD is the lack of any VAD-induced artifacts in the received
   signal. However, if VAD is employed, the Marker bit (M) MUST be set
   in the first packet of a talkspurt (the first packet after a silence
   period in which packets have not been transmitted contiguously as per
   rules specified in [RFC3551] for G.711 payloads). This definition,
   being consistent with the G.711 RTP VAD use, further allows lossless
   transcoding between G.711 RTP packets and G.711.0 RTP packets as
   described in Section 3.1.

   With this introduction, the RTP packet header fields are defined as
   follows:

      V - As per [RFC3550]

      P - As per [RFC3550]

      X - As per [RFC3550]

      CC - As per [RFC3550]

      M - As per [RFC3550] and [RFC3551]

      PT - The assignment of an RTP payload type for the format defined
      in this memo is outside the scope of this document. The RTP
      profiles in use currently mandate binding the payload type
      dynamically for this payload format.

      SN - As per [RFC3550]

      timestamp - As per [RFC3550]

      SSRC - As per [RFC3550]

      CSRC - As per [RFC3550]

   Where V (version bits), P (padding bit), X (extension bit), CC (CSRC
   count), M (marker bit), PT (payload type), SN (sequence number),
   timestamp, SSRC (synchronization source) and CSRC (contributing
   sources) are as defined in [RFC3550] and as typically used with
   G.711. PT (payload type) is as defined in [RFC3551].

4.2. G.711.0 RTP Payload

   This section defines the G.711.0 RTP payload and illustrates it by
   means of two examples.

   The first example, in Section 4.2.1, depicts the case when it is
   desired to carry only one G.711.0 frame in the RTP payload.
   This case is expected to be the dominant use case and is shown
   separately for the purposes of clarity.

   The second example, in Section 4.2.2, depicts the general case when
   it is desired to carry one or more G.711.0 frames in the RTP payload.
   This is the actual definition of the G.711.0 RTP payload.

4.2.1. Single G.711.0 Frame per RTP Payload Example

   This example depicts a single G.711.0 frame in the RTP payload. This
   is expected to be the dominant RTP payload case for G.711.0, as the
   G.711.0 encoding process supports the SDP packet times (ptime and
   maxptime, see [RFC4566]) commonly used when G.711 is transported in
   RTP. Additionally, as mentioned previously, larger G.711.0 frames
   generally compress more effectively than a multiplicity of smaller
   G.711.0 frames.

   The following Figure illustrates the single G.711.0 frame per RTP
   payload case.

                Single G.711.0 Frame in RTP Payload Case

               |-------------------|-------------------|
               | One G.711.0 Frame | Zero or more 0x00 |
               |                   |  Padding Octets   |
               |___________________|___________________|

                               Figure 2

   Encoding Process: A single G.711.0 frame is inserted into the RTP
   payload. The amount of time represented by the G.711 symbols
   compressed in the G.711.0 frame MUST correspond to the ptime signaled
   for applications using SDP. Although generally not desired, padding
   in the RTP payload after the G.711.0 frame MAY be created by placing
   one or more 0x00 octets after the G.711.0 frame. Such padding may be
   desired based on security considerations (see Section 10).

   Decoding Process: Passing the entire RTP payload to the G.711.0
   decoder is sufficient for the G.711.0 decoder to create the source
   G.711 symbols. Any padding inserted after the G.711.0 frame (i.e.,
   the 0x00 octets) present in the RTP payload is silently ignored by
   the G.711.0 decoding process.
   The decoding process is fully described in Section 4.2.3 below.

4.2.2. G.711.0 RTP Payload Definition

   This section defines the G.711.0 RTP payload and illustrates the case
   of when one or more G.711.0 frames are to be placed in the payload.
   All G.711.0 RTP decoders MUST support the general case described in
   this section (rationale presented previously in Section 3.3.1).

   Note that since each G.711.0 frame is self-describing (see Attribute
   A4 in Section 3.2), the individual G.711.0 frames in the RTP payload
   need not represent the same duration of time (i.e., a 5 ms G.711.0
   frame could be followed by a 20 ms G.711.0 frame). Owing to this,
   the amount of time represented in the RTP payload MAY be any integer
   multiple of 5 ms (as 5 ms is the smallest interval of time that can
   be represented in a G.711.0 frame).

   The following Figure illustrates the one or more G.711.0 frames per
   RTP payload case where the number of G.711.0 frames placed in the RTP
   payload is N. We note that when N is equal to 1, this case is
   identical to the previous example.

             One or More G.711.0 Frames in RTP Payload Case

      |----------|---------|----------|---------|----------------|
      |  First   | Second  |          |   Nth   |  Zero or more  |
      | G.711.0  | G.711.0 |   ...    | G.711.0 |      0x00      |
      |  Frame   |  Frame  |          |  Frame  | Padding Octets |
      |__________|_________|__________|_________|________________|

                               Figure 3

   We note here that when there are multiple G.711.0 frames, the
   individual frames can be, and generally are, of different lengths.
   The decoding process in the following section is used to determine
   the frame boundaries.

   Encoding Process: One or more G.711.0 frames are placed in the RTP
   payload simply by concatenating the G.711.0 frames together.
   The amount of time represented by the G.711 symbols compressed in all
   the G.711.0 frames in the RTP payload MUST correspond to the ptime
   signaled for applications using SDP. Although not generally desired,
   padding in the RTP payload SHOULD be placed after the last G.711.0
   frame in the payload and MAY be created by placing one or more 0x00
   octets after the last G.711.0 frame. Such padding may be desired
   based on security considerations (see Section 10).

   Decoding Process: As G.711.0 frames can be of varying length, the
   payload decoding process described in the following section is used
   to determine where the individual G.711.0 frame boundaries are. Any
   padding octets inserted before or after any G.711.0 frame in the RTP
   payload are silently (and safely) ignored by the G.711.0 decoding
   process.

4.2.3. G.711.0 RTP Payload Decoding Process

   The G.711.0 decoding process is a standard part of G.711.0 bit stream
   decoding and is implemented in the ITU-T Rec. G.711.0 reference code.
   The decoding process algorithm described in this section is a slight
   enhancement of the ITU-T reference code to explicitly accommodate RTP
   padding (as described above).

   Before describing the decoding, we note here that the largest
   possible G.711.0 frame is created whenever the largest number of
   G.711 symbols is encoded (320 from Section 3.2, property A5) and
   these 320 symbols are "uncompressible" by the G.711.0 encoder. In
   this case (via property A6 in Section 3.2) the G.711.0 output frame
   will be 321 octets long. We also note that the value 0x00 chosen for
   the optional padding cannot be the first octet of a valid ITU-T Rec.
   G.711.0 frame (see [G.711.0]). We also note that whenever more than
   one G.711.0 frame is contained in the RTP payload, the decoding of
   the individual G.711.0 frames will occur multiple times.
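
   The three observations above (a frame never exceeds 321 octets,
   0x00 never begins a valid frame, and the frame decoder runs once
   per frame) suggest a loop of the following shape. This Python
   sketch is illustrative only. The `decode_frame` callable is a
   hypothetical stand-in for a real G.711.0 frame decoder and is
   assumed to return the decoded G.711 symbols together with the
   number of octets it consumed; no such interface is defined by
   ITU-T Rec. G.711.0 or by this document.

```python
MAX_FRAME_OCTETS = 320 + 1  # largest possible G.711.0 frame (attribute A6)


def decode_payload(payload: bytes, decode_frame) -> bytes:
    """Decode every G.711.0 frame in an RTP payload, skipping 0x00 padding.

    `decode_frame(buf)` is hypothetical: it must return (symbols, consumed).
    """
    out = bytearray()   # decoded G.711 symbols accumulated so far
    pos = 0             # number of payload octets processed so far
    while pos < len(payload):
        if payload[pos] == 0x00:
            pos += 1    # padding octet: produces no G.711 symbols
            continue
        window = payload[pos:pos + MAX_FRAME_OCTETS]
        symbols, consumed = decode_frame(window)
        if not 0 < consumed <= len(window):
            raise ValueError("frame decoder reported an invalid octet count")
        out += symbols
        pos += consumed
    return bytes(out)
```

   Bounding each read to 321 octets keeps the internal buffer small,
   and checking the consumed count guards against a malformed frame
   stalling the loop.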

   For the decoding algorithm below, let N be the number of octets in
   the RTP payload (i.e., excluding any RTP padding, but including any
   RTP payload padding), let P equal the number of RTP payload octets
   processed by the G.711.0 decoding process, let K be the number of
   G.711 symbols presently in the output buffer, let Q be the number of
   octets contained in the G.711.0 frame being processed and let "!="
   represent not equal to. The keyword "STOP" is used below to indicate
   the end of the processing of G.711.0 frames in the RTP payload. The
   algorithm below assumes an output buffer for the decoded G.711 source
   symbols of length sufficient to accommodate the expected number of
   G.711 symbols and an input buffer of length 321 octets.

   G.711.0 RTP Payload Decoding Heuristic:

   H1  Initialization of counters: Initialize P, the number of processed
       octets counter, to zero. Initialize K, the counter for how
       many G.711 symbols are in the output buffer, to zero.
       Initialize N to the number of octets in the RTP payload
       (including any RTP payload padding). Go to H2.

   H2  Read internal buffer: Read min{320+1, N-P} octets into the
       internal buffer, starting from octet (P+1) of the RTP payload.
       We note that at this point N-P octets have yet to be processed
       and that 320+1 octets is the largest possible G.711.0 frame.
       Also note that in the common case of zero-based array indexing
       of a uint8 array of octets, this operation will read octets
       from index P through index P + min{320+1, N-P} - 1 of the RTP
       payload. Go to H3.

   H3  Analyze the first octet in the internal buffer: If this octet is
       0x00 (a padding octet) go to H4, otherwise go to H5 (process a
       G.711.0 frame).

   H4  Process padding octet (no G.711 symbols generated): Increment the
       processed octets counter by one (set P = P + 1).
If the 608 result of this increment results in P >= N then STOP (as all 609 RTP Payload octets have been processed), otherwise go to H2. 611 H5 Process an individual G.711.0 frame (produce G.711 samples in the 612 output frame): Pass the internal buffer to the G.711.0 decoder. 613 The G.711.0 decoder will read the first octet (called the 614 "prefix code" octet in ITU-T Rec. G.711.0 [G.711.0]) to 615 determine the number of source G.711 samples M are contained in 616 this G.711.0 frame. The G.711.0 decoder will produce exactly M 617 G.711 source symbols. If K = 0, these M symbols will be the 618 first in the output buffer and are placed at the beginning of 619 the output buffer. If K != 0, concatenate these M symbols with 620 the prior symbols in the output buffer (there are K prior 621 symbols in the buffer). Set K = K + M (as there are now this 622 many G.711 source symbols in the output buffer). The G.711.0 623 decoder will have consumed some number of octets, Q, in the 624 internal buffer to produce the M G.711 symbols. Increment the 625 number of payload octet processed counter by this quantity (set 626 P = P + Q). If the result of this increment results in P >= N 627 then STOP (as all RTP Payload octets have been processed), 628 otherwise go to H2. 630 At this point, the output buffer will contain precisely K G.711 631 source symbols which should correspond to the ptime signaled if SDP 632 was used and the encoding process was without error. 634 We also note, as an aside, that the algorithm above (and the ITU-T 635 G.711.0 reference code) accommodates padding octets (0x00) placed 636 anywhere between G.711.0 frames in the RTP payload as well as prior 637 to or after any or all G.711.0 frames. The ITU-T G.711.0 reference 638 code does not have Step H3 and H4 as separate steps (i.e., Step H5 639 immediately follows H2) at the added computational cost of some 640 additional buffer passing to/from the G.711.0 frame decoder 641 functions. 
That is the G.711.0 decoder in the reference code 642 "silently ignores" 0x00 padding octets at the beginning of what it 643 believes to be a G.711.0 encoded frame boundary. Thus Step H3 and 644 Step H4 above are an optimization over the reference code shown for 645 clarity. 647 If the decoder is at a playout endpoint location, this G.711 buffer 648 SHOULD be used in the same manner as a received G.711 RTP payload 649 would have been used (passed to a playout buffer, to a PLC 650 implementation, etc.). 652 4.2.4. G.711.0 RTP Payload for Multiple Channels 654 In this section we describe the use of multiple "channels" of G.711 655 data encoded by G.711.0 compression. 657 The dominant use of G.711 in RTP transport has been for single 658 channel use cases. For this case, the above G.711.0 encoding and 659 decoding process is used. However, the multiple channel case for 660 G.711.0 (a frame-based compression) is different from G.711 (a 661 sample-based encoding) and is described separately here. 663 RFC 3551 [RFC3551] provides guidelines for encoding audio channels 664 (Section 4) and for the ordering of the channels within the RTP 665 payload (Section 4.1). The ordering guidelines in RFC 3551, 666 Section 4.1 SHOULD be used unless an application-specific channel 667 ordering is more appropriate. 669 An implicit assumption in RFC 3551 is that all the channel data 670 multiplexed into a RTP payload MUST represent the same physical time 671 span. The case for G.711.0 is no different; the underlying G.711 672 data for all channels in a G.711.0 RTP payload MUST span the same 673 interval in time (e.g., the same "ptime" for a SDP-specified codec 674 negotiation). 676 RFC 3551 provides guidelines for sample-based encodings such as G.711 677 in Section 4.2. This guidance is tantamount to interleaving the 678 individual samples in that they SHOULD be packed in consecutive 679 octets. 
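
   For contrast with the concatenation approach used for G.711.0, the
   sample interleaving just described for plain G.711 can be pictured
   with the short Python sketch below.  It is illustrative only: the
   function name interleave_samples is hypothetical, each channel is
   assumed to be a byte string with one octet per G.711 sample, and
   the letters stand in for real sample values.

```python
from typing import Sequence


def interleave_samples(channels: Sequence[bytes]) -> bytes:
    """Pack same-instant samples of each channel in consecutive octets."""
    if len({len(channel) for channel in channels}) > 1:
        raise ValueError("all channels must cover the same time span")
    # zip(*channels) yields one tuple of samples per sampling instant.
    return bytes(sample for instant in zip(*channels) for sample in instant)


left = b"LLL"   # three samples of the first channel (placeholder values)
right = b"RRR"  # three samples of the second channel (placeholder values)
print(interleave_samples([left, right]))  # prints b'LRLRLR'
```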

   RFC 3551 provides guidelines for frame-based encodings in which the
   frames are interleaved.  However, this guidance stems from the
   assumption that "the frame size for frame-oriented codecs is a
   given".  This assumption is not valid for G.711.0 in that individual
   consecutive G.711.0 frames (as per Section 4.2.2) can:

   1)  represent different time spans (e.g., two 5 ms G.711.0 frames
       in lieu of one 10 ms G.711.0 frame), and

   2)  be of different lengths in octets (and typically are).

   Therefore a different, but also simple, concatenation-based approach
   is specified in this RFC.

   For the multiple channel G.711.0 case, each G.711 channel is
   independently encoded into one or more G.711.0 frames defined here as
   a "G.711.0 channel superframe".  Each one of these superframes is
   identical to the multiple G.711.0 frame case illustrated in Figure 3
   of Section 4.2.2 in which each superframe can have one or more
   individual G.711.0 frames within it.  Then each G.711.0 channel
   superframe is concatenated - in channel order - into a G.711.0 RTP
   payload.  Then, if optional G.711.0 padding octets (0x00) are
   desired, it is RECOMMENDED that these octets be placed after the
   last G.711.0 channel superframe.  As per above, such padding may be
   desired based on security considerations (see Section 10).  This is
   depicted in Figure 4 below.

         Multiple G.711.0 Channel Superframes in RTP Payload

       |----------|---------|----------|---------|---------|
       |  First   | Second  |          |   Nth   |  Zero   |
       | G.711.0  | G.711.0 |   ...    | G.711.0 | or more |
       | Channel  | Channel |          | Channel |  0x00   |
       |  Super-  | Super-  |          |  Super- | Padding |
       |  Frame   | Frame   |          |  Frame  | Octets  |
       |__________|_________|__________|_________|_________|

                              Figure 4

   We note that although the individual superframes can be of different
   lengths in octets (and usually are), the number of G.711 source
   symbols represented - in compressed form - in each channel superframe
   is identical (since all the channels represent the same time
   interval).

   The G.711.0 decoder at the receiving end simply decodes the entire
   G.711.0 (multiple channel) payload into individual G.711 symbols.  If
   M such G.711 symbols result and there were N channels, then the first
   M/N G.711 samples would be from the first channel, the second M/N
   G.711 samples would be from the second channel, and so on until the
   Nth set of G.711 samples are found.  Similarly, if the number of
   channels was not known, but the payload "ptime" was known, one could
   infer (knowing the sampling rate) how many G.711 symbols each channel
   contained; then with this knowledge determine how many channels of
   data were contained in the payload.  When SDP is used, the number of
   channels is known because the optional "channels" parameter MUST be
   present when more than one channel is negotiated (see Section 5.1).
   Additionally, when SDP is used the parameter ptime is a RECOMMENDED
   optional parameter.  We note that if both the channels and ptime
   parameters are known, each can serve as a check on the other.
   Whichever algorithm is used to determine the number of channels, if
   the number of source G.711 symbols in the payload (M) is not an
   integer multiple of the number of channels (N), then the packet
   SHOULD be discarded.
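
   The receiving-end procedure above can be sketched in Python as
   follows.  This is a minimal, non-normative sketch: the function
   names split_channels and infer_channel_count are hypothetical, the
   decoded symbols are assumed to arrive as one byte string with one
   octet per G.711 symbol, and returning None is merely this sketch's
   way of signalling that the packet should be discarded.

```python
from typing import List, Optional


def split_channels(symbols: bytes, num_channels: int) -> Optional[List[bytes]]:
    """Split M decoded G.711 symbols into N equal per-channel runs.

    Returns None when M is not an integer multiple of N, modelling the
    "packet SHOULD be discarded" case.
    """
    if num_channels < 1:
        raise ValueError("number of channels must be a positive integer")
    m = len(symbols)
    if m % num_channels != 0:
        return None
    per_channel = m // num_channels  # M/N symbols per channel
    return [symbols[i * per_channel:(i + 1) * per_channel]
            for i in range(num_channels)]


def infer_channel_count(num_symbols: int, ptime_ms: int,
                        clock_rate: int = 8000) -> Optional[int]:
    """Infer the channel count from ptime and the sampling rate."""
    per_channel = clock_rate * ptime_ms // 1000  # symbols per channel
    if per_channel == 0 or num_symbols % per_channel != 0:
        return None
    return num_symbols // per_channel


decoded = b"L" * 160 + b"R" * 160  # two channels, 20 ms each at 8000 Hz
left, right = split_channels(decoded, 2)
print(infer_channel_count(len(decoded), 20))  # prints 2
```

   When both "channels" and "ptime" are signaled, comparing the
   signaled channel count with the inferred one gives the mutual check
   mentioned above.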

   Lastly we note that although any padding for the multiple channel
   G.711.0 payload is RECOMMENDED to be placed at the end of the
   payload, the G.711.0 decoding algorithm described in Section 4.2.3
   will successfully decode the payload in Figure 4 if the 0x00 padding
   octet is placed anywhere before or after any individual G.711.0 frame
   in the RTP payload.  The number of padding octets introduced at any
   G.711.0 frame boundary therefore does not affect the number M of the
   source G.711 symbols produced.  Thus the decision for padding MAY be
   made on a per-superframe basis.

5.  Payload Format Parameters

   This section defines the parameters that may be used to configure
   optional features in the G.711.0 RTP transmission.

   The parameters are defined here as a part of the media subtype
   registration for the G.711.0 codec.  Mapping of the parameters into
   Session Description Protocol (SDP) RFC 4566 [RFC4566] is also
   provided for those applications that use SDP.

5.1.  Media Type Registration

   Type name: audio

   Subtype name: G711-0

   Required parameters:

      clock rate: The RTP timestamp clock rate, which is equal to the
      sampling rate.  The typical rate used with G.711 encoding is 8000,
      but other rates may be specified.  The default rate is 8000.

      complaw: This format specific parameter, specified on the
      "a=fmtp:" line, indicates the companding law (A-law or mu-law)
      employed.  This format specific parameter, as per RFC 4566
      [RFC4566], is given unchanged to the media tool using this
      format.  The case-insensitive values "complaw=al" and
      "complaw=mu" are used for A-law and mu-law, respectively.

   Optional parameters:

      channels: See RFC 4566 [RFC4566] for definition.  Specifies how
      many audio streams are represented in the G.711.0 payload and MUST
      be present if the number of channels is greater than one.  This
      parameter defaults to 1 if not present (as per RFC 4566) and is
      typically a non-zero small-valued positive integer.  It is
      expected that implementations that specify multiple channels will
      also define a mechanism to map the channels appropriately within
      their system design, otherwise the channel order specified in RFC
      3551 [RFC3551] Section 4.1 will be assumed (e.g., left, right,
      center, ... ).  Similar to the usual interpretation in RFC 3551
      [RFC3551], the number of channels SHALL be a non-zero positive
      integer.

      maxptime: See RFC 4566 [RFC4566] for definition.

      ptime: See RFC 4566 [RFC4566] for definition.  The inclusion of
      "ptime" is RECOMMENDED and SHOULD be in the SDP unless there is an
      application specific reason not to include it (e.g., an
      application that has a variable ptime on a packet-by-packet
      basis).  For constant ptime applications, it is considered good
      form to include "ptime" in the SDP for session diagnostic
      purposes.  For the constant ptime multiple channel case described
      in Section 4.2.4, the inclusion of "ptime" can provide a desirable
      payload check.

   Encoding considerations:

      This media type is framed binary data (see Section 4.8 in RFC 6838
      [RFC6838]) compressed as per ITU-T Rec. G.711.0.

   Security considerations:

      See Section 10.

   Interoperability considerations: none

   Published specification:

      ITU-T Rec. G.711.0 and RFC XXXX.

      [ RFC Editor: please replace XXXX with a reference to this RFC ]

   Applications that use this media type:

      Although initially conceived for VoIP, the use of G.711.0, like
      G.711 before it, may find use within audio and video streaming
      and/or conferencing applications for the audio portion of those
      applications.

   Additional information:

      The following applies to stored-file transfer methods:

      Magic numbers: #!G7110A\n or #!G7110M\n (for A-law or mu-law
      encodings respectively, see Section 6).

      File Extensions: None

      Macintosh file type code: None

      Object identifier or OID: None

   Person & email address to contact for further information:

      Michael A. Ramalho or

   Intended usage: COMMON

   Restrictions on usage:

      This media type depends on RTP framing, and hence is only defined
      for transfer via RTP [RFC3550].  Transport within other framing
      protocols is not defined at this time.

   Author: Michael A. Ramalho

   Change controller:

      IETF Payload working group delegated from the IESG.

5.2.  Mapping to SDP Parameters

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP),
   which is commonly used to describe RTP sessions.  When SDP is used to
   specify sessions employing G.711.0, the mapping is as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype ("G711-0") goes in SDP "a=rtpmap" as the
      encoding name.

   o  The required parameter "clock rate" also goes in "a=rtpmap" as
      the clock rate.

   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.

   o  Remaining parameters go in the SDP "a=fmtp" attribute by copying
      them directly from the media type string as a semicolon-separated
      list of parameter=value pairs.

5.3.  Offer/Answer Considerations

   The following considerations apply when using the SDP offer/answer
   mechanism of RFC 3264 [RFC3264] to negotiate the "channels"
   attribute.

   o  If the offering endpoint specifies a value for the optional
      channels parameter greater than one and the answering endpoint
      understands the parameter but cannot support the value requested,
      the answer MUST contain the optional channels parameter with the
      highest value it can support.

   o  If the offering endpoint specifies a value for the optional
      channels parameter, the answer MUST contain the optional channels
      parameter unless the only value the answering endpoint can support
      is one, in which case the answer MAY contain the optional channels
      parameter with a value of 1.

   o  If the offering endpoint specifies a value for the ptime parameter
      that the answering endpoint cannot support, the answer MUST
      contain the optional ptime parameter.

   o  If the offering endpoint specifies a value for the maxptime
      parameter that the answering endpoint cannot support, the answer
      MUST contain the optional maxptime parameter.

5.4.  SDP Examples

   The following examples illustrate how to signal G.711.0 via SDP.

5.4.1.  SDP Example 1

      m=audio RTP/AVP 98
      a=rtpmap:98 G711-0/8000
      a=fmtp:98 complaw=mu

   In the above example the dynamic payload type 98 is mapped to G.711.0
   via the "a=rtpmap" parameter.  The mandatory "complaw" is on the
   "a=fmtp" parameter line.  Note that neither of the optional
   parameters "ptime" and "channels" is present, although it is
   generally good form to include "ptime" in the SDP for session
   diagnostic purposes.

5.4.2.  SDP Example 2

   The following example illustrates an offering endpoint requesting 2
   channels, but the answering endpoint can only support (or render) one
   channel.

   Offer:

      m=audio RTP/AVP 98
      a=rtpmap:98 G711-0/8000/2
      a=ptime:20
      a=fmtp:98 complaw=al

   Answer:

      m=audio RTP/AVP 98
      a=rtpmap:98 G711-0/8000/1
      a=ptime:20
      a=fmtp:98 complaw=al

   In this example the offer had an optional channels parameter.  The
   answer must have the optional channels parameter also unless the
   value in the answer is one.  Shown here is the case in which the
   answer explicitly contains the channels parameter (it need not, in
   which case it would be interpreted as one channel).  As mentioned
   previously, it is considered good form to include "ptime" in the SDP
   for session diagnostic purposes if the session is a constant ptime
   session.

6.  G.711.0 Storage Mode Conventions and Definition

   The G.711.0 storage mode definition in this section is similar to
   that of many other IETF codecs (e.g., iLBC, EVRC-NW) and is
   essentially a concatenation of individual G.711.0 frames.

   We note that something must be stored for any G.711.0 frames that
   are not received at the receiving endpoint, no matter what the
   cause.  In this section we describe two mechanisms, a "G.711.0 PLC
   Frame" and a "G.711.0 Erasure Frame".  These G.711.0 PLC and G.711.0
   Erasure Frames are described prior to the G.711.0 storage mode
   definition for clarity.

6.1.  G.711.0 PLC Frame

   When G.711 RTP payloads are not received by a rendering endpoint, a
   Packet Loss Concealment (PLC) mechanism is typically employed to
   "fill in" the missing G.711 symbols with something that is
   auditorially pleasing, so that the loss may not be noticed by a
   listener.  Such a PLC mechanism for G.711 is specified in ITU-T Rec.
   G.711 - Appendix 1 [G.711-AP1].

   A natural extension when creating G.711.0 frames for storage
   environments is to employ such a PLC mechanism to create G.711
   symbols for the span of time in which G.711.0 payloads were not
   received - and then to compress the resulting "G.711 PLC symbols" via
   G.711.0 compression.  The G.711.0 frame(s) created by such a process
   are called "G.711.0 PLC Frames".

   Since PLC mechanisms are designed to render missing audio data with
   the best fidelity and intelligibility, G.711.0 frames created via
   such processing are likely best for most recording situations (such
   as voicemail storage) unless there is a requirement not to fabricate
   (audio) data not actually received.

   After such PLC G.711 symbols have been generated and then encoded by
   a G.711.0 encoder, the resulting frames may be stored in G.711.0
   frame format.  As a result, there is nothing to specify here - the
   G.711.0 PLC Frames are stored as if they were received by the
   receiving endpoint.  In other words, PLC-generated G.711.0 frames
   appear as "normal" or "ordinary" G.711.0 frames in the storage mode
   file.

6.2.  G.711.0 Erasure Frame

   "Erasure Frames", or equivalently "Null Frames", have been designed
   for many frame-based codecs since G.711 was standardized.  These
   null/erasure frames explicitly represent data from incoming audio
   that were either not received by the receiving system or represent
   data that a transmitting system decided not to send.  Transmitting
   systems may choose not to send data for a variety of reasons (e.g.,
   not enough wireless link capacity in radio-based systems) and can
   choose to send a "null frame" in lieu of the actual audio.  It is
   also envisioned that erasure frames would be used in storage mode
   applications for specific archival purposes where there is a
   requirement not to fabricate audio data that was not actually
   received.

   Thus, a G.711.0 erasure frame is a representation of the amount of
   time in G.711.0 frames that were not received or not encoded by the
   transmitting system.

   Prior to defining a G.711.0 erasure frame it is beneficial to note
   what many G.711 RTP systems send when the endpoint is "muted".  When
   muted, many of these systems will send an entire G.711 payload of
   either 0+ or 0- (i.e., one of the two levels closest to "analog zero"
   in either G.711 companding law).  Next we note that a desirable
   property for a G.711.0 erasure frame is for "non G.711.0 Erasure
   Frame aware" endpoints to be able to play back a G.711.0 erasure
   frame with the existing G.711.0 ITU-T reference code.

   A G.711.0 Erasure Frame is defined as any G.711.0 frame for which the
   corresponding G.711 sample values are either the value 0++ or the
   value 0-- for the entirety of the G.711.0 frame.  The levels of 0++
   and 0-- are defined to be the two levels above or below analog zero,
   respectively.  An entire frame of value 0++ or 0-- is expected to be
   extraordinarily rare when the frame was in fact generated by a
   natural signal (on the order of one in 2^{ptime in samples, minus
   one}), as analog inputs such as speech and music are zero-mean and
   are typically acoustically coupled to digital sampling systems.  Note
   that the playback of a G.711.0 frame characterized as an erasure
   frame is auditorially equivalent to a muted signal (a very low value
   constant).

   These G.711.0 erasure frames can be reasonably characterized as null
   or erasure frames while meeting the desired playback goal of being
   decoded by the G.711.0 ITU-T reference code.  Thus, similarly to
   G.711.0 PLC frames, the G.711.0 erasure frames appear as "normal" or
   "ordinary" G.711.0 frames in the storage mode format.

6.3.  G.711.0 Storage Mode Definition

   The storage format is used for storing G.711.0 encoded frames.  The
   format for the G.711.0 storage mode file defined by this RFC is shown
   below.

                     G.711.0 Storage Mode Format

       |---------------------------|----------|--------------|
       |       Magic Number        |          |              |
       |                           | Version  | Concatenated |
       | "#!G7110A\n" (for A-law)  |  Octet   |   G.711.0    |
       |            or             |          |    Frames    |
       | "#!G7110M\n" (for mu-law) |  "0x00"  |              |
       |___________________________|__________|______________|

                              Figure 5

   The storage mode file consists of a magic number and a version octet
   followed by the individual G.711.0 frames concatenated together.

   The magic number for G.711.0 A-law corresponds to the ASCII character
   string "#!G7110A\n", i.e., "0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x41
   0x0A".  Likewise, the magic number for G.711.0 mu-law corresponds to
   the ASCII character string "#!G7110M\n", i.e., "0x23 0x21 0x47 0x37
   0x31 0x31 0x30 0x4D 0x0A".

   The version number octet allows for the future specification of other
   G.711.0 storage mode formats.  The specification of other storage
   mode formats may be desirable as G.711.0 frames are of variable
   length and a future format may include an indexing methodology that
   would enable playout far into a long G.711.0 recording without the
   necessity of decoding all the G.711.0 frames since the beginning of
   the recording.  Other future format specifications may include
   support for multiple channels, metadata and the like.  For these
   reasons it was determined that a versioning strategy was desirable
   for the G.711.0 storage mode definition specified by this RFC.  This
   RFC only specifies Version 0 and thus the value of "0x00" MUST be
   used for the storage mode defined by this RFC.

   The G.711.0 codec data frames, including any necessary erasure or PLC
   frames, are stored in consecutive order concatenated together as
   shown in Section 4.2.2.
   As the Version 0 storage mode only supports a single channel, the RTP
   payload format supporting multiple channels defined in Section 4.2.4
   is not supported in this storage mode definition.

   To decode the individual G.711.0 frames, the algorithm presented in
   Section 4.2.3 may be used.  If the version octet is determined not to
   be zero, the remainder of the payload MUST NOT be passed to the
   G.711.0 decoder, as the ITU-T G.711.0 reference decoder can only
   decode concatenated G.711.0 frames and has not been designed to
   decode elements in yet to be specified future storage mode formats.

7.  Acknowledgements

   There have been many people contributing to G.711.0 in the course of
   its development.  The people listed here deserve special mention:
   Takehiro Moriya, Claude Lamblin, Herve Taddei, Simao Campos, Yusuke
   Hiwasaki, Jacek Stachurski, Lorin Netsch, Paul Coverdale, Patrick
   Luthi, Paul Barrett, Jari Hagqvist, Pengjun (Jeff) Huang, John Gibbs,
   Yutaka Kamamoto, and Csaba Kos.  The review and oversight by the IETF
   Payload Working Group chairs Ali Begen and Roni Even during the
   development of this RFC is appreciated.  Additionally, the careful
   review and comments by Richard Barnes are likewise very much
   appreciated.

8.  Contributors

   The authors thank everyone who has contributed to this document.
   The people listed here deserve special mention: Ali Begen, Roni Even,
   and Hadriel Kaplan.

9.  IANA Considerations

   One media type (audio/G711-0) has been defined and requires IANA
   registration in the media types registry.  See Section 5.1 for
   details.

10.  Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC3550], and in any appropriate RTP profile (for
   example RFC 3551 [RFC3551] or [RFC4585]).  This implies that
   confidentiality of the media streams is achieved by encryption; for
   example, through the application of SRTP [RFC3711].  Because the data
   compression used with this payload format is applied end-to-end, any
   encryption needs to be performed after compression.

   Note that the appropriate mechanism to ensure confidentiality and
   integrity of RTP packets and their payloads is very dependent on the
   application and on the transport and signaling protocols employed.
   Thus, although SRTP is given as an example above, other possible
   choices exist.

   Note that end-to-end security with either authentication, integrity
   or confidentiality protection will prevent a network element not
   within the security context from performing media-aware operations
   other than discarding complete packets.  To allow any (media-aware)
   intermediate network element to perform its operations, it is
   required to be a trusted entity which is included in the security
   context establishment.

   G.711.0 has no known denial-of-service attacks due to decoding, as
   data posing as a desired G.711.0 payload will be decoded into
   something (as per the decoding algorithm) with a finite amount of
   computation.  This is due to the decompression algorithm having a
   finite worst-case processing path (no infinite computational loops
   are possible).  We also note that the data read by the G.711.0
   decoder is controlled by the length of the individual encoded G.711.0
   frame(s) contained in the RTP payload.  The decoding algorithm
   specified in Section 4.2.3 above ensures that the G.711.0 decoder
   will not read beyond the length of the internal buffer specified
   (which is in turn specified to be no greater than the largest
   possible G.711.0 frame of 321 octets).  Therefore a G.711.0 payload
   does not carry "active content" that could impose malicious side-
   effects upon the receiver.

   G.711.0 is a variable bit rate (VBR) audio codec.  There have been
   recent concerns with VBR speech codecs where a passive observer can
   identify phrases from a standard speech corpus by means of the
   lengths produced by the encoder even when the payload is encrypted
   [IEEE].  That paper determined that some code-excited linear
   prediction (CELP) codecs would produce discrete packet lengths for
   some phonemes and, furthermore, that with the use of appropriately
   designed Hidden Markov Models (HMMs) such a system could predict
   phrases with unexpected accuracy.  One CELP codec studied, SPEEX, had
   the property that it produced 21 different packet lengths in its
   wideband mode and that these packet lengths probabilistically mapped
   to phonemes that an HMM system could be trained on.  The paper also
   determined that a mitigation technique would be to pad the output of
   the encoder with random padding lengths to the effect: 1) that more
   discrete payload sizes would result, and 2) that the probabilistic
   mapping to phonemes would become less clear.  As G.711 is not a
   speech model based codec, neither is G.711.0.  A G.711.0 encoding,
   during talking periods, produces frames of varying frame lengths
   which are not likely to have a strong mapping to phonemes.
   Thus G.711.0 is not expected to have this same vulnerability.
   It should be noted that "silence" (only one value of G.711 in the
   entire G.711 input frame) or "near silence" (only a few G.711 values)
   is easily detectable as G.711.0 frame lengths of one or a few octets.
   If one desires to mitigate for silence/non-silence detection,
   statistically variable padding should be added to G.711.0 frames that
   resulted in very small G.711.0 frames (less than about 20% of the
   symbols of the corresponding G.711 input frame).  Methods of
   introducing padding in the G.711.0 payloads have been provided in the
   G.711.0 RTP payload definition in Section 4.2.2.

11.  Congestion Control

   The G.711 codec is a Constant Bit Rate (CBR) codec which does not
   have a means to regulate the bitrate.  The G.711.0 lossless
   compression algorithm typically compresses the G.711 CBR stream into
   a smaller VBR stream.  However, being lossless, it does not possess
   means of further reducing the bitrate beyond the G.711.0-based
   compression result.  The G.711.0 RTP payloads can be made arbitrarily
   large by means of adding optional padding bytes (subject only to MTU
   limitations).

   Therefore, there are no explicit ways to regulate the bit-rate of the
   transmissions outlined in this RTP Payload format except by means of
   modulating the number of optional padding bytes in the RTP payload.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13, RFC
              6838, January 2013.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
              2006.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264, June
              2002.

   [G.711.0]  ITU-T G.711.0, "Recommendation ITU-T G.711.0 - Lossless
              Compression of G.711 Pulse Code Modulation", September
              2009.

   [G.711]    ITU-T G.711, "Recommendation ITU-T G.711: Pulse Code
              Modulation (PCM) of Voice Frequencies", November 1988.

   [G.711-AP1]
              ITU-T G.711 Appendix 1, "Recommendation G.711
              Appendix 1: A high quality low-complexity algorithm for
              packet loss concealment with G.711", September 1999.

   [G.711-A1]
              ITU-T G.711 Amendment 1, "Recommendation ITU-T G.711
              Amendment 1 - Amendment 1: New Annex A on Lossless
              Encoding of PCM Frames", September 2009.

12.2.  Informative References

   [G.729]    ITU-T G.729, "Recommendation ITU-T G.729 - Coding of
              speech at 8 kbit/s using conjugate-structure algebraic-
              code-excited linear prediction (CS-ACELP)", January 2007.

   [G.722]    ITU-T G.722, "Recommendation ITU-T G.722 - 7 kHz audio-
              coding within 64 kbit/s", November 1988.

   [ICASSP]   N. Harada, Y. Yamamoto, T. Moriya, Y. Hiwasaki, M. A.
              Ramalho, L. Netsch, Y. Stachurski, Miao Lei, H. Taddei,
              and Q. Fengyan, "Emerging ITU-T Standard G.711.0 -
              Lossless Compression of G.711 Pulse Code Modulation",
              International Conference on Acoustics Speech and Signal
              Processing (ICASSP), 2010, ISBN 978-1-4244-4295-9,
              March 2010.

   [IEEE]     C.V. Wright, L. Ballard, S.E. Coull, F. Monrose, and G.M.
              Masson, "Spot Me if You Can: Uncovering Spoken Phrases in
              Encrypted VoIP Conversations", IEEE Symposium on Security
              and Privacy, 2008, ISBN: 978-0-7695-3168-7, May 2008.

Authors' Addresses

   Michael A. Ramalho (editor)
   Cisco Systems, Inc.
   6310 Watercrest Way Unit 203
   Lakewood Ranch, FL 34202
   USA

   Phone: +1 919 476 2038
   Email: mramalho@cisco.com

   Paul E. Jones
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   USA

   Phone: +1 919 476 2048
   Email: paulej@packetizer.com

   Noboru Harada
   NTT Communications Science Labs.
   3-1 Morinosato-Wakamiya
   Atsugi, Kanagawa 243-0198
   JAPAN

   Phone: +81 46 240 3676
   Email: harada.noboru@lab.ntt.co.jp

   Muthu Arul Mozhi Perumal
   Ericsson
   Ferns Icon
   Doddanekundi, Mahadevapura
   Bangalore, Karnataka 560037
   India

   Phone: +91 9449288768
   Email: muthu.arul@gmail.com

   Lei Miao
   Huawei Technologies Co. Ltd
   Q22-2-A15R, Enviroment Protection Park
   No. 156 Beiqing Road
   HaiDian District
   Beijing 100095
   China

   Phone: +86 1059728300
   Email: lei.miao@huawei.com