Network Working Group                                    M. Ramalho, Ed.
Internet-Draft                                                  P. Jones
Intended status: Standards Track                           Cisco Systems
Expires: November 12, 2015                                     N. Harada
                                                                     NTT
                                                              M. Perumal
                                                                Ericsson
                                                                 L. Miao
                                                     Huawei Technologies
                                                            May 11, 2015

                     RTP Payload Format for G.711.0
                      draft-ietf-payload-g7110-06

Abstract

   This document specifies the Real-Time Transport Protocol (RTP)
   payload format for ITU-T Recommendation G.711.0. ITU-T Rec. G.711.0
   defines a lossless and stateless compression for G.711 packet
   payloads typically used in IP networks. This document also defines a
   storage mode format for G.711.0 and a media type registration for the
   G.711.0 RTP payload format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 12, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  G.711.0 Codec Background
     3.1.  General Information and Use of the ITU-T G.711.0 Codec
     3.2.  Key Properties of G.711.0 Design
     3.3.  G.711 Input Frames to G.711.0 Output Frames
       3.3.1.  Multiple G.711.0 Output Frames per RTP Payload
               Considerations
   4.  RTP Header and Payload
     4.1.  G.711.0 RTP Header
     4.2.  G.711.0 RTP Payload
       4.2.1.  Single G.711.0 Frame per RTP Payload Example
       4.2.2.  G.711.0 RTP Payload Definition
         4.2.2.1.  G.711.0 RTP Payload Encoding Process
       4.2.3.  G.711.0 RTP Payload Decoding Process
       4.2.4.  G.711.0 RTP Payload for Multiple Channels
   5.  Payload Format Parameters
     5.1.  Media Type Registration
     5.2.  Mapping to SDP Parameters
     5.3.  Offer/Answer Considerations
     5.4.  SDP Examples
       5.4.1.  SDP Example 1
       5.4.2.  SDP Example 2
   6.  G.711.0 Storage Mode Conventions and Definition
     6.1.  G.711.0 PLC Frame
     6.2.  G.711.0 Erasure Frame
     6.3.  G.711.0 Storage Mode Definition
   7.  Acknowledgements
   8.  Contributors
   9.  IANA Considerations
   10. Security Considerations
   11. Congestion Control
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   The International Telecommunication Union (ITU-T) Recommendation
   G.711.0 [G.711.0] specifies a stateless and lossless compression for
   G.711 packet payloads typically used in Voice over IP (VoIP)
   networks. This document specifies the Real-Time Transport Protocol
   (RTP) [RFC3550] payload format and storage modes for this
   compression.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  G.711.0 Codec Background

   ITU-T Recommendation G.711.0 [G.711.0] is a lossless and stateless
   compression mechanism for ITU-T Recommendation G.711 [G.711] and thus
   is not a "codec" in the sense of "lossy" codecs typically carried by
   RTP. When negotiated end-to-end, ITU-T Rec. G.711.0 is negotiated as
   if it were a codec, with the understanding that ITU-T Rec. G.711.0
   losslessly encodes the underlying (lossy) G.711 pulse code modulation
   (PCM) sample representation of an audio signal. For this reason
   ITU-T Rec. G.711.0 will be interchangeably referred to in this
   document as a "lossless data compression algorithm" or a "codec",
   depending on context. Within this document, individual G.711 PCM
   samples will be referred to as "G.711 symbols" or just "symbols" for
   brevity.

   This section describes the ITU-T Recommendation G.711.0 [G.711.0]
   codec, its typical use cases and its key design properties.

3.1.  General Information and Use of the ITU-T G.711.0 Codec

   ITU-T Recommendation G.711 is the benchmark standard for narrowband
   telephony. It has been successful for many decades because of its
   proven voice quality, ubiquity and utility. A new ITU-T
   recommendation, G.711.0, has been established for defining a
   stateless and lossless compression for G.711 packet payloads
   typically used in VoIP networks. ITU-T Rec. G.711.0 is also known as
   ITU-T Rec. G.711 Annex A [G.711-A1], as ITU-T Rec. G.711 Annex A is
   effectively a pointer to ITU-T Rec. G.711.0. Henceforth in this
   document, ITU-T Rec. G.711.0 will simply be referred to as "G.711.0"
   and ITU-T Rec. G.711 simply as "G.711".

   G.711.0 may be employed end-to-end, in which case the RTP payload
   format specification and use is nearly identical to the G.711 RTP
   specification found in RFC 3551 [RFC3551]. The only significant
   differences for G.711.0 are the required use of a dynamic payload
   type (the static PT of 0 or 8 is presently almost always used with
   G.711 even though dynamic assignment of other payload types is
   allowed) and the recommendation not to use Voice Activity Detection
   (see Section 4.1).

   G.711.0, being both lossless and stateless, may also be employed as a
   lossless compression mechanism for G.711 payloads anywhere between
   end systems which have negotiated use of G.711. Because the only
   significant difference between the G.711 RTP payload format header
   and the G.711.0 payload format header defined in this document is the
   payload type, a G.711 RTP packet can be losslessly converted to a
   G.711.0 RTP packet simply by compressing the G.711 payload (thus
   creating a G.711.0 payload), changing the payload type to the dynamic
   value desired and copying all the remaining G.711 RTP header fields
   into the corresponding G.711.0 RTP header.
   In a similar manner, the corresponding decompression of the G.711.0
   RTP packet thus created back to the original source G.711 RTP packet
   can be accomplished by losslessly decompressing the G.711.0 payload
   back to the original source G.711 payload, changing the payload type
   back to the payload type of the original G.711 RTP packet and copying
   all the remaining G.711.0 RTP header fields into the corresponding
   G.711 RTP header. As a packet produced by the compression and
   decompression as described above is indistinguishable in every detail
   from the source G.711 packet, such compression can be made invisible
   to the end systems. Specification of how systems on the path between
   the end systems discover each other and negotiate the use of G.711.0
   compression as described in this paragraph is outside the scope of
   this document.

   It is worth noting that G.711.0, being both lossless and stateless,
   can be employed multiple times (e.g., on multiple, individual hops or
   series of hops) on a given flow with no degradation of quality
   relative to end-to-end G.711. Stated another way, multiple "lossless
   transcodes" from/to G.711.0/G.711 do not affect voice quality as
   typically occurs with lossy transcodes to/from dissimilar codecs.

   Lastly, it is expected that G.711.0 will be used as an archival
   format for recorded G.711 streams. Therefore, a G.711.0 Storage Mode
   Format is also included in this document.

3.2.  Key Properties of G.711.0 Design

   The fundamental design of G.711.0 resulted from the desire to
   losslessly encode and compress frames of G.711 symbols independent of
   what types of signals those G.711 frames contained. The primary
   G.711.0 use case is for G.711 encoded, zero-mean, acoustic signals
   (such as speech and music).

   G.711.0 attributes are below:

   A1  Compression for zero-mean acoustic signals: The primary use case
        for which G.711.0 was designed is the compression of G.711
        payloads that contain "speech" or other zero-mean acoustic
        signals. G.711.0 obtains greater than 50% average compression in
        service provider environments [ICASSP].

   A2  Lossless for any G.711 payload: G.711.0 was designed to be
        lossless for any valid G.711 payload - even if the payload
        consisted of apparently random G.711 symbols (e.g., a modem or
        FAX payload). G.711.0 could be used for "aggregate 64 kbps
        G.711 channels" carried over IP without explicit concern if a
        subset of these channels happened to be carrying something
        other than voice or general audio. To the extent that a
        particular channel carried something other than voice or
        general audio, G.711.0 ensured that it was carried losslessly,
        if not significantly compressed.

   A3  Stateless: Compression of a frame of G.711 symbols was only to be
        dependent on that frame and not on any prior frame. Although
        greater compression is usually available by observing a longer
        history of past G.711 symbols, it was decided that the
        compression design would be stateless to completely eliminate
        error propagation common in many lossy codec designs (e.g.,
        ITU-T Rec. G.729 [G.729], ITU-T Rec. G.722 [G.722]). That is,
        the decoding process need not be concerned about lost prior
        packets because the decompression of a given G.711.0 frame is
        not dependent on potentially lost prior G.711.0 frames. Owing
        to this stateless property, the size of the frames input to the
        G.711.0 encoder may be changed "on-the-fly" (a 5 ms encoding
        could be followed by a 20 ms encoding).

   A4  Self-describing: This property is defined as the ability to
        determine how many source G.711 samples are contained within
        the G.711.0 frame solely by information contained within the
        G.711.0 frame.
        Generally, the number of source G.711 symbols
        can be determined by decoding the initial octets of the
        compressed G.711.0 frame (these octets are called "prefix
        codes" in the standard). A G.711.0 decoder need not know how
        many symbols are contained in the original G.711 frame (e.g.,
        parameter ptime in Session Description Protocol, SDP,
        [RFC4566]), as it is able to decompress the G.711.0 frame
        presented to it without signaling knowledge.

   A5  Accommodate G.711 payload sizes typically used in IP: G.711 input
        frames of length typically found in VoIP applications represent
        SDP ptime values of 5 ms, 10 ms, 20 ms, 30 ms or 40 ms. Since
        the dominant sampling frequency for G.711 is 8000 samples per
        second, G.711.0 was designed to compress G.711 input frames of
        40, 80, 160, 240 or 320 samples.

   A6  Bounded expansion: Since attribute A2 above requires G.711.0 to
        be lossless for any payload (which could consist of any
        combination of octets with each octet spanning the entire space
        of 2^8 values), by definition there exists at least one
        potential G.711 payload which must be "uncompressible". Since
        the quantum of compression is an octet, the minimum expansion
        of such an uncompressible payload was designed to be the
        minimum possible of one octet. Thus G.711.0 "compressed"
        frames can be of length one octet to X+1 octets, where X is the
        size of the input G.711 frame in octets. G.711.0 can therefore
        be viewed as a Variable Bit Rate (VBR) encoding in which the
        size of the G.711.0 output frame is a function of the G.711
        symbols input to it.

   A7  Algorithmic delay: G.711.0 was designed to have the algorithmic
        delay equal to the time represented by the number of samples in
        the G.711 input frame (i.e., no "look-ahead").

   A8  Low Complexity: Less than 1.0 Weighted Million Operations Per
        Second (WMOPS) average and low memory footprint (~5k octets
        RAM, ~5.7k octets ROM and ~3.6 basic operations) [ICASSP]
        [G.711.0].

   A9  Both A-law and mu-law supported: G.711 has two operating laws,
        A-law and mu-law. These two laws are also known as PCMA and
        PCMU in RTP applications [RFC3551].

   These attributes generally make it trivial to compress a G.711 input
   frame consisting of 40, 80, 160, 240 or 320 samples. After the input
   frame is presented to a G.711.0 encoder, a G.711.0 "self-describing"
   output frame is produced. The number of samples contained within
   this frame is easily determined at the G.711.0 decoder by virtue of
   attribute A4. The G.711.0 decoder can decode the G.711.0 frame back
   to a G.711 frame by using only data within the G.711.0 frame.

   Lastly, we note that losing a G.711.0 encoded packet is identical in
   effect to losing a G.711 packet (when using RTP); this is because a
   G.711.0 payload, like the corresponding G.711 payload, is stateless.
   Thus, it is anticipated that existing G.711 packet loss concealment
   (PLC) mechanisms will be employed when a G.711.0 packet is lost and
   that a Mean Opinion Score (MOS) degradation identical to that of
   G.711 loss will result.

3.3.  G.711 Input Frames to G.711.0 Output Frames

   G.711.0 is a lossless and stateless compression of G.711 frames. The
   following figure depicts this where "A" is the process of G.711.0
   encoding and "B" is the process of G.711.0 decoding.

        1:1 Mapping from G.711 Input Frame to G.711.0 Output Frame

   |--------------------------|   A  |------------------------------|
   |    G.711 Input Frame     |----->|     G.711.0 Output Frame     |
   |       of X Octets        |      |  containing 1 to X+1 Octets  |
   | (where X MUST be 40, 80, |      | (precise value dependent on  |
   |  160, 240 or 320 octets) |<-----| G.711.0 ability to compress) |
   |__________________________|   B  |______________________________|

                                 Figure 1

   Note that the mapping is 1:1 (lossless) in both directions, subject
   to two constraints. The first constraint is that the input frame
   provided to the G.711.0 encoder (process "A") has a specific number
   of input G.711 symbols consistent with attribute A5 (40, 80, 160, 240
   or 320 octets). The second constraint is that the companding law
   used to create the G.711 input frame (A-law or mu-law) must be known,
   consistent with attribute A9.

   Subject to these two constraints, the input G.711 frame is processed
   by the G.711.0 encoder (process "A") and produces a "self-describing"
   G.711.0 output frame, consistent with attribute A4. Depending on the
   source G.711 symbols, the G.711.0 output frame can contain anywhere
   from 1 to X+1 octets, where X is the number of input G.711 symbols.
   Compression results for virtually every zero-mean acoustic signal
   encoded by G.711.0.

   Since the G.711.0 output frame is "self-describing", a G.711.0
   decoder (process "B") can losslessly reproduce the original G.711
   input frame with only the knowledge of which companding law was used
   (A-law or mu-law). The first octet of a G.711.0 frame is called the
   "Prefix Code" octet; the information within this octet conveys how
   many G.711 symbols the decoder is to create from a given G.711.0
   input frame (i.e., 0, 40, 80, 160, 240 or 320).
   The Prefix Code
   value of 0x00 is used to denote zero G.711 source symbols, which
   allows the use of 0x00 as a payload padding octet (to be described
   later in Section 3.3.1).

   Since G.711.0 was designed with typical G.711 payload lengths as a
   design constraint (attribute A5), this lossless encoding can be
   performed only with knowledge of the companding law being used. This
   information is anticipated to be signaled in SDP and will be
   described later in this document.

   If the original inputs were known to be from a zero-mean acoustic
   signal coded by G.711, an intelligent G.711.0 encoder could infer the
   G.711 companding law in use (via G.711 input signal amplitude
   histogram statistics). Likewise, an intelligent G.711.0 decoder
   producing G.711 from the G.711.0 frames could also infer which
   encoding law is in use. Thus G.711.0 could be designed for use in
   applications that have limited stream signaling between the G.711
   endpoints (i.e., they only know "G.711 at 8k sampling is being used",
   but nothing more). Such usage is not further described in this
   document. Additionally, if the original inputs were known to come
   from zero-mean acoustic signals, an intelligent G.711.0 encoder could
   tell if the G.711.0 payload had been encrypted - as the symbols would
   not have the distribution expected in either companding law and would
   appear random. Such determination is also not further discussed in
   this document.

   It is easily seen that this process is 1:1 and that G.711.0 based
   lossless compression can be employed multiple times, as the original
   G.711 input symbols are always reproduced with 100% fidelity.

3.3.1.  Multiple G.711.0 Output Frames per RTP Payload Considerations

   As a general rule, G.711.0 frames containing more source G.711
   symbols (from a given channel) will typically result in higher
   compression, but there are exceptions to this rule.
   A G.711.0
   encoder may choose to encode 20 ms of input G.711 symbols as: 1) a
   single 20 ms G.711.0 frame, 2) two 10 ms G.711.0 frames, or 3)
   any other combination of 5 ms or 10 ms G.711.0 frames - depending on
   which encoding resulted in fewer bits. As an example, an intelligent
   encoder might encode 20 ms of G.711 symbols as two 10 ms G.711.0
   frames if the first 10 ms was "silence" and two G.711.0 frames took
   fewer bits than any other possible encoding combination of G.711.0
   frame sizes.

   During the process of G.711.0 standardization it was recognized that
   although it is sometimes advantageous to encode integer multiples of
   40 G.711 symbols in whatever input symbol format resulted in the most
   compression (as per above), the simplest choice is to encode the
   entire ptime's worth of input G.711 symbols into one G.711.0 frame
   (if the ptime supported it). This is especially so since the larger
   number of source G.711 symbols typically resulted in the highest
   compression anyway and there is added complexity in searching for
   other possibilities (involving more G.711.0 frames) which were
   unlikely to produce a more bit efficient result.

   The design of ITU-T Rec. G.711.0 [G.711.0] foresaw the possibility of
   multiple G.711.0 input frames in that the decoder was defined to
   decode what it refers to as an incoming "bit stream". For this
   specification, the bit stream is the G.711.0 RTP payload itself.
   Thus, the decoder will take the G.711.0 RTP payload and will produce
   an output frame containing the original G.711 symbols independent of
   how many G.711.0 frames were present in it. Additionally, any number
   of 0x00 padding octets placed between the G.711.0 frames will be
   silently (and safely) ignored by the G.711.0 decoding process
   (Section 4.2.3).
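
   As a non-normative illustration of the payload layout just described,
   the following Python sketch concatenates already-encoded G.711.0
   frames and appends optional 0x00 padding. It assumes the frames come
   from some G.711.0 encoder that is not shown; the function name
   build_g7110_payload and its parameters are hypothetical, and the
   byte strings in the usage note are placeholders rather than real
   G.711.0 frames.

```python
MAX_FRAME_OCTETS = 320 + 1  # largest input frame (320 symbols) plus one octet (attribute A6)


def build_g7110_payload(frames, pad_octets=0):
    """Concatenate encoded frames, then append pad_octets 0x00 padding octets.

    frames is a list of bytes objects, each assumed to be one complete
    G.711.0 frame produced elsewhere (the encoder itself is out of scope).
    """
    for frame in frames:
        if not 1 <= len(frame) <= MAX_FRAME_OCTETS:
            raise ValueError("a G.711.0 frame is 1 to 321 octets long")
        # A Prefix Code of 0x00 denotes zero source symbols and is reserved
        # for padding, so a frame that carries symbols cannot start with it.
        if frame[0] == 0x00:
            raise ValueError("frame starts with the 0x00 padding octet")
    return b"".join(frames) + b"\x00" * pad_octets
```

   For example, build_g7110_payload([b"\x01\x02", b"\x03"], pad_octets=2)
   yields the five octets 01 02 03 00 00. Padding is placed after the
   last frame here purely as a design choice for the sketch.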

   To recap, a G.711.0 encoder may choose to encode incoming G.711
   symbols into one or more G.711.0 frames and put the resultant
   frame(s) into the G.711.0 RTP payload. Zero or more 0x00
   padding octets may also be included in the G.711.0 RTP payload. The
   G.711.0 decoder, being insensitive to the number of G.711.0 encoded
   frames that are contained within it, will decode the G.711.0 RTP
   payload into the source G.711 symbols. Although examples of single
   and multiple G.711.0 frame cases will be illustrated in Section 4.2,
   the multiple G.711.0 frame cases MUST be supported and no
   negotiation (SDP or otherwise) is required for them.

4.  RTP Header and Payload

   In this section we describe the precise format for G.711.0 frames
   carried via RTP. We begin with the RTP header description relative
   to G.711, then provide two G.711.0 payload examples.

4.1.  G.711.0 RTP Header

   Relative to G.711 RTP headers, the utilization of G.711.0 does not
   create any special requirements with respect to the contents of the
   RTP packet header. The only significant difference is that the
   payload type (PT) RTP header field MUST have a value corresponding to
   the dynamic payload type assigned to the flow. This is in contrast
   to most current uses of G.711 which typically use the static payload
   assignment of PT = 0 (PCMU) or PT = 8 (PCMA) [RFC3551] even though
   the negotiation and use of dynamic payload types is allowed for
   G.711. With the exception of rare PT exhaustion cases, the existing
   G.711 PT values of 0 and 8 MUST NOT be used for G.711.0 (helping to
   avoid possible payload confusion with G.711 payloads).
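
   As a non-normative illustration, the header handling described above
   (and the lossless G.711/G.711.0 packet conversion of Section 3.1)
   amounts to rewriting the 7-bit PT field and leaving every other
   header bit, including the Marker bit, untouched. The Python sketch
   below assumes the RTP fixed header layout of [RFC3550] (the second
   octet holds the Marker bit followed by the PT) and the dynamic
   payload type range 96-127 of [RFC3551]; the function names are
   hypothetical, and payload compression itself is not shown.

```python
RTP_FIXED_HEADER_OCTETS = 12  # fixed RTP header length per RFC 3550
G711_STATIC_PTS = (0, 8)      # PCMU and PCMA static payload types (RFC 3551)


def is_dynamic_pt(pt):
    """True if pt lies in the dynamic range 96-127 (RFC 3551 assumption)."""
    return 96 <= pt <= 127


def set_payload_type(rtp_packet, new_pt):
    """Return a copy of rtp_packet with the PT field replaced by new_pt.

    Only the low 7 bits of the second octet change; the Marker bit and
    all other header fields and the payload are copied unchanged.
    """
    if len(rtp_packet) < RTP_FIXED_HEADER_OCTETS:
        raise ValueError("packet shorter than the RTP fixed header")
    if not 0 <= new_pt <= 127:
        raise ValueError("PT is a 7-bit field")
    marker_bit = rtp_packet[1] & 0x80
    return rtp_packet[:1] + bytes([marker_bit | new_pt]) + rtp_packet[2:]
```

   A compressing middlebox would call set_payload_type with the
   negotiated dynamic value after compressing the payload, and the
   decompressing side would restore 0 or 8 from G711_STATIC_PTS; how
   those values are learned is outside the scope of this sketch.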

   Voice Activity Detection (VAD) SHOULD NOT be used when G.711.0 is
   negotiated because G.711.0 obtains high compression during "VAD
   silence intervals" and one of the advantages of G.711.0 over G.711
   with VAD is the lack of any VAD-induced artifacts in the received
   signal. However, if VAD is employed, the Marker bit (M) MUST be set
   in the first packet of a talkspurt (the first packet after a silence
   period in which packets have not been transmitted contiguously as per
   rules specified in [RFC3551] for G.711 payloads). This definition,
   being consistent with the G.711 RTP VAD use, further allows lossless
   transcoding between G.711 RTP packets and G.711.0 RTP packets as
   described in Section 3.1.

   With this introduction, the RTP packet header fields are defined as
   follows:

   V - As per [RFC3550]

   P - As per [RFC3550]

   X - As per [RFC3550]

   CC - As per [RFC3550]

   M - As per [RFC3550] and [RFC3551]

   PT - The assignment of an RTP payload type for the format defined
      in this memo is outside the scope of this document. The RTP
      profiles in use currently mandate binding the payload type
      dynamically for this payload format (see [RFC3550], [RFC4585]).

   SN - As per [RFC3550]

   timestamp - As per [RFC3550]

   SSRC - As per [RFC3550]

   CSRC - As per [RFC3550]

   Where V (version bits), P (padding bit), X (extension bit), CC (CSRC
   count), M (marker bit), PT (payload type), SN (sequence number),
   timestamp, SSRC (synchronizing source) and CSRC (contributing
   sources) are as defined in [RFC3550] and as typically used with
   G.711. PT (payload type) is as defined in [RFC3551].

4.2.  G.711.0 RTP Payload

   This section defines the G.711.0 RTP payload and illustrates it by
   means of two examples.

   The first example, in Section 4.2.1, depicts the case when it is
   desired to carry only one G.711.0 frame in the RTP payload.
   This
   case is expected to be the dominant use case and is shown separately
   for the purposes of clarity.

   The second example, in Section 4.2.2, depicts the general case when
   it is desired to carry one or more G.711.0 frames in the RTP payload.
   This is the actual definition of the G.711.0 RTP payload.

4.2.1.  Single G.711.0 Frame per RTP Payload Example

   This example depicts a single G.711.0 frame in the RTP payload. This
   is expected to be the dominant RTP payload case for G.711.0, as the
   G.711.0 encoding process supports the SDP packet times (ptime and
   maxptime, see [RFC4566]) commonly used when G.711 is transported in
   RTP. Additionally, as mentioned previously, larger G.711.0 frames
   generally compress more effectively than a multiplicity of smaller
   G.711.0 frames.

   The following figure illustrates the single G.711.0 frame per RTP
   payload case.

                 Single G.711.0 Frame in RTP Payload Case

                |-------------------|-------------------|
                | One G.711.0 Frame | Zero or more 0x00 |
                |                   |  Padding Octets   |
                |___________________|___________________|

                                 Figure 2

   Encoding Process: A single G.711.0 frame is inserted into the RTP
   payload. The amount of time represented by the G.711 symbols
   compressed in the G.711.0 frame MUST correspond to the ptime signaled
   for applications using SDP. Although generally not desired, padding
   in the RTP payload MAY be created by placing one or more 0x00 octets
   after the G.711.0 frame. Such padding may be desired based on
   security considerations (see Section 10).

   Decoding Process: Passing the entire RTP payload to the G.711.0
   decoder is sufficient for the G.711.0 decoder to create the source
   G.711 symbols. Any padding inserted after the G.711.0 frame (i.e.,
   the 0x00 octets) present in the RTP payload is silently ignored by
   the G.711.0 decoding process.
   The decoding process is fully
   described in Section 4.2.3 below.

4.2.2.  G.711.0 RTP Payload Definition

   This section defines the G.711.0 RTP payload and illustrates the case
   of when one or more G.711.0 frames are to be placed in the payload.
   All G.711.0 RTP decoders MUST support the general case described in
   this section (rationale presented previously in Section 3.3.1).

   Note that since each G.711.0 frame is self-describing (see Attribute
   A4 in Section 3.2), the individual G.711.0 frames in the RTP payload
   need not represent the same duration of time (i.e., a 5 ms G.711.0
   frame could be followed by a 20 ms G.711.0 frame). Owing to this,
   the amount of time represented in the RTP payload MAY be any integer
   multiple of 5 ms (as 5 ms is the smallest interval of time that can
   be represented in a G.711.0 frame).

   The following figure illustrates the one or more G.711.0 frames per
   RTP payload case where the number of G.711.0 frames placed in the RTP
   payload is N. We note that when N is equal to 1, this case is
   identical to the previous example.

              One or More G.711.0 Frames in RTP Payload Case

      |----------|---------|----------|---------|----------------|
      |  First   | Second  |          |   Nth   |  Zero or more  |
      | G.711.0  | G.711.0 |   ...    | G.711.0 |      0x00      |
      |  Frame   |  Frame  |          |  Frame  | Padding Octets |
      |__________|_________|__________|_________|________________|

                                 Figure 3

   We note here that when there are multiple G.711.0 frames, the
   individual frames can be, and generally are, of different lengths.
   The decoding process described in Section 4.2.3 is used to determine
   the frame boundaries.

   Encoding Process: One or more G.711.0 frames are placed in the RTP
   payload simply by concatenating the G.711.0 frames together.
   The
   amount of time represented by the G.711 symbols compressed in all the
   G.711.0 frames in the RTP payload MUST correspond to the ptime
   signaled for applications using SDP. Although not generally desired,
   padding in the RTP payload SHOULD be placed after the last G.711.0
   frame in the payload and MAY be created by placing one or more 0x00
   octets after the last G.711.0 frame. Such padding may be desired
   based on security considerations (see Section 10). Additional
   encoding process details and considerations are specified later in
   Section 4.2.2.1.

   Decoding Process: As G.711.0 frames can be of varying length, the
   payload decoding process described in Section 4.2.3 is used to
   determine where the individual G.711.0 frame boundaries are. Any
   padding octets inserted before or after any G.711.0 frame in the RTP
   payload are silently (and safely) ignored by the G.711.0 decoding
   process specified in Section 4.2.3.

4.2.2.1.  G.711.0 RTP Payload Encoding Process

   ITU-T G.711.0 supports five possible input frame lengths: 40, 80,
   160, 240, and 320 samples per frame and the rationale for choosing
   those lengths was given in the description of property A5 in
   Section 3.2. Assuming 8000 samples per second, these lengths
   correspond to input frames representing 5 ms, 10 ms, 20 ms, 30 ms or
   40 ms. So while the standard assumed the input "bit stream"
   consisted of G.711 symbols of some integer multiple of 5 ms in
   length, it did not specify exactly what frame lengths to use as input
   to the G.711.0 encoder itself. The intent of this section is to
   provide some guidance for the selection.

   Consider a typical IETF use case of 20 ms (160 octets) of G.711 input
   samples represented in a G.711.0 payload and signaled by using the
   SDP parameter ptime.
   As described in Section 3.3.1, the simplest way to encode these 160
   octets is to pass all 160 octets to the G.711.0 encoder, resulting
   in precisely one G.711.0 compressed frame, and to put that single
   frame into the G.711.0 RTP payload. However, neither the ITU-T
   G.711.0 standard nor this IETF payload format mandates this. In
   fact, 20 ms of input G.711 symbols can be encoded as 1, 2, 3, or 4
   G.711.0 frames in any one of six combinations (i.e., {20ms},
   {10ms:10ms}, {10ms:5ms:5ms}, {5ms:10ms:5ms}, {5ms:5ms:10ms},
   {5ms:5ms:5ms:5ms}), and any of these combinations would decompress
   into the same 160 source G.711 octets. As an aside, we note that
   the first octet of any G.711.0 frame is the prefix code octet and
   that information in this octet determines how many G.711 symbols are
   represented in the G.711.0 frame.

   Notwithstanding the above, we expect one of two encodings to be used
   by implementers: the simplest possible (one 160-octet input to the
   G.711.0 encoder, which usually results in the highest compression)
   or the combination of possible input frames to a G.711.0 encoder
   that results in the highest compression for the payload. The
   explicit mention of this issue in this IETF document was deemed
   important because the ITU-T G.711.0 standard is silent on this issue
   and there is a desire for this issue to be documented in a formal
   Standards Developing Organization (SDO) document (i.e., here).

4.2.3. G.711.0 RTP Payload Decoding Process

   The G.711.0 decoding process is a standard part of G.711.0 bit stream
   decoding and is implemented in the ITU-T Rec. G.711.0 reference code.
   The decoding algorithm described in this section is a slight
   enhancement of the ITU-T reference code to explicitly accommodate RTP
   payload padding (as described above).

   Before describing the decoding, we note here that the largest
   possible G.711.0 frame is created whenever the largest number of
   G.711 symbols is encoded (320 from Section 3.2, property A5) and
   these 320 symbols are "uncompressible" by the G.711.0 encoder. In
   this case (via property A6 in Section 3.2) the G.711.0 output frame
   will be 321 octets long. We also note that the value 0x00 chosen for
   the optional padding cannot be the first octet of a valid ITU-T Rec.
   G.711.0 frame (see [G.711.0]). We also note that whenever more than
   one G.711.0 frame is contained in the RTP payload, the decoding of
   the individual G.711.0 frames will occur multiple times.

   For the decoding algorithm below, let N be the number of octets in
   the RTP payload (i.e., excluding any RTP padding, but including any
   RTP payload padding), let P equal the number of RTP payload octets
   processed by the G.711.0 decoding process, let K be the number of
   G.711 symbols presently in the output buffer, let Q be the number of
   octets contained in the G.711.0 frame being processed, and let "!="
   represent not equal to. The keyword "STOP" is used below to indicate
   the end of the processing of G.711.0 frames in the RTP payload. The
   algorithm below assumes an output buffer for the decoded G.711 source
   symbols of length sufficient to accommodate the expected number of
   G.711 symbols and an internal input buffer of length 321 octets.

   G.711.0 RTP Payload Decoding Heuristic:

   H1 Initialization of counters: Initialize P, the number of processed
      octets counter, to zero. Initialize K, the counter for how
      many G.711 symbols are in the output buffer, to zero.
      Initialize N to the number of octets in the RTP payload
      (including any RTP payload padding). Go to H2.

   H2 Read internal buffer: Read min{320+1, (N-P)} octets into the
      internal buffer, starting from octet (P+1) of the RTP payload.
      We note that at this point N-P octets have yet to be processed
      and that 320+1 octets is the largest possible G.711.0 frame.
      Also note that, in the common case of zero-based array indexing
      of a uint8 array of octets, this operation will read octets
      from index P through index P + min{320+1, (N-P)} - 1 of the RTP
      payload. Go to H3.

   H3 Analyze the first octet in the internal buffer: If this octet
      is 0x00 (a padding octet) go to H4, otherwise go to H5 (process
      a G.711.0 frame).

   H4 Process padding octet (no G.711 symbols generated): Increment the
      processed octets counter by one (set P = P + 1). If the
      result of this increment is P >= N then STOP (as all RTP
      payload octets have been processed), otherwise go to H2.

   H5 Process an individual G.711.0 frame (produce G.711 samples in the
      output buffer): Pass the internal buffer to the G.711.0 decoder.
      The G.711.0 decoder will read the first octet (called the
      "prefix code" octet in ITU-T Rec. G.711.0 [G.711.0]) to
      determine the number M of source G.711 samples contained in
      this G.711.0 frame. The G.711.0 decoder will produce exactly M
      G.711 source symbols (M can only have values of 0, 40, 80, 160,
      240, or 320). If K = 0, these M symbols will be the first in
      the output buffer and are placed at the beginning of the output
      buffer. If K != 0, concatenate these M symbols with the prior
      symbols in the output buffer (there are K prior symbols in the
      buffer). Set K = K + M (as there are now this many G.711
      source symbols in the output buffer). The G.711.0 decoder will
      have consumed some number of octets, Q, in the internal buffer
      to produce the M G.711 symbols. Increment the processed octets
      counter by this quantity (set P = P + Q). If the result of
      this increment is P >= N then STOP (as all RTP payload octets
      have been processed), otherwise go to H2.
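
   The heuristic above can be sketched in code. The following Python
   fragment is a minimal, non-normative sketch of steps H1 through H5;
   it is not the ITU-T reference code. The names decode_payload,
   decode_frame, and toy_decode_frame are hypothetical and are
   introduced here only for illustration. In particular,
   toy_decode_frame is NOT a G.711.0 decoder: it uses an invented
   length-prefixed format purely so that the loop can be exercised.

```python
from typing import Callable, Tuple

MAX_FRAME_OCTETS = 320 + 1  # largest possible G.711.0 frame (see H2)

# Assumed (hypothetical) decoder interface: given the internal buffer,
# return (decoded G.711 symbols, number of octets Q consumed).
FrameDecoder = Callable[[bytes], Tuple[bytes, int]]


def decode_payload(payload: bytes, decode_frame: FrameDecoder) -> bytes:
    """Walk an RTP payload as in H1-H5, skipping 0x00 padding octets."""
    n = len(payload)      # H1: N, octets in the RTP payload
    p = 0                 # H1: P, octets processed so far
    out = bytearray()     # H1: output buffer; K is len(out)
    while p < n:
        # H2: read min{321, N-P} octets starting at zero-based index P.
        buf = payload[p:p + min(MAX_FRAME_OCTETS, n - p)]
        if buf[0] == 0x00:
            # H3/H4: padding octet, no G.711 symbols generated.
            p += 1
            continue
        # H5: decode one frame, append its M symbols, advance P by Q.
        symbols, q = decode_frame(buf)
        if q <= 0 or q > len(buf):
            raise ValueError("framing error: decoder consumed %d octets" % q)
        out += symbols
        p += q
    return bytes(out)


def toy_decode_frame(buf: bytes) -> Tuple[bytes, int]:
    """Stand-in for illustration only; NOT the G.711.0 frame format.

    Invented format: first octet is a count n (1..255), followed by
    n literal octets that are returned unchanged.
    """
    n = buf[0]
    if len(buf) < 1 + n:
        raise ValueError("truncated toy frame")
    return buf[1:1 + n], 1 + n
```

   With the toy decoder, a payload carrying padding before, between,
   and after two frames yields only the frame contents, which mirrors
   how steps H3 and H4 skip 0x00 octets at frame boundaries. A real
   implementation would pass the internal buffer to an actual G.711.0
   frame decoder in place of toy_decode_frame.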

   When the algorithm reaches STOP, the output buffer will contain
   precisely K G.711 source symbols, which should correspond to the
   ptime signaled if SDP was used and the encoding process was without
   error. If ptime was signaled via SDP and the number of G.711
   symbols in the output buffer is other than what corresponds to
   ptime, the packet MUST be discarded unless other system design
   knowledge allows otherwise (e.g., occasional 5 ms clock slips
   causing one more or one less G.711.0 frame than nominal to be in
   the payload). Lastly, because the buffer reads in H2 are bounded
   (to 321 octets or less), N is bounded by the size of the G.711.0
   RTP payload, and M is bounded (to at most 320 source G.711 symbols
   per frame), there is no buffer overrun risk.

   We also note, as an aside, that the algorithm above (and the ITU-T
   G.711.0 reference code) accommodates padding octets (0x00) placed
   anywhere between G.711.0 frames in the RTP payload as well as prior
   to or after any or all G.711.0 frames. The ITU-T G.711.0 reference
   code does not have Steps H3 and H4 as separate steps (i.e., Step H5
   immediately follows H2), at the added computational cost of some
   additional buffer passing to/from the G.711.0 frame decoder
   functions. That is, the G.711.0 decoder in the reference code
   "silently ignores" 0x00 padding octets at the beginning of what it
   believes to be a G.711.0 encoded frame boundary. Thus, Steps H3 and
   H4 above are an optimization over the reference code and are shown
   separately for clarity.

   If the decoder is at a playout endpoint location, this G.711 buffer
   SHOULD be used in the same manner as a received G.711 RTP payload
   would have been used (passed to a playout buffer, to a PLC
   implementation, etc.).
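
   As a non-normative illustration of the ptime consistency check
   described earlier in this section, the following Python sketch
   compares the number of decoded symbols K against the count implied
   by ptime. The function names and the slip_ms tolerance parameter
   are hypothetical; the tolerance merely stands in for the "other
   system design knowledge" mentioned above and is an assumption, not
   a requirement of this payload format.

```python
def symbols_for_ptime(ptime_ms: int, clock_rate: int = 8000) -> int:
    """Number of G.711 symbols that ptime_ms represents for one channel."""
    return clock_rate * ptime_ms // 1000


def accept_payload(k: int, ptime_ms: int,
                   clock_rate: int = 8000, slip_ms: int = 0) -> bool:
    """Return True if K decoded symbols are consistent with ptime.

    slip_ms is an assumed, application-chosen tolerance (e.g., 5 for
    an occasional 5 ms clock slip); 0 means an exact match is required.
    """
    expected = symbols_for_ptime(ptime_ms, clock_rate)
    tolerance = symbols_for_ptime(slip_ms, clock_rate)
    return abs(k - expected) <= tolerance
```

   For example, at 8000 samples per second a ptime of 20 corresponds
   to 160 symbols, so a payload that decodes to 120 symbols would be
   discarded unless the application has chosen to tolerate such a
   difference.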

   We explicitly note that a framing error condition will result
   whenever the buffer sent to a G.711.0 decoder does not begin with a
   valid first G.711.0 frame octet (i.e., a valid G.711.0 prefix code or
   a 0x00 padding octet). The expected result is that the decoder will
   not produce the desired/correct G.711 source symbols. However, as
   already noted, the output returned by the G.711.0 decoder will be
   bounded (to at most 320 G.711 symbols per G.711.0 decode request),
   and if the number of (presumed) G.711 symbols produced is known to
   be in error, the decoded output MUST be discarded.

4.2.4. G.711.0 RTP Payload for Multiple Channels

   In this section we describe the use of multiple "channels" of G.711
   data encoded by G.711.0 compression.

   The dominant use of G.711 in RTP transport has been for single
   channel use cases. For this case, the above G.711.0 encoding and
   decoding process is used. However, the multiple channel case for
   G.711.0 (a frame-based compression) is different from G.711 (a
   sample-based encoding) and is described separately here.

   RFC 3551 [RFC3551] provides guidelines for encoding audio channels
   (Section 4) and for the ordering of the channels within the RTP
   payload (Section 4.1). The ordering guidelines in RFC 3551,
   Section 4.1 SHOULD be used unless an application-specific channel
   ordering is more appropriate.

   An implicit assumption in RFC 3551 is that all the channel data
   multiplexed into an RTP payload MUST represent the same physical
   time span. The case for G.711.0 is no different; the underlying
   G.711 data for all channels in a G.711.0 RTP payload MUST span the
   same interval in time (e.g., the same "ptime" for an SDP-specified
   codec negotiation).

   RFC 3551 provides guidelines for sample-based encodings such as G.711
   in Section 4.2.
   This guidance is tantamount to interleaving the individual samples
   in that they SHOULD be packed in consecutive octets.

   RFC 3551 provides guidelines for frame-based encodings in which the
   frames are interleaved. However, this guidance stems from the
   assumption that "the frame size for frame-oriented codecs is a
   given". This assumption is not valid for G.711.0 in that
   individual consecutive G.711.0 frames (as per Section 4.2.2) can:

      1) represent different time spans (e.g., two 5 ms G.711.0 frames
         in lieu of one 10 ms G.711.0 frame), and

      2) be of different lengths in octets (and typically are).

   Therefore a different, but also simple, concatenation-based approach
   is specified in this RFC.

   For the multiple channel G.711.0 case, each G.711 channel is
   independently encoded into one or more G.711.0 frames defined here as
   a "G.711.0 channel superframe". Each one of these superframes is
   identical to the multiple G.711.0 frame case illustrated in Figure 3
   of Section 4.2.2 in which each superframe can have one or more
   individual G.711.0 frames within it. Then each G.711.0 channel
   superframe is concatenated - in channel order - into a G.711.0 RTP
   payload. Then, if optional G.711.0 padding octets (0x00) are
   desired, it is RECOMMENDED that these octets be placed after the
   last G.711.0 channel superframe. As noted above, such padding may
   be desired based on security considerations (see Section 10). This
   is depicted in Figure 4 below.

        Multiple G.711.0 Channel Superframes in RTP Payload

   |----------|---------|----------|---------|---------|
   | First    | Second  |          | Nth     | Zero    |
   | G.711.0  | G.711.0 |   ...    | G.711.0 | or more |
   | Channel  | Channel |          | Channel | 0x00    |
   | Super-   | Super-  |          | Super-  | Padding |
   | Frame    | Frame   |          | Frame   | Octets  |
   |__________|_________|__________|_________|_________|

                          Figure 4

   We note that although the individual superframes can be of different
   lengths in octets (and usually are), the number of G.711 source
   symbols represented - in compressed form - in each channel superframe
   is identical (since all the channels represent the same time
   interval).

   The G.711.0 decoder at the receiving end simply decodes the entire
   G.711.0 (multiple channel) payload into individual G.711 symbols. If
   M such G.711 symbols result in total and there were N channels (here
   M and N denote the total symbol count and the channel count, not the
   quantities so named in Section 4.2.3), then the first M/N G.711
   samples would be from the first channel, the second M/N G.711
   samples would be from the second channel, and so on until the Nth
   set of G.711 samples is found. Similarly, if the number of channels
   was not known, but the payload "ptime" was known, one could infer
   (knowing the sampling rate) how many G.711 symbols each channel
   contained and, with this knowledge, determine how many channels of
   data were contained in the payload. When SDP is used, the number of
   channels is known because the optional channels parameter is a MUST
   when more than one channel is negotiated (see Section 5.1).
   Additionally, when SDP is used the parameter ptime is a RECOMMENDED
   optional parameter. If both the channels and ptime parameters are
   known, each can serve as a check on the other. Whichever algorithm
   is used to determine the number of channels, if the number of source
   G.711 symbols in the payload (M) is not an integer multiple of the
   number of channels (N), then the packet SHOULD be discarded.
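
   The channel separation just described can be sketched as follows.
   This Python fragment is a minimal, non-normative illustration; the
   function names infer_channels and split_channels are hypothetical,
   and raising an exception is used here only as a stand-in for
   discarding the packet.

```python
from typing import List


def infer_channels(total_symbols: int, ptime_ms: int,
                   clock_rate: int = 8000) -> int:
    """Infer the channel count from M (total symbols) and a known ptime."""
    per_channel = clock_rate * ptime_ms // 1000
    if per_channel <= 0 or total_symbols % per_channel != 0:
        raise ValueError("symbol count inconsistent with ptime")
    return total_symbols // per_channel


def split_channels(symbols: bytes, channels: int) -> List[bytes]:
    """Split M decoded G.711 symbols into N equal per-channel runs.

    The first M/N symbols belong to the first channel, the next M/N
    to the second channel, and so on (concatenated, not interleaved).
    """
    if channels < 1:
        raise ValueError("channels must be a positive integer")
    m = len(symbols)
    if m % channels != 0:
        # M is not an integer multiple of N: discard the packet.
        raise ValueError("symbol count not a multiple of channel count")
    per_channel = m // channels
    return [symbols[i * per_channel:(i + 1) * per_channel]
            for i in range(channels)]
```

   For instance, 320 decoded symbols with a ptime of 20 at 8000 samples
   per second imply two channels of 160 symbols each; when the channels
   parameter is also signaled, the two values can be compared as a
   consistency check.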

   Lastly, we note that although any padding for the multiple channel
   G.711.0 payload is RECOMMENDED to be placed at the end of the
   payload, the G.711.0 decoding algorithm described in Section 4.2.3
   will successfully decode the payload in Figure 4 if 0x00 padding
   octets are placed anywhere before or after any individual G.711.0
   frame in the RTP payload. The number of padding octets introduced
   at any G.711.0 frame boundary therefore does not affect the number
   M of source G.711 symbols produced. Thus the decision for padding
   MAY be made on a per-superframe basis.

5. Payload Format Parameters

   This section defines the parameters that may be used to configure
   optional features in the G.711.0 RTP transmission.

   The parameters defined here are a part of the media subtype
   registration for the G.711.0 codec. Mapping of the parameters into
   the Session Description Protocol (SDP), RFC 4566 [RFC4566], is also
   provided for those applications that use SDP.

5.1. Media Type Registration

   Type name: audio

   Subtype name: G711-0

   Required parameters:

      clock rate: The RTP timestamp clock rate, which is equal to the
      sampling rate. The typical rate used with G.711 encoding is 8000,
      but other rates may be specified. The default rate is 8000.

      complaw: This format-specific parameter, specified on the
      "a=fmtp:" line, indicates the companding law (A-law or mu-law)
      employed. This format-specific parameter, as per RFC 4566
      [RFC4566], is given unchanged to the media tool using this
      format. The case-insensitive values "complaw=al" and
      "complaw=mu" are used for A-law and mu-law, respectively.

   Optional parameters:

      channels: See RFC 4566 [RFC4566] for definition. Specifies how
      many audio streams are represented in the G.711.0 payload and MUST
      be present if the number of channels is greater than one.
      This parameter defaults to 1 if not present (as per RFC 4566) and
      is typically a small positive integer. It is expected that
      implementations that specify multiple channels will also define a
      mechanism to map the channels appropriately within their system
      design; otherwise, the channel order specified in RFC 3551
      [RFC3551] Section 4.1 will be assumed (e.g., left, right,
      center, ...). Similar to the usual interpretation in RFC 3551
      [RFC3551], the number of channels SHALL be a positive integer.

      maxptime: See RFC 4566 [RFC4566] for definition.

      ptime: See RFC 4566 [RFC4566] for definition. The inclusion of
      "ptime" is RECOMMENDED and SHOULD be in the SDP unless there is an
      application-specific reason not to include it (e.g., an
      application that has a variable ptime on a packet-by-packet
      basis). For constant ptime applications, it is considered good
      form to include "ptime" in the SDP for session diagnostic
      purposes. For the constant ptime multiple channel case described
      in Section 4.2.4, the inclusion of "ptime" can provide a desirable
      payload check.

   Encoding considerations:

      This media type is framed binary data (see Section 4.8 in RFC 6838
      [RFC6838]) compressed as per ITU-T Rec. G.711.0.

   Security considerations:

      See Section 10.

   Interoperability considerations: none

   Published specification:

      ITU-T Rec. G.711.0 and RFC XXXX.

      [ RFC Editor: please replace XXXX with a reference to this RFC ]

   Applications that use this media type:

      Although initially conceived for VoIP, G.711.0, like G.711 before
      it, may find use within audio and video streaming and/or
      conferencing applications for the audio portion of those
      applications.

   Additional information:

      The following applies to stored-file transfer methods:

      Magic numbers: #!G7110A\n or #!G7110M\n (for A-law or mu-law
      encodings respectively, see Section 6).

      File Extensions: None

      Macintosh file type code: None

      Object identifier or OID: None

   Person & email address to contact for further information:

      Michael A. Ramalho or

   Intended usage: COMMON

   Restrictions on usage:

      This media type depends on RTP framing, and hence is only defined
      for transfer via RTP [RFC3550]. Transport within other framing
      protocols is not defined at this time.

   Author: Michael A. Ramalho

   Change controller:

      IETF Payload working group delegated from the IESG.

5.2. Mapping to SDP Parameters

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP),
   which is commonly used to describe an RTP session. When SDP is used
   to specify sessions employing G.711.0, the mapping is as follows:

   o The media type ("audio") goes in SDP "m=" as the media name.

   o The media subtype ("G711-0") goes in SDP "a=rtpmap" as the
     encoding name.

   o The required parameter "clock rate" also goes in "a=rtpmap" as the
     clock rate.

   o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
     "a=maxptime" attributes, respectively.

   o Remaining parameters go in the SDP "a=fmtp" attribute by copying
     them directly from the media type string as a semicolon-separated
     list of parameter=value pairs.

5.3. Offer/Answer Considerations

   The following considerations apply when using the SDP offer/answer
   RFC 3264 [RFC3264] mechanism to negotiate the "channels" attribute.

   o If the offering endpoint specifies a value greater than one for
     the optional channels parameter and the answering endpoint
     understands the parameter but cannot support the value
     requested, the answer MUST contain the optional channels parameter
     with the highest value it can support.

   o If the offering endpoint specifies a value for the optional
     channels parameter, the answer MUST contain the optional channels
     parameter unless the only value the answering endpoint can support
     is one, in which case the answer MAY contain the optional channels
     parameter with a value of 1.

   o If the offering endpoint specifies a value for the ptime parameter
     that the answering endpoint cannot support, the answer MUST
     contain the optional ptime parameter.

   o If the offering endpoint specifies a value for the maxptime
     parameter that the answering endpoint cannot support, the answer
     MUST contain the optional maxptime parameter.

5.4. SDP Examples

   The following examples illustrate how to signal G.711.0 via SDP.

5.4.1. SDP Example 1

      m=audio RTP/AVP 98
      a=rtpmap:98 G711-0/8000
      a=fmtp:98 complaw=mu

   In the above example the dynamic payload type 98 is mapped to G.711.0
   via the "a=rtpmap" attribute. The mandatory "complaw" parameter is
   on the "a=fmtp" attribute line. Note that neither of the optional
   parameters "ptime" and "channels" is present, although it is
   generally good form to include "ptime" in the SDP for diagnostic
   purposes if the session is a constant ptime session.

5.4.2. SDP Example 2

   The following example illustrates an offering endpoint requesting 2
   channels, but the answering endpoint can only support (or render) one
   channel.

   Offer:

      m=audio RTP/AVP 98
      a=rtpmap:98 G711-0/8000/2
      a=ptime:20
      a=fmtp:98 complaw=al

   Answer:

      m=audio RTP/AVP 98
      a=rtpmap:98 G711-0/8000/1
      a=ptime:20
      a=fmtp:98 complaw=al

   In this example the offer had an optional channels parameter. The
   answer must also have the optional channels parameter unless the
   value in the answer is one. Shown here is the case in which the
   answer explicitly contains the channels parameter (it need not, and
   its absence would be interpreted as one channel). As mentioned
   previously, it is considered good form to include "ptime" in the SDP
   for session diagnostic purposes if the session is a constant ptime
   session.

6. G.711.0 Storage Mode Conventions and Definition

   The G.711.0 storage mode definition in this section is similar to
   that of many other IETF codecs (e.g., iLBC, EVRC-NW) and is
   essentially a concatenation of individual G.711.0 frames.

   We note that something must be stored for any G.711.0 frames that are
   not received at the receiving endpoint, no matter what the cause. In
   this section we describe two mechanisms, a "G.711.0 PLC Frame" and a
   "G.711.0 Erasure Frame". These G.711.0 PLC and G.711.0 Erasure
   Frames are described prior to the G.711.0 storage mode definition for
   clarity.

6.1. G.711.0 PLC Frame

   When G.711 RTP payloads are not received by a rendering endpoint, a
   Packet Loss Concealment (PLC) mechanism is typically employed to
   "fill in" the missing G.711 symbols with something that is
   auditorially pleasing, so that the loss may not be noticed by a
   listener. Such a PLC mechanism for G.711 is specified in ITU-T Rec.
   G.711 - Appendix 1 [G.711-AP1].

   A natural extension when creating G.711.0 frames for storage
   environments is to employ such a PLC mechanism to create G.711
   symbols for the span of time in which G.711.0 payloads were not
   received - and then to compress the resulting "G.711 PLC symbols" via
   G.711.0 compression. The G.711.0 frame(s) created by such a process
   are called "G.711.0 PLC Frames".

   Since PLC mechanisms are designed to render missing audio data with
   the best fidelity and intelligibility, G.711.0 frames created via
   such processing are likely best for most recording situations (such
   as voicemail storage) unless there is a requirement not to fabricate
   (audio) data not actually received.

   After such PLC G.711 symbols have been generated and then encoded by
   a G.711.0 encoder, the resulting frames may be stored in G.711.0
   frame format. As a result, there is nothing to specify here - the
   G.711.0 PLC Frames are stored as if they were received by the
   receiving endpoint. In other words, PLC-generated G.711.0 frames
   appear as "normal" or "ordinary" G.711.0 frames in the storage mode
   file.

6.2. G.711.0 Erasure Frame

   "Erasure Frames", or equivalently "Null Frames", have been designed
   for many frame-based codecs since G.711 was standardized. These
   null/erasure frames explicitly represent data from incoming audio
   that either were not received by the receiving system or that a
   transmitting system decided not to send. Transmitting systems may
   choose not to send data for a variety of reasons (e.g., not enough
   wireless link capacity in radio-based systems) and can choose to
   send a "null frame" in lieu of the actual audio. It is also
   envisioned that erasure frames would be used in storage mode
   applications for specific archival purposes where there is a
   requirement not to fabricate audio data that was not actually
   received.

   Thus, a G.711.0 erasure frame is a representation of the amount of
   time in G.711.0 frames that were not received or not encoded by the
   transmitting system.

   Prior to defining a G.711.0 erasure frame it is beneficial to note
   what many G.711 RTP systems send when the endpoint is "muted". When
   muted, many of these systems will send an entire G.711 payload of
   either 0+ or 0- (i.e., one of the two levels closest to "analog zero"
   in either G.711 companding law). Next we note that a desirable
   property for a G.711.0 erasure frame is for "non G.711.0 Erasure
   Frame aware" endpoints to be able to play back a G.711.0 erasure
   frame with the existing G.711.0 ITU-T reference code.

   A G.711.0 Erasure Frame is defined as any G.711.0 frame for which the
   corresponding G.711 sample values are either the value 0++ or the
   value 0-- for the entirety of the G.711.0 frame. The levels 0++ and
   0-- are defined to be two levels above and two levels below analog
   zero, respectively. An entire frame of value 0++ or 0-- is expected
   to be extraordinarily rare when the frame was in fact generated by a
   natural signal, as analog inputs such as speech and music are
   zero-mean and are typically acoustically coupled to digital sampling
   systems. Note that the playback of a G.711.0 frame characterized as
   an erasure frame is auditorially equivalent to a muted signal (a very
   low value constant).

   These G.711.0 erasure frames can be reasonably characterized as null
   or erasure frames while meeting the desired playback goal of being
   decoded by the G.711.0 ITU-T reference code. Thus, similarly to
   G.711.0 PLC Frames, the G.711.0 erasure frames appear as "normal" or
   "ordinary" G.711.0 frames in the storage mode format.

6.3. G.711.0 Storage Mode Definition

   The storage format is used for storing G.711.0 encoded frames.
   The format for the G.711.0 storage mode file defined by this RFC is
   shown below.

                    G.711.0 Storage Mode Format

   |---------------------------|----------|--------------|
   | Magic Number              |          |              |
   |                           | Version  | Concatenated |
   | "#!G7110A\n" (for A-law)  | Octet    | G.711.0      |
   |            or             |          | Frames       |
   | "#!G7110M\n" (for mu-law) | "0x00"   |              |
   |___________________________|__________|______________|

                             Figure 5

   The storage mode file consists of a magic number and a version octet
   followed by the individual G.711.0 frames concatenated together.

   The magic number for G.711.0 A-law corresponds to the ASCII character
   string "#!G7110A\n", i.e., "0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x41
   0x0A". Likewise, the magic number for G.711.0 mu-law corresponds to
   the ASCII character string "#!G7110M\n", i.e., "0x23 0x21 0x47 0x37
   0x31 0x31 0x30 0x4D 0x0A".

   The version number octet allows for the future specification of other
   G.711.0 storage mode formats. The specification of other storage
   mode formats may be desirable as G.711.0 frames are of variable
   length and a future format may include an indexing methodology that
   would enable playout far into a long G.711.0 recording without the
   necessity of decoding all the G.711.0 frames since the beginning of
   the recording. Other future format specifications may include
   support for multiple channels, metadata, and the like. For these
   reasons it was determined that a versioning strategy was desirable
   for the G.711.0 storage mode definition specified by this RFC. This
   RFC only specifies Version 0 and thus the value of "0x00" MUST be
   used for the storage mode defined by this RFC.

   The G.711.0 codec data frames, including any necessary erasure or PLC
   frames, are stored in consecutive order concatenated together as
   shown in Section 4.2.2.
   As the Version 0 storage mode only supports a single channel, the
   RTP payload format supporting multiple channels defined in
   Section 4.2.4 is not supported in this storage mode definition.

   The algorithm presented in Section 4.2.3 may be used to decode the
   individual G.711.0 frames. If the version octet is determined not
   to be zero, the remainder of the file MUST NOT be passed to the
   G.711.0 decoder, as the ITU-T G.711.0 reference decoder can only
   decode concatenated G.711.0 frames and has not been designed to
   decode elements in yet to be specified future storage mode formats.

7. Acknowledgements

   There have been many people contributing to G.711.0 in the course of
   its development. The people listed here deserve special mention:
   Takehiro Moriya, Claude Lamblin, Herve Taddei, Simao Campos, Yusuke
   Hiwasaki, Jacek Stachurski, Lorin Netsch, Paul Coverdale, Patrick
   Luthi, Paul Barrett, Jari Hagqvist, Pengjun (Jeff) Huang, John Gibbs,
   Yutaka Kamamoto, and Csaba Kos. The review and oversight by the IETF
   Payload Working Group chairs Ali Begen and Roni Even during the
   development of this RFC is appreciated. Additionally, the careful
   review by Richard Barnes and extensive review by David Black and the
   rest of the IESG is likewise very much appreciated.

8. Contributors

   The authors thank everyone who has contributed to this document.
   The people listed here deserve special mention: Ali Begen, Roni Even,
   and Hadriel Kaplan.

9. IANA Considerations

   One media type (audio/G711-0) has been defined and requires IANA
   registration in the media types registry. See Section 5.1 for
   details.

10. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC3550], and in any appropriate RTP profile (for
   example RFC 3551 [RFC3551] or [RFC4585]). This implies that
   confidentiality of the media streams is achieved by encryption; for
   example, through the application of SRTP [RFC3711]. Because the data
   compression used with this payload format is applied end-to-end, any
   encryption needs to be performed after compression.

   Note that the appropriate mechanism to ensure confidentiality and
   integrity of RTP packets and their payloads is very dependent on the
   application and on the transport and signaling protocols employed.
   Thus, although SRTP is given as an example above, other possible
   choices exist.

   Note that end-to-end security with either authentication, integrity,
   or confidentiality protection will prevent a network element not
   within the security context from performing media-aware operations
   other than discarding complete packets. To allow any (media-aware)
   intermediate network element to perform its operations, it is
   required to be a trusted entity which is included in the security
   context establishment.

   G.711.0 has no known denial-of-service attacks due to decoding, as
   data posing as a desired G.711.0 payload will be decoded into
   something (as per the decoding algorithm) with a finite amount of
   computation. This is due to the decompression algorithm having a
   finite worst-case processing path (no infinite computational loops
   are possible). We also note that the data read by the G.711.0
   decoder is controlled by the length of the individual encoded G.711.0
   frame(s) contained in the RTP payload.
   The decoding algorithm specified in Section 4.2.3 above ensures that
   the G.711.0 decoder will not read beyond the length of the internal
   buffer specified (which is in turn specified to be no greater than
   the largest possible G.711.0 frame of 321 octets). Therefore a
   G.711.0 payload does not carry "active content" that could impose
   malicious side effects upon the receiver.

   G.711.0 is a variable bit rate (VBR) audio codec. There have been
   recent concerns with VBR speech codecs where a passive observer can
   identify phrases from a standard speech corpus by means of the
   lengths produced by the encoder even when the payload is encrypted
   [IEEE]. In this paper, it was determined that some code excited
   linear prediction (CELP) codecs would produce discrete packet lengths
   for some phonemes. Furthermore, it was determined that, with the
   use of appropriately designed Hidden Markov Models (HMMs), such a
   system could predict phrases with unexpected accuracy. One CELP
   codec studied, Speex, had the property that it produced 21 different
   packet lengths in its wideband mode and that these packet lengths
   probabilistically mapped to phonemes that an HMM system could be
   trained on. In this paper it was determined that a mitigation
   technique would be to pad the output of the encoder with random
   padding lengths to the effect: 1) that more discrete payload sizes
   would result, and 2) that the probabilistic mapping to phonemes
   would become less clear. As G.711 is not a speech model based
   codec, neither is G.711.0. A G.711.0 encoding, during talking
   periods, produces frames of varying lengths which are not likely to
   have a strong mapping to phonemes. Thus G.711.0 is not expected to
   have this same vulnerability.
It should be noted that "silence" (only one value of G.711 in the entire G.711 input frame) or "near silence" (only a few G.711 values) is easily detectable as G.711.0 frame lengths of one or a few octets. If one desires to mitigate silence/non-silence detection, statistically variable padding should be added to G.711.0 frames that would otherwise be very small (less than about 20% of the symbols of the corresponding G.711 input frame). Methods of introducing padding in G.711.0 payloads are provided in the G.711.0 RTP payload definition in Section 4.2.2.

11. Congestion Control

The G.711 codec is a Constant Bit Rate (CBR) codec that does not have a means to regulate the bitrate. The G.711.0 lossless compression algorithm typically compresses the G.711 CBR stream into a lower-bandwidth VBR stream. However, being lossless, it does not possess a means of further reducing the bitrate beyond the G.711.0-based compression result. The G.711.0 RTP payloads can be made arbitrarily large by adding optional padding bytes (subject only to MTU limitations).

Therefore, there are no explicit ways to regulate the bitrate of the transmissions outlined in this RTP payload format except by modulating the number of optional padding bytes in the RTP payload.

12. References

12.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.

[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type Specifications and Registration Procedures", BCP 13, RFC 6838, January 2013.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, "Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July 2006.

[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

[G.711.0] ITU-T, "Recommendation ITU-T G.711.0 - Lossless Compression of G.711 Pulse Code Modulation", September 2009.

[G.711] ITU-T, "Recommendation ITU-T G.711: Pulse Code Modulation (PCM) of Voice Frequencies", November 1988.

[G.711-AP1] ITU-T, "Recommendation G.711 Appendix 1: A high quality low-complexity algorithm for packet loss concealment with G.711", September 1999.

[G.711-A1] ITU-T, "Recommendation ITU-T G.711 Amendment 1 - Amendment 1: New Annex A on Lossless Encoding of PCM Frames", September 2009.

12.2. Informative References

[G.729] ITU-T, "Recommendation ITU-T G.729 - Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)", January 2007.

[G.722] ITU-T, "Recommendation ITU-T G.722 - 7 kHz audio-coding within 64 kbit/s", November 1988.

[ICASSP] N. Harada, Y. Yamamoto, T. Moriya, Y. Hiwasaki, M. A. Ramalho, L. Netsch, Y. Stachurski, Miao Lei, H. Taddei, and Q.
Fengyan, "Emerging ITU-T Standard G.711.0 - Lossless Compression of G.711 Pulse Code Modulation", International Conference on Acoustics, Speech and Signal Processing (ICASSP), ISBN 978-1-4244-4295-9, March 2010.

[IEEE] C.V. Wright, L. Ballard, S.E. Coull, F. Monrose, and G.M. Masson, "Spot Me if You Can: Uncovering Spoken Phrases in Encrypted VoIP Conversations", IEEE Symposium on Security and Privacy, ISBN 978-0-7695-3168-7, May 2008.

Authors' Addresses

Michael A. Ramalho (editor)
Cisco Systems, Inc.
6310 Watercrest Way Unit 203
Lakewood Ranch, FL 34202
USA

Phone: +1 919 476 2038
Email: mramalho@cisco.com

Paul E. Jones
Cisco Systems, Inc.
7025 Kit Creek Rd.
Research Triangle Park, NC 27709
USA

Phone: +1 919 476 2048
Email: paulej@packetizer.com

Noboru Harada
NTT Communications Science Labs.
3-1 Morinosato-Wakamiya
Atsugi, Kanagawa 243-0198
JAPAN

Phone: +81 46 240 3676
Email: harada.noboru@lab.ntt.co.jp

Muthu Arul Mozhi Perumal
Ericsson
Ferns Icon
Doddanekundi, Mahadevapura
Bangalore, Karnataka 560037
India

Phone: +91 9449288768
Email: muthu.arul@gmail.com

Lei Miao
Huawei Technologies Co. Ltd
Q22-2-A15R, Environment Protection Park
No. 156 Beiqing Road
HaiDian District
Beijing 100095
China

Phone: +86 1059728300
Email: lei.miao@huawei.com