Network Working Group                                    M. Ramalho, Ed.
Internet-Draft                                                  P. Jones
Intended status: Standards Track                           Cisco Systems
Expires: June 14, 2014                                         N. Harada
                                                                     NTT
                                                              M. Perumal
                                                           Cisco Systems
                                                                 L. Miao
                                                     Huawei Technologies
                                                       December 11, 2013

                     RTP Payload Format for G.711.0
                      draft-ietf-payload-g7110-01

Abstract

   This document specifies the Real-Time Transport Protocol (RTP)
   payload format for ITU-T Recommendation G.711.0. ITU-T Rec. G.711.0
   defines a lossless and stateless compression for G.711 packet
   payloads typically used in IP networks. This document also defines a
   storage mode format for G.711.0 and a media type registration for the
   G.711.0 RTP payload format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 14, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Requirements Language
   3. G.711.0 Codec Background
     3.1. General Information and Use of the ITU-T G.711.0 Codec
     3.2. Key Properties of G.711.0 Design
     3.3. G.711 Input Frames to G.711.0 Output Frames
   4. RTP Header and Payload
     4.1. G.711.0 RTP Header
     4.2. G.711.0 RTP Payload
       4.2.1. Single G.711.0 Frame per RTP Payload Example
       4.2.2. Multiple G.711.0 Frames per RTP Payload Example
       4.2.3. G.711.0 RTP Payload Decoding Process
       4.2.4. G.711.0 RTP Payload for Multiple Channels
   5. Payload Format Parameters
     5.1. Media Type Registration
     5.2. Mapping to SDP Parameters
     5.3. Offer/Answer Considerations
     5.4. SDP Examples
       5.4.1. SDP Example 1
       5.4.2. SDP Example 2
   6. G.711.0 Storage Mode Conventions and Definition
     6.1. G.711.0 PLC Frame
     6.2. G.711.0 Erasure Frame
     6.3. G.711.0 Storage Mode Definition
   7. Acknowledgements
   8. Contributors
   9. IANA Considerations
   10. Security Considerations
   11. References
     11.1. Normative References
     11.2. Informative References
   Authors' Addresses

1. Introduction

   The International Telecommunication Union (ITU-T) Recommendation
   G.711.0 [G.711.0] specifies a stateless and lossless compression for
   G.711 packet payloads typically used in Voice over IP (VoIP)
   networks. This document specifies the Real-Time Transport Protocol
   (RTP) RFC 3550 [RFC3550] payload format and storage modes for this
   compression.

2. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. G.711.0 Codec Background

   ITU-T Recommendation G.711.0 [G.711.0] is a lossless and stateless
   compression mechanism for ITU-T Recommendation G.711 [G.711] and thus
   is not a "codec" in the sense of "lossy" codecs typically carried by
   RTP. When negotiated end-to-end, ITU-T Rec. G.711.0 is negotiated as
   if it were a codec, with the understanding that ITU-T Rec. G.711.0
   losslessly encodes the underlying (lossy) G.711 pulse code modulation
   (PCM) sample representation of an audio signal. For this reason,
   ITU-T Rec. G.711.0 will be interchangeably referred to in this
   document as a "lossless data compression algorithm" or a "codec",
   depending on context. Within this document, individual G.711 PCM
   samples will be referred to as "G.711 symbols" or just "symbols" for
   brevity.

   This section describes the ITU-T Recommendation G.711.0 [G.711.0]
   codec, its properties, typical use cases and its key design
   properties.

3.1. General Information and Use of the ITU-T G.711.0 Codec

   ITU-T Recommendation G.711 is the benchmark standard for narrowband
   telephony. It has been successful for many decades because of its
   proven voice quality, ubiquity and utility. A new ITU-T
   recommendation, G.711.0, has been established for defining a
   stateless and lossless compression for G.711 packet payloads
   typically used in VoIP networks. ITU-T Rec. G.711.0 is also known as
   ITU-T Rec. G.711 Annex A [G.711-A1], as ITU-T Rec. G.711 Annex A is
   effectively a pointer to ITU-T Rec. G.711.0. Henceforth in this
   document, ITU-T Rec. G.711.0 will simply be referred to as "G.711.0"
   and ITU-T Rec. G.711 simply as "G.711".

   G.711.0 may be employed end-to-end, in which case the RTP payload
   format specification and use is nearly identical to the G.711 RTP
   specification found in RFC 3551 [RFC3551]. The only significant
   differences for G.711.0 are the use of a dynamic payload type (the
   static PTs of 0 or 8 are virtually always used with G.711) and the
   recommendation not to use Voice Activity Detection (see Section 4.1).

   G.711.0, being both lossless and stateless, may also be employed as a
   lossless compression mechanism anywhere in between end systems which
   have negotiated use of G.711. Because the only significant difference
   between the G.711 RTP payload format header and the G.711.0 payload
   format header is the payload type, a G.711 RTP packet can be
   losslessly converted to a G.711.0 RTP packet simply by compressing
   the G.711 payload (thus creating a G.711.0 payload), changing the
   payload type to the dynamic value desired and copying all the
   remaining G.711 RTP header fields into the corresponding G.711.0 RTP
   header.
   Conversely,
   the corresponding decompression of a G.711.0 RTP packet back to the
   original source G.711 RTP packet can be accomplished by losslessly
   decompressing the G.711.0 payload back to the original source G.711
   payload, changing the payload type back to the payload type of the
   original G.711 RTP packet and copying all the remaining G.711.0 RTP
   header fields into the corresponding G.711 RTP header.

   It is worth noting that G.711.0, being both lossless and
   stateless, can be employed multiple times (e.g., on multiple,
   individual hops or series of hops) of a given flow with no
   degradation of quality relative to end-to-end G.711. Stated another
   way, multiple "lossless transcodes" from/to G.711.0/G.711 do not
   affect voice quality as typically occurs with lossy transcodes
   to/from dissimilar codecs.

   Lastly, it is expected that G.711.0 will be used as an archival
   format for recorded G.711 streams. Therefore, a G.711.0 Storage Mode
   Format is also included in this document.

3.2. Key Properties of G.711.0 Design

   The fundamental design of G.711.0 resulted from the desire to
   losslessly encode and compress frames of G.711 symbols independent of
   what types of signals those G.711 frames contained. The primary
   G.711.0 use case is for G.711 encoded, zero-mean, acoustic signals
   (such as speech and music).

   The G.711.0 attributes are listed below:

   A1  Compression for zero-mean acoustic signals: The primary use case
       for which G.711.0 was designed is the compression of G.711
       payloads which contain "speech" or other zero-mean acoustic
       signals. G.711.0 obtains greater than 50% average compression in
       service provider environments [ICASSP].

   A2  Lossless for any G.711 payload: G.711.0 was designed to be
       lossless for any valid G.711 payload - even if the payload
       consisted of apparently random G.711 symbols (e.g., a modem or
       FAX payload).
       G.711.0 could be used for "aggregate 64 kbps
       G.711 channels" carried over IP without explicit concern if a
       subset of these channels happened to be carrying something
       other than voice or general audio. To the extent that a
       particular channel carried something other than voice or
       general audio, G.711.0 ensured that it was carried losslessly,
       if not significantly compressed.

   A3  Stateless: Compression of a frame of G.711 symbols was only to be
       dependent on that frame and not on any prior frame. Although
       greater compression is usually available by observing a longer
       history of past G.711 symbols, it was decided that the
       compression design would be stateless to completely eliminate
       error propagation common in many lossy codec designs (e.g.,
       ITU-T Rec. G.729 [G.729], ITU-T Rec. G.722 [G.722]). That is,
       the decoding process need not be concerned about lost prior
       packets because the decompression of a given G.711.0 frame is
       not dependent on potentially lost prior G.711.0 frames. Owing
       to this stateless property, the frames input to the G.711.0
       encoder may be changed "on-the-fly" (a 5 ms encoding could be
       followed by a 20 ms encoding).

   A4  Self-describing: This property is defined as the ability to
       determine how many source G.711 samples are contained within
       the G.711.0 frame solely by information contained within the
       G.711.0 frame. Generally, the number of source G.711 symbols
       can be determined by decoding the initial octets of the
       compressed G.711.0 frame (these octets are called "prefix
       codes" in the standard) [ICASSP]. A G.711.0 decoder need not
       know what ptime is, as it is able to decompress the G.711.0
       frame presented to it without signaling knowledge.

   A5  Accommodate G.711 payload sizes typically used in IP: G.711 input
       frames of length typically found in VoIP applications represent
       SDP ptimes (see RFC 4566 [RFC4566]) of 5 ms, 10 ms, 20 ms, 30
       ms or 40 ms. Since the dominant sampling frequency for G.711
       is 8000 samples per second, G.711.0 was designed to compress
       G.711 input frames of 40, 80, 160, 240 or 320 samples.

   A6  Bounded expansion: Since attribute A2 above requires G.711.0 to
       be lossless for any payload, by definition there exists at
       least one potential G.711 payload which must be
       "uncompressible". Since the quantum of compression is an
       octet, the expansion of such an uncompressible payload
       was designed to be the minimum possible: one octet. Thus
       G.711.0 "compressed" frames can be of length one octet to X+1
       octets, where X is the size of the input G.711 frame in octets.
       G.711.0 can therefore be viewed as a Variable Bit Rate (VBR)
       encoding in which the size of the G.711.0 output frame is a
       function of the G.711 symbols input to it.

   A7  Algorithmic delay: G.711.0 was designed to have the algorithmic
       delay equal to the time represented by the number of samples in
       the G.711 input frame (i.e., no "look-ahead").

   A8  Low Complexity: Less than 1.0 WMOPS average and low memory
       footprint (~5k octets RAM, ~5.7k octets ROM and ~3.6 basic
       operations) [ICASSP] [G.711.0].

   A9  Both A-law and Mu-law supported: G.711 has two operating laws,
       A-law and Mu-law. These two laws are also known as PCMA and
       PCMU in RTP applications RFC 3551 [RFC3551].

   These attributes generally make it trivial to compress a G.711 input
   frame consisting of 40, 80, 160, 240 or 320 samples. After the input
   frame is presented to a G.711.0 encoder, a G.711.0 "self-describing"
   output frame is produced. The number of samples contained within
   this frame is easily determined at the G.711.0 decoder by virtue of
   attribute A4.
   The G.711.0 decoder can decode the G.711.0 frame back
   to a G.711 frame by using only data within the G.711.0 frame.

   Lastly, we note that losing a G.711.0 encoded packet is identical in
   effect to losing a G.711 packet (when using RTP); this is because a
   G.711.0 payload, like the corresponding G.711 payload, is stateless.
   Thus, it is anticipated that existing G.711 PLC mechanisms will be
   employed when a G.711.0 packet is lost and an identical MOS
   degradation relative to G.711 loss will be achieved.

3.3. G.711 Input Frames to G.711.0 Output Frames

   G.711.0 is a lossless and stateless compression of G.711 frames. The
   following figure depicts this where "A" is the process of G.711.0
   encoding and "B" is the process of G.711.0 decoding.

        1:1 Mapping from G.711 Input Frame to G.711.0 Output Frame

   |--------------------------|   A  |------------------------------|
   |     G.711 Input Frame    |----->|     G.711.0 Output Frame     |
   |       of X Octets        |      |  containing 1 to X+1 Octets  |
   | (where X MUST be 40, 80, |      | (precise value dependent on  |
   | 160, 240 or 320 octets)  |<-----| G.711.0 ability to compress) |
   |__________________________|   B  |______________________________|

                                Figure 1

   Note that the mapping is 1:1 (lossless) in both directions, subject
   to two constraints. The first constraint is that the input frame
   provided to the G.711.0 encoder (process "A") has a specific number
   of input G.711 symbols consistent with attribute A5 (40, 80, 160, 240
   or 320 octets). The second constraint is that the compression law
   used to create the G.711 input frame (A-law or Mu-law) must be known,
   consistent with attribute A9.

   Subject to these two constraints, the input G.711 frame is processed
   by the G.711.0 encoder ("A") and produces a "self-describing" G.711.0
   output frame, consistent with attribute A4.
   Depending on the source
   G.711 symbols, the G.711.0 output frame can contain anywhere from 1
   to X+1 octets, where X is the number of input G.711 symbols.
   Compression is achieved for virtually every zero-mean acoustic signal
   encoded by G.711.0.

   Since the G.711.0 output frame is "self-describing", a G.711.0
   decoder (process "B") can losslessly reproduce the original G.711
   input frame with only the knowledge of which companding law was used
   (A-law or Mu-law). The G.711.0 frame, being "self-describing",
   allows for the G.711.0 decoder ("B") to know precisely how many G.711
   symbols to create.

   Since G.711.0 was designed with typical G.711 payload lengths as a
   design constraint (attribute A5), this lossless encoding can be
   performed only with knowledge of the companding law being used. This
   information is anticipated to be signaled in SDP and will be
   described later in this document.

   If the original inputs were known to be from a zero-mean acoustic
   signal coded by G.711, an intelligent G.711.0 encoder could infer the
   G.711 companding law in use (via G.711 input signal amplitude
   histogram statistics). Likewise, an intelligent G.711.0 decoder
   producing G.711 from the G.711.0 frames could also infer which
   encoding law is in use. Thus G.711.0 could be designed for use in
   applications that have limited stream signaling between the G.711
   endpoints (i.e., they only know "G.711 at 8k sampling is being used",
   but nothing more). Such usage is not further described in this
   document. Additionally, if the original inputs were known to come
   from zero-mean acoustic signals, an intelligent G.711.0 encoder could
   tell if the G.711.0 payload had been encrypted - as the symbols would
   not have the distribution expected in either companding law and would
   appear random. Such determination is also not further discussed in
   this document.
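
   As a rough illustration of the sizing rules in this section (the
   input sizes of attribute A5 and the output bound of attribute A6),
   the following Python sketch maps a frame duration to a G.711 symbol
   count and reports the possible range of G.711.0 output frame sizes.
   The function and constant names are hypothetical and are not defined
   by G.711.0 or by this document; the 8000 samples per second rate is
   the dominant G.711 rate assumed in attribute A5.

```python
# Illustrative sketch only; all names here are hypothetical helpers.
SAMPLE_RATE_HZ = 8000                          # dominant G.711 rate (A5)
VALID_FRAME_SAMPLES = (40, 80, 160, 240, 320)  # allowed input sizes (A5)


def samples_for_ptime(ptime_ms: int) -> int:
    """Return the number of G.711 symbols in one frame lasting ptime_ms."""
    total = SAMPLE_RATE_HZ * ptime_ms
    samples = total // 1000
    if total % 1000 or samples not in VALID_FRAME_SAMPLES:
        raise ValueError(f"unsupported frame duration: {ptime_ms} ms")
    return samples


def output_size_bounds(input_octets: int) -> tuple[int, int]:
    """Return (min, max) G.711.0 frame size in octets for X input octets (A6)."""
    if input_octets not in VALID_FRAME_SAMPLES:
        raise ValueError(f"unsupported input frame size: {input_octets}")
    # One octet at best, X+1 octets for an "uncompressible" input.
    return 1, input_octets + 1
```

   For example, a 20 ms frame holds 160 symbols, and a 320 symbol input
   frame can never produce more than 321 octets of output.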

   It is easily seen that this process is 1:1 and that G.711.0 based
   lossless compression can be employed multiple times, as the original
   G.711 input symbols are always reproduced with 100% fidelity.

   G.711.0 frames containing more source G.711 symbols from a given
   channel will typically result in higher compression as a general
   rule, but there are exceptions. For example, an intelligent G.711.0
   encoder may choose to encode 20 ms of G.711 as two individual 10 ms
   G.711.0 frames if a higher overall compression will result (this
   might occur if the first 10 ms was "silence" and two 10 ms G.711.0
   frames contained fewer octets than one 20 ms G.711.0 frame). For
   this reason, we will explicitly allow multiple G.711.0 encoded frames
   in the G.711.0 RTP payload in Section 4.2.2 below even though the
   usual case is anticipated to be only one G.711.0 frame per RTP
   payload.

4. RTP Header and Payload

   In this section we describe the precise format for G.711.0 frames
   carried via RTP. We begin with an RTP header description relative to
   G.711, then provide two G.711.0 payload examples.

4.1. G.711.0 RTP Header

   Relative to G.711 RTP headers, the utilization of G.711.0 does not
   create any special requirements with respect to the contents of the
   RTP packet header. The only significant difference is that the
   payload type (PT) RTP header field will have a value corresponding to
   the dynamic payload type assigned to the flow (whereas G.711 PCMU
   typically has a static PT = 0 and G.711 PCMA typically has a static
   PT = 8 [RFC3551]).

   Voice Activity Detection (VAD) SHOULD NOT be used when G.711.0 is
   negotiated because G.711.0 obtains high compression during "VAD
   silence intervals" and one of the advantages of G.711.0 over G.711
   with VAD is the lack of any VAD-induced artifacts in the received
   signal.
   However, if VAD is employed, the Marker bit (M) MUST be set
   in the first packet of a talkspurt (the first packet after a silence
   period in which packets have not been transmitted contiguously as per
   rules specified in [RFC3550] for G.711 payloads). This definition,
   being consistent with the G.711 RTP VAD use, further allows lossless
   transcoding between G.711 RTP packets and G.711.0 RTP packets as
   described in Section 3.1.

   With this introduction, the RTP packet header fields are defined as
   follows:

      V - As per [RFC3550]

      P - As per [RFC3550]

      X - As per [RFC3550]

      CC - As per [RFC3550]

      M - As per [RFC3550]

      PT - Dynamic PT assigned, consistent with MIME allocation for
      G711.0 defined in Media Type Definition (Section 5.1).

      SN - As per [RFC3550]

      timestamp - As per [RFC3550]

      SSRC - As per [RFC3550]

      CSRC - As per [RFC3550]

   Where V (version bits), P (padding bit), X (extension bit), CC (CSRC
   count), M (marker bit), PT (payload type), SN (sequence number),
   timestamp, SSRC (synchronizing source) and CSRC (contributing
   sources) are as defined in [RFC3550] and as typically used with
   G.711. PT (payload type) is as defined in [RFC3550].

4.2. G.711.0 RTP Payload

   In this section we provide two examples for carrying G.711.0 frames
   in RTP payloads. The first example is used when it is desired to
   carry only one G.711.0 frame in the RTP payload. This example is a
   subset of the second and shown separately for clarity.

4.2.1. Single G.711.0 Frame per RTP Payload Example

   This example depicts a single G.711.0 frame in the RTP payload. This
   is expected to be the dominant RTP payload case for G.711.0, as the
   G.711.0 encoding process supports the SDP packet times (ptime and
   maxptime, see [RFC4566]) commonly used when G.711 is transported in
   RTP.
   Additionally, as mentioned previously, larger G.711.0 frames
   generally compress more effectively than a multiplicity of smaller
   G.711.0 frames.

   The following Figure illustrates the single G.711.0 frame per RTP
   payload case.

                Single G.711.0 Frame in RTP Payload Case

   |-------------------|-------------------|
   | One G.711.0 Frame | Zero or more 0x00 |
   |                   |  Padding Octets   |
   |___________________|___________________|

                                Figure 2

   Encoding Process: A single G.711.0 frame is inserted into the RTP
   payload. The amount of time represented by the G.711 symbols
   compressed in the G.711.0 frame MUST correspond to the ptime signaled
   for applications using SDP. Although generally not desired, padding
   in the RTP payload after the G.711.0 frame MAY be created by
   placing one or more 0x00 octets after the G.711.0 frame. Such
   padding may be desired based on security considerations (see
   Section 10).

   Decoding Process: Passing the entire RTP payload to the G.711.0
   decoder is sufficient for the G.711.0 decoder to create the source
   G.711 symbols. Any padding inserted after the G.711.0 frame (i.e.,
   the 0x00 octets) present in the RTP payload is silently ignored by
   the G.711.0 decoding process. The decoding process is fully
   described in Section 4.2.3 below.

4.2.2. Multiple G.711.0 Frames per RTP Payload Example

   This example depicts the case where multiple G.711.0 frames are
   desired in the RTP payload.

   As described in Section 3.3, an "intelligent G.711.0 encoder" can
   decide to encode, for example, 20 ms of G.711 symbols as two 10 ms
   G.711.0 frames because a greater compression is attained for that
   particular 20 ms segment. The "smart encoding" of such inputs is
   accommodated by the ability to have multiple G.711.0 frames in the
   RTP payload.
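
   The "smart encoding" choice described above can be sketched in
   Python as follows. This is a minimal sketch under stated
   assumptions, not an implementation of G.711.0: the encode argument
   is a hypothetical callable standing in for a real G.711.0 frame
   encoder, and toy_encode is an invented stand-in that exists only to
   make the example runnable.

```python
from typing import Callable, List

# Hypothetical interface: one G.711 input frame in, one G.711.0 frame out.
Encoder = Callable[[bytes], bytes]


def choose_framing(symbols: bytes, encode: Encoder) -> List[bytes]:
    """Encode 20 ms (160 symbols) as one frame or as two 10 ms frames,
    whichever yields fewer octets in total."""
    if len(symbols) != 160:
        raise ValueError("this sketch handles exactly 160 G.711 symbols")
    whole = [encode(symbols)]
    halves = [encode(symbols[:80]), encode(symbols[80:])]
    # Prefer the single frame on a tie, since one frame per payload is
    # anticipated to be the usual case.
    if sum(len(f) for f in halves) < len(whole[0]):
        return halves
    return whole


def toy_encode(frame: bytes) -> bytes:
    """Invented stand-in, NOT G.711.0: one octet if every symbol in the
    frame is identical, otherwise X+1 octets (a marker plus the input)."""
    if len(set(frame)) == 1:
        return b"\x01"
    return b"\x02" + frame
```

   With this stand-in, 10 ms of constant "silence" followed by 10 ms of
   varying symbols is emitted as two frames (1 + 81 octets) rather than
   one 161 octet frame.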

   Note that since each G.711.0 frame is self-describing (see Attribute
   A4 in Section 3.2), the individual G.711.0 frames in the RTP payload
   need not represent the same duration of time (i.e., a 5 ms G.711.0
   frame could be followed by a 20 ms G.711.0 frame). Owing to this,
   the amount of time represented in the RTP payload MAY be any integer
   multiple of 5 ms (as 5 ms is the smallest interval of time that can
   be represented in a G.711.0 frame).

   The following Figure illustrates the multiple G.711.0 frame per RTP
   payload case where the number of G.711.0 frames placed in the RTP
   payload is N.

               Multiple G.711.0 Frames in RTP Payload Case

   |----------|---------|----------|---------|----------------|
   |  First   | Second  |          |   Nth   |  Zero or more  |
   | G.711.0  | G.711.0 |   ...    | G.711.0 |      0x00      |
   |  Frame   |  Frame  |          |  Frame  | Padding Octets |
   |__________|_________|__________|_________|________________|

                                Figure 3

   We note here that the individual G.711.0 frames can be, and generally
   are, of different lengths. The decoding process in the following
   section is used to determine the frame boundaries.

   Encoding Process: One or more G.711.0 frames are placed in the RTP
   payload simply by concatenating the G.711.0 frames together. The
   amount of time represented by the G.711 symbols compressed in all the
   G.711.0 frames in the RTP payload MUST correspond to the ptime
   signaled for applications using SDP. Although not generally desired,
   padding in the RTP payload SHOULD be placed after the last G.711.0
   frame in the payload and MAY be created by placing one or more 0x00
   octets after the last G.711.0 frame. Such padding may be desired
   based on security considerations (see Section 10).

   Decoding Process: As G.711.0 frames can be of varying length, the
   payload decoding process described in the following section is used
   to determine where the individual G.711.0 frame boundaries are.
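
   The encoding process of this section (concatenate the frames, check
   the total against ptime, then optionally pad with 0x00) can be
   sketched in Python as follows. The function name, the
   (duration, frame) pair representation and the pad_to parameter are
   assumptions made for this illustration; the G.711.0 frames
   themselves are taken as already encoded by some G.711.0 encoder.

```python
from typing import List, Tuple

VALID_FRAME_MS = (5, 10, 20, 30, 40)  # durations from attribute A5


def build_payload(frames: List[Tuple[int, bytes]], ptime_ms: int,
                  pad_to: int = 0) -> bytes:
    """Concatenate (duration_ms, g7110_frame) pairs into an RTP payload
    and append 0x00 padding up to pad_to octets, if requested."""
    total_ms = 0
    payload = bytearray()
    for duration_ms, frame in frames:
        if duration_ms not in VALID_FRAME_MS:
            raise ValueError(f"unsupported frame duration: {duration_ms} ms")
        # 0x00 is reserved for padding and cannot begin a valid frame.
        if not frame or frame[0] == 0x00:
            raise ValueError("a G.711.0 frame cannot be empty or start with 0x00")
        total_ms += duration_ms
        payload += frame
    if total_ms != ptime_ms:
        raise ValueError("frames must cover exactly the signaled ptime")
    # Optional padding is placed after the last G.711.0 frame.
    if pad_to > len(payload):
        payload += bytes(pad_to - len(payload))
    return bytes(payload)
```

   A caller that does not want padding simply leaves pad_to at its
   default of zero.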

4.2.3. G.711.0 RTP Payload Decoding Process

   The G.711.0 decoding process is a standard part of G.711.0 bit stream
   decoding and is implemented in the ITU-T Rec. G.711.0 reference code.
   The decoding process heuristic described in this section is a slight
   enhancement of the ITU-T reference code to explicitly accommodate RTP
   padding (as described above).

   Before describing the decoding, we note here that the largest
   possible G.711.0 frame is created whenever the largest number of
   G.711 symbols is encoded (320 from Section 3.2, property A5) and
   these 320 symbols are "uncompressible" by the G.711.0 encoder. In
   this case (via property A6 in Section 3.2) the G.711.0 output frame
   will be 321 octets long. We also note that the value 0x00 chosen for
   the optional padding cannot be the first octet of a valid ITU-T Rec.
   G.711.0 frame (see [G.711.0]). We also note that whenever more than
   one G.711.0 frame is contained in the RTP payload, the decoding of
   the individual G.711.0 frames will occur multiple times.

   For the decoding heuristic below, let N be the number of octets in
   the RTP payload (i.e., excluding any RTP padding, but including any
   RTP payload padding), let P equal the number of RTP payload octets
   processed by the G.711.0 decoding process, let K be the number of
   G.711 symbols presently in the output buffer, let Q be the number of
   octets contained in the G.711.0 frame being processed and let "!="
   represent not equal to. The keyword "STOP" is used below to indicate
   the end of the processing of G.711.0 frames in the RTP payload. The
   heuristic below assumes an output buffer for the decoded G.711 source
   symbols of length sufficient to accommodate the expected number of
   G.711 symbols and an input buffer of length 321 octets.

   G.711.0 RTP Payload Decoding Heuristic:

   H1  Initialization: Initialize the number of processed octets to zero
       (P = 0).
       Initialize the counter for how many G.711 symbols are
       in the output buffer to zero (K = 0). Initialize N to the
       number of octets in the RTP payload. Go to H2.

   H2  Read internal buffer: Read min{320+1, (N-P)} octets into the
       internal buffer, starting at octet (P+1) of the RTP payload. We
       note that at this point N-P octets have yet to be processed and
       that 320+1 octets is the largest possible G.711.0 frame. Go to
       H3.

   H3  Analyze the first octet in the internal buffer: If this octet
       is 0x00 (a padding octet) go to H4, otherwise go to H5 (process
       a G.711.0 frame).

   H4  Process padding octet (no G.711 symbols generated): Increment the
       processed octets counter by one (set P = P + 1). If this
       increment results in P >= N then STOP (as all
       RTP Payload octets have been processed), otherwise go to H2.

   H5  Process an individual G.711.0 frame (produce G.711 samples in the
       output frame): Pass the internal buffer to the G.711.0 decoder.
       The G.711.0 decoder will read the first octet (called the
       "prefix code" octet in ITU-T Rec. G.711.0 [G.711.0]) to
       determine how many source G.711 samples, M, are contained in
       this G.711.0 frame. The G.711.0 decoder will produce exactly M
       G.711 source symbols. If K = 0, these M symbols will be the
       first in the output buffer and are placed at the beginning of
       the output buffer. If K != 0, concatenate these M symbols with
       the prior symbols in the output buffer (there are K prior
       symbols in the buffer). Set K = K + M (as there are now this
       many G.711 source symbols in the output buffer). The G.711.0
       decoder will have consumed some number of octets, Q, in the
       internal buffer to produce the M G.711 symbols. Increment the
       processed octets counter by this quantity (set
       P = P + Q). If this increment results in P >= N
       then STOP (as all RTP Payload octets have been processed),
       otherwise go to H2.
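
   Steps H1 through H5 can be sketched in Python as follows. This is an
   illustrative sketch, not the ITU-T reference code: decode_frame is a
   hypothetical interface to a G.711.0 frame decoder that returns the
   decoded symbols together with the number of octets Q it consumed,
   and toy_decode_frame is an invented stand-in format used only so the
   payload walk can be exercised.

```python
from typing import Callable, Tuple

MAX_FRAME_OCTETS = 320 + 1  # largest possible G.711.0 frame (A5, A6)

# Hypothetical decoder interface: takes the internal buffer and returns
# (decoded G.711 symbols, number of octets Q consumed from the buffer).
FrameDecoder = Callable[[bytes], Tuple[bytes, int]]


def decode_payload(payload: bytes, decode_frame: FrameDecoder) -> bytes:
    """Walk an RTP payload per steps H1 to H5; return the G.711 symbols."""
    n = len(payload)      # H1: N, octets in the RTP payload
    p = 0                 # H1: P, octets processed so far
    out = bytearray()     # H1: output buffer, so K == len(out)
    while p < n:          # STOP once P >= N
        buf = payload[p:p + MAX_FRAME_OCTETS]  # H2: min(321, N-P) octets
        if buf[0] == 0x00:                     # H3: padding octet?
            p += 1                             # H4: P = P + 1
            continue
        symbols, q = decode_frame(buf)         # H5: decode one frame
        if q <= 0 or q > len(buf):
            raise ValueError("decoder reported an invalid frame length")
        out += symbols                         # H5: K = K + M
        p += q                                 # H5: P = P + Q
    return bytes(out)


def toy_decode_frame(buf: bytes) -> Tuple[bytes, int]:
    """Invented stand-in, NOT G.711.0: the first octet is a nonzero
    count M, followed by M raw symbols."""
    m = buf[0]
    if len(buf) < 1 + m:
        raise ValueError("truncated frame")
    return buf[1:1 + m], 1 + m
```

   Because padding octets are skipped wherever a frame boundary is
   expected, the same loop handles the single frame and multiple frame
   payload cases.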

   At this point, the output buffer will contain precisely K G.711
   source symbols which should correspond to the ptime signaled if SDP
   was used and the encoding process was without error.

   We also note, as an aside, that the heuristic above (and the ITU-T
   G.711.0 reference code) accommodates padding octets (0x00) placed
   anywhere in between G.711.0 frames in the RTP payload as well as
   prior to or after any or all G.711.0 frames. The ITU-T G.711.0
   reference code does not have Step H3 and H4 as separate steps (i.e.,
   Step H5 immediately follows H2) at the added computational cost of
   some additional buffer passing to/from the G.711.0 frame decoder
   functions. That is, the G.711.0 decoder in the reference code
   "silently ignores" 0x00 padding octets at the beginning of what it
   believes to be a G.711.0 encoded frame boundary. Thus Step H3 and
   Step H4 above are an optimization over the reference code shown for
   clarity.

   If the decoder is at a playout endpoint location, this G.711 buffer
   SHOULD be used in the same manner as a received G.711 RTP payload
   would have been used (passed to a playout buffer, to a PLC
   implementation, etc.).

4.2.4. G.711.0 RTP Payload for Multiple Channels

   In this section we describe the use of multiple "channels" of G.711
   data encoded by G.711.0 compression.

   The dominant use of G.711 in RTP transport has been for single
   channel use cases. For this case, the above G.711.0 encoding and
   decoding process is used. However, the multiple channel case for
   G.711.0 (a frame-based compression) is different from G.711 (a
   sample-based encoding) and is described separately here.

   RFC 3551 [RFC3551] provides guidelines for encoding audio channels
   (Section 4) and for the ordering of the channels within the RTP
   payload (Section 4.1).
The ordering guidelines in RFC 3551, 609 Section 4.1 SHOULD be used unless an application-specific channel 610 ordering is more appropriate. 612 An implicit assumption in RFC 3551 is that all the channel data 613 multiplexed into a RTP payload MUST represent the same physical time 614 span. The case for G.711.0 is no different; the underlying G.711 615 data for all channels in a G.711.0 RTP payload MUST span the same 616 interval in time (e.g., the same "ptime" for a SDP-specified codec 617 negotiation). 619 RFC 3551 provides guidelines for sample-based encodings such as G.711 620 in Section 4.2. This guidance is tantamount to interleaving the 621 individual samples in that they SHOULD be packed in consecutive 622 octets. 624 RFC 3551 provides guidelines for frame-based encodings in which the 625 frames are interleaved. However, this guidance stems from the 626 assumption that "the frame size for frame-oriented codecs is a 627 given". However, this assumption is not valid for G.711.0 in that 628 individual consecutive G.711.0 frames (as per Section 4.2.2) can: 630 1) represent different time spans (e.g., two 5 ms G.711.0 frames 631 in lieu of one 10 ms G.711.0 frame), and 633 2) be of different lengths in octets (and typically are). 635 Therefore a different, but also simple, concatenation-based approach 636 is specified in this RFC. 638 For the multiple channel G.711.0 case, each G.711 channel is 639 independently encoded into one or more G.711.0 frames defined here as 640 a "G.711.0 channel superframe". Each one of these superframes is 641 identical to the multiple G.711.0 frame case illustrated in Figure 3 642 of Section 4.2.2 in which each superframe can have one or more 643 individual G.711.0 frames within it. Then each G.711.0 channel 644 superframe is concatenated - in channel order - into a G.711.0 RTP 645 payload. 
Then, if optional G.711.0 padding octets (0x00) are 646 desired, it is RECOMMENDED that these octets are placed after the 647 last G.711.0 channel superframe. As per above, such padding may be 648 desired based on security considerations (see Section 10). This is 649 depicted in the following Figure 4 below. 651 Multiple G.711.0 Channel Superframes in RTP Payload 653 |----------|---------|----------|---------|---------| 654 | First | Second | | Nth | Zero | 655 | G.711.0 | G.711.0 | ... | G.711.0 | or more | 656 | Channel | Channel | | Channel | 0x00 | 657 | Super- | Super- | | Super | Padding | 658 | Frame | Frame | | Frame | Octets | 659 |__________|_________|__________|_________|_________| 661 Figure 4 663 The G.711.0 decoder at the receiving end simply decodes the entire 664 G.711.0 (multiple channel) payload into individual G.711 symbols. If 665 M such G.711 symbols result and there were N channels, then the first 666 M/N G.711 samples would be from the first channel, the second M/N 667 G.711 samples would be from the second channel, and so on until the 668 Nth set of G.711 samples are found. Similarly, if the number of 669 channels was not known, but the payload "ptime" was known, one could 670 infer (knowing the sampling rate) how many G.711 symbols each channel 671 contained; then with this knowledge determine how many channels of 672 data were contained in the payload. When SDP is used, the number of 673 channels is known because the optional parameter is a MUST when there 674 is more than one channel negotiated (see Section 5.1). Additionally, 675 when SDP is used the parameter ptime is a RECOMMENDED optional 676 parameter. We note that if both parameters channels and ptime are 677 known that one could provide a check for the other and the converse. 
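The channel separation and the channels/ptime cross-check described above can be sketched as follows. This is an illustrative sketch only; the function names split_channels and infer_channels are assumptions introduced here, and the decoded symbols are assumed to be one octet per G.711 sample.

```python
from typing import List


def split_channels(symbols: bytes, channels: int) -> List[bytes]:
    """Split M decoded G.711 symbols into N equal runs, in channel order."""
    if channels < 1:
        raise ValueError("channels must be a positive integer")
    if len(symbols) % channels != 0:
        raise ValueError("symbol count is not a multiple of the channel count")
    per_channel = len(symbols) // channels  # M/N symbols per channel
    return [symbols[i * per_channel:(i + 1) * per_channel]
            for i in range(channels)]


def infer_channels(symbol_count: int, ptime_ms: int, rate_hz: int = 8000) -> int:
    """Infer the channel count from ptime and the sampling rate."""
    per_channel = rate_hz * ptime_ms // 1000  # symbols per channel per packet
    if per_channel <= 0 or symbol_count % per_channel != 0:
        raise ValueError("symbol count is inconsistent with ptime and rate")
    return symbol_count // per_channel
```

When both "channels" and "ptime" are signaled, comparing infer_channels() against the signaled channel count gives the payload check mentioned above.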
679 Lastly, we note that although any padding for the multiple channel 680 G.711.0 payload is RECOMMENDED to be placed at the end of the 681 payload, the G.711.0 decoding heuristic described in Section 4.2.3 682 will successfully decode the payload in Figure 4 if 0x00 padding 683 octets are placed anywhere before or after any individual G.711.0 frame 684 in the RTP payload. The number of padding octets introduced at any 685 G.711.0 frame boundary therefore does not affect the number M of 686 source G.711 symbols produced. Thus the decision for padding MAY be 687 made on a per-superframe basis. 689 5. Payload Format Parameters 690 This section defines the parameters that may be used to configure 691 optional features in the G.711.0 RTP transmission. 693 The parameters are defined here as part of the media subtype 694 registration for the G.711.0 codec. Mapping of the parameters into 695 Session Description Protocol (SDP) RFC 4566 [RFC4566] is also 696 provided for those applications that use SDP. 698 5.1. Media Type Registration 700 Type name: audio 702 Subtype name: G7110 704 Required Parameters: 706 rate: The RTP timestamp clock rate, which is equal to the sampling 707 rate. The typical rate used with G.711 encoding is 8000, but 708 other rates may be specified. The default rate is 8000. 710 complaw: Indicates the companding law (A-law or mu-law) employed. 711 The case-insensitive values are "al" or "mu" for A-law and mu-law, 712 respectively. 714 Optional parameters: 716 channels: See RFC 4566 [RFC4566] for definition. Specifies how 717 many audio streams are represented in the G.711.0 payload and MUST 718 be present if the number of channels is greater than one. This 719 parameter defaults to 1 if not present (as per RFC 4566) and is 720 typically a small positive integer.
It is 721 expected that implementations that specify multiple channels will 722 also define a mechanism to map the channels appropriately within 723 their system design, otherwise the channel order specified in RFC 724 3551 [RFC3551] Section 4.1 will be assumed (e.g., left, right, 725 center, ... ). 727 maxptime: See RFC 4566 [RFC4566] for definition. 729 ptime: See RFC 4566 [RFC4566] for definition. The inclusion of 730 "ptime" is RECOMMENDED and SHOULD be in the SDP unless there is an 731 application specific reason not to include it (e.g., an 732 application that has a variable ptime on a packet-by-packet 733 basis). For constant ptime applications, it is considered good 734 form to include "ptime" in the SDP for session diagnostic 735 purposes. For the constant ptime multiple channel case described 736 in Section 4.2.2, the inclusion of "ptime" can provide a desirable 737 payload check. 739 Encoding considerations: 741 This media type is framed binary data (see Section 4.8 in RFC 4288 742 [RFC4288]) compressed as per ITU-T Rec. G.711.0. 744 Security considerations: 746 This media type does not carry active content. It does transfer 747 compressed data. See Section 4 of RFC 4856 [RFC4856]. 749 Interoperability considerations: none 751 Published specification: 753 ITU-T Rec. G.711.0 and RFC QQQQ. 755 [ RFC Editor: please replace QQQQ with a reference to this RFC ] 757 Applications that use this media type: 759 Audio and video streaming and conferencing tools. 761 Additional information: none 763 Person & email address to contact for further information: 765 Michael Ramalho or 767 Intended usage: COMMON 769 Restrictions on usage: 771 This media type depends on RTP framing, and hence is only defined 772 for transfer via RTP [RFC3550]. Transport within other framing 773 protocols is not defined at this time. 775 Author: Michael Ramalho 777 Change controller: 779 IETF Audio/Video Transport working group delegated from the IESG. 781 5.2. 
Mapping to SDP Parameters 783 The information carried in the media type specification has a 784 specific mapping to fields in the Session Description Protocol (SDP), 785 which is commonly used to describe RTP sessions. When SDP is used to 786 specify sessions employing G.711.0, the mapping is as follows: 788 o The media type ("audio") goes in SDP "m=" as the media name. 790 o The media subtype ("G7110") goes in SDP "a=rtpmap" as the encoding 791 name. 793 o The required parameter "rate" also goes in "a=rtpmap" as the clock 794 rate. 796 o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and 797 "a=maxptime" attributes, respectively. 799 o Remaining parameters go in the SDP "a=fmtp" attribute by copying 800 them directly from the media type string as a semicolon-separated 801 list of parameter=value pairs. 803 5.3. Offer/Answer Considerations 805 The following considerations apply when using the SDP offer/answer 806 RFC 3264 [RFC3264] mechanism to negotiate the "channels" attribute. 808 o If the offering endpoint specifies a value for the optional 809 channels parameter greater than one and the answering endpoint 810 both understands the parameter and cannot support that value 811 requested, the answer MUST contain the optional channels parameter 812 with the highest value it can support. 814 o If the offering endpoint specifies a value for the optional 815 channels parameter the answer MUST contain the optional channels 816 parameter unless the only value the answering endpoint can support 817 is one, in which case the answer MAY contain the optional channels 818 parameter with value of 1. 820 o If the offering endpoint specifies a value for the ptime parameter 821 that the answering endpoint cannot support, the answer MUST 822 contain the optional ptime parameter. 824 o If the offering endpoint specifies a value for the maxptime 825 parameter that the answering endpoint cannot support, the answer 826 MUST contain the optional maxptime parameter. 
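The "channels" rules above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names answer_channels and rtpmap_attribute are hypothetical, and the sketch assumes that the "highest value it can support" never exceeds the offered value.

```python
def answer_channels(offered: int, max_supported: int) -> int:
    """Channel count the answerer places in its answer."""
    if offered < 1 or max_supported < 1:
        raise ValueError("channel counts must be positive integers")
    # Accept the offered value if supported, else the highest supported one.
    return min(offered, max_supported)


def rtpmap_attribute(payload_type: int, rate: int, channels: int,
                     explicit_one: bool = False) -> str:
    """Build the a=rtpmap line; channels is the trailing encoding parameter."""
    line = f"a=rtpmap:{payload_type} G7110/{rate}"
    # A single channel MAY be written explicitly or left to default to 1.
    if channels > 1 or explicit_one:
        line += f"/{channels}"
    return line
```

For an offer of two channels to an endpoint that can render only one, rtpmap_attribute(98, 8000, answer_channels(2, 1), explicit_one=True) yields an answer line carrying an explicit channel count of 1.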
828 5.4. SDP Examples 830 The following examples illustrate how to signal G.711.0 via SDP. 832 5.4.1. SDP Example 1 834 m=audio RTP/AVP 98 835 a=rtpmap:98 G7110/8000 836 a=fmtp:98 complaw=mu 838 In the above example the dynamic payload type 98 is mapped to G.711.0 839 via the "a=rtpmap" attribute. The mandatory "complaw" is on the 840 "a=fmtp" attribute line. Note that neither of the optional parameters 841 "ptime" and "channels" is present, although it is generally good form 842 to include "ptime" in the SDP for session diagnostic purposes. 844 5.4.2. SDP Example 2 846 The following example illustrates an offering endpoint requesting 2 847 channels, but the answering endpoint can only support (or render) one 848 channel. 850 Offer: 852 m=audio RTP/AVP 98 853 a=rtpmap:98 G7110/8000/2 854 a=ptime:20 855 a=fmtp:98 complaw=al 857 Answer: 859 m=audio RTP/AVP 98 860 a=rtpmap:98 G7110/8000/1 861 a=ptime:20 862 a=fmtp:98 complaw=al 864 In this example the offer had an optional channels parameter. The 865 answer must also have the optional channels parameter unless the 866 value in the answer is one. Shown here is the case in which the answer explicitly 867 contains the channels parameter (it need not, in which case it would be 868 interpreted as one channel). As mentioned previously, it is 869 considered good form to include "ptime" in the SDP for session 870 diagnostic purposes if the session is a constant ptime session. 872 6. G.711.0 Storage Mode Conventions and Definition 874 The G.711.0 storage mode definition in this section is similar to 875 many other IETF codecs (e.g., iLBC, EVRC-NW) and is essentially a 876 concatenation of individual G.711.0 frames. 878 We note that something must be stored for any G.711.0 frames that are not 879 received at the receiving endpoint, no matter what the cause. In 880 this section we describe two mechanisms, a "G.711.0 PLC Frame" and a 881 "G.711.0 Erasure Frame".
These G.711.0 PLC and G.711.0 Erasure 882 Frames are described prior to the G.711.0 storage mode definition for 883 clarity. 885 6.1. G.711.0 PLC Frame 887 When G.711 RTP payloads are not received by a rendering endpoint, a Packet 888 Loss Concealment (PLC) mechanism is typically employed to "fill in" 889 the missing G.711 symbols with something that is auditorially 890 pleasing, so that the loss may not be noticed by a listener. Such a 891 PLC mechanism for G.711 is specified in ITU-T Rec. G.711 - Appendix 1 892 [G.711-AP1]. 894 A natural extension when creating G.711.0 frames for storage 895 environments is to employ such a PLC mechanism to create G.711 896 symbols for the span of time in which G.711.0 payloads were not 897 received - and then to compress the resulting "G.711 PLC symbols" via 898 G.711.0 compression. The G.711.0 frame(s) created by such a process 899 are called "G.711.0 PLC Frames". 901 Since PLC mechanisms are designed to render missing audio data with 902 the best fidelity and intelligibility, G.711.0 frames created via 903 such processing are likely best for most recording situations (such as 904 voicemail storage) unless there is a requirement not to fabricate 905 (audio) data not actually received. 907 After such PLC G.711 symbols have been generated and then encoded by 908 a G.711.0 encoder, the resulting frames may be stored in G.711.0 909 frame format. As a result, there is nothing to specify here - the 910 G.711.0 PLC Frames are stored as if they were received by the 911 receiving endpoint. In other words, PLC-generated G.711.0 frames 912 appear as "normal" or "ordinary" G.711.0 frames in the storage mode 913 file. 915 6.2. G.711.0 Erasure Frame 917 "Erasure Frames", or equivalently "Null Frames", have been designed 918 for many frame-based codecs since G.711 was standardized.
These null 919 /erasure frames explicitly represent data from incoming audio that 920 were either not received by the receiving system or represent data 921 that a transmitting system decided not to send. Transmitting systems 922 may choose not to send data for a variety of reasons (e.g., not 923 enough wireless link capacity in radio-based systems) and can choose 924 to send a "null frame" in lieu of the actual audio. It is also 925 envisioned that erasure frames would be used in storage mode 926 applications for specific archival purposes where there is a 927 requirement not to fabricate audio data that was not actually 928 received. 930 Thus, a G.711.0 erasure frame is a representation of the amount of 931 time in G.711.0 frames that were not received or not encoded by the 932 transmitting system. 934 Prior to defining a G.711.0 erasure frame it is beneficial to note 935 what many G.711 RTP systems send when the endpoint is "muted". When 936 muted, many of these systems will send an entire G.711 payload of 937 either 0+ or 0- (i.e., one of the two levels closest to "analog zero" 938 in either G.711 companding law). Next we note that a desirable 939 property for a G.711.0 erasure frame is for "non G.711.0 Erasure 940 Frame aware" endpoints to be able to playback a G.711.0 erasure frame 941 with the existing G.711.0 ITU-T reference code. 943 A G.711.0 Erasure Frame is defined as any G.711.0 frame for which the 944 corresponding G.711 sample values are either the value 0++ or the 945 value 0-- for the entirety of the G.711.0 frame. The levels of 0++ 946 and 0-- are defined two levels above or below analog zero, 947 respectively. An entire frame of value 0++ or 0-- is expected to be 948 extraordinarily rare when the frame was in fact generated by a 949 natural signal (on the order of one in 2^{ptime in samples, minus 950 one}), as analog inputs such as speech and music are zero-mean and 951 are typically acoustically coupled to digital sampling systems. 
Note 952 that the playback of a G.711.0 frame characterized as an erasure 953 frame is auditorially equivalent to a muted signal (a very low value 954 constant). 956 These G.711.0 erasure frames can be reasonably characterized as null 957 or erasure frames while meeting the desired playback goal of being 958 decoded by the G.711.0 ITU-T reference code. Thus, similarly to 959 G.711.0 PLC frames, the G.711.0 erasure frames appear as "normal" or 960 "ordinary" G.711.0 frames in the storage mode format. 962 6.3. G.711.0 Storage Mode Definition 964 The storage format is used for storing G.711.0 encoded frames. The 965 format for the G.711.0 storage mode file defined by this RFC is shown 966 below. 968 G.711.0 Storage Mode Format 970 |---------------------------|----------|--------------| 971 | Magic Number | | | 972 | | Version | Concatenated | 973 | "#!G7110A\n" (for A-law) | Octet | G.711.0 | 974 | or | | Frames | 975 | "#!G7110M\n" (for Mu-law) | "0x00" | | 976 |___________________________|__________|______________| 978 Figure 5 980 The storage mode file consists of a magic number and a version octet 981 followed by the individual G.711.0 frames concatenated together. 983 The magic number for G.711.0 A-law corresponds to the ASCII character 984 string "#!G7110A\n", i.e., "0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x41 985 0x0A". Likewise, the magic number for G.711.0 mu-law corresponds to 986 the ASCII character string "#!G7110M\n", i.e., "0x23 0x21 0x47 0x37 987 0x31 0x31 0x30 0x4D 0x0A". 989 The version number octet allows for the future specification of other 990 G.711.0 storage mode formats. The specification of other storage 991 mode formats may be desirable as G.711.0 frames are of variable 992 length and a future format may include an indexing methodology that 993 would enable playout far into a long G.711.0 recording without the 994 necessity of decoding all the G.711.0 frames since the beginning of 995 the recording.
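As an informal illustration, a reader of the storage mode file might validate the magic number and version octet as sketched below. The function name parse_storage_header is an assumption introduced here, and the sketch assumes the reader supports only the Version 0 format defined by this document.

```python
from typing import Tuple

MAGIC_ALAW = b"#!G7110A\n"    # magic number for A-law
MAGIC_MULAW = b"#!G7110M\n"   # magic number for mu-law
VERSION_0 = 0x00              # the only version defined by this document


def parse_storage_header(data: bytes) -> Tuple[str, bytes]:
    """Return (complaw, concatenated G.711.0 frames) for a Version 0 file."""
    magic, rest = data[:len(MAGIC_ALAW)], data[len(MAGIC_ALAW):]
    if magic == MAGIC_ALAW:
        complaw = "al"
    elif magic == MAGIC_MULAW:
        complaw = "mu"
    else:
        raise ValueError("not a G.711.0 storage mode file")
    if not rest:
        raise ValueError("missing version octet")
    if rest[0] != VERSION_0:
        # Unknown versions are rejected rather than handed to a decoder.
        raise ValueError("unsupported storage mode version")
    return complaw, rest[1:]
```

The returned octets are the concatenated G.711.0 frames and can be handed to a frame-by-frame decoding loop.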
Other future format specifications may include support 996 for multiple channels, metadata and the like. For these reasons it 997 was determined that a versioning strategy was desirable for the 998 G.711.0 storage mode definition specified by this RFC. This RFC only 999 specifies Version 0 and thus the value of "0x00" must be used for the 1000 storage mode defined by this RFC. 1002 The G.711.0 codec data frames, including any necessary erasure or PLC 1003 frames, are stored in consecutive order concatenated together as 1004 shown in Section 4.2.2. 1006 To decode the individual G.711.0 frames, the heuristic presented in 1007 Section 4.2.3 may be used. 1008 If the version octet is determined not to be zero, the remainder of 1009 the payload MUST NOT be passed to the G.711.0 decoder, as the ITU-T 1010 G.711.0 reference decoder can only decode concatenated G.711.0 frames 1011 and has not been designed to decode elements in yet to be specified 1012 future storage mode formats. 1014 7. Acknowledgements 1016 There have been many people contributing to G.711.0 in the course of 1017 its development. The people listed here deserve special mention: 1018 Takehiro Moriya, Claude Lamblin, Herve Taddei, Simao Campos, Yusuke 1019 Hiwasaki, Jacek Stachurski, Lorin Netsch, Paul Coverdale, Patrick 1020 Luthi, Paul Barrett, Jari Hagqvist, Pengjun (Jeff) Huang, John Gibbs, 1021 Yutaka Kamamoto, and Csaba Kos. 1023 8. Contributors 1025 The authors thank everyone who has contributed to this document. 1026 The people listed here deserve special mention: Ali Begen, Roni Even, 1027 and Hadriel Kaplan. 1029 9. IANA Considerations 1031 One media type (audio/G7110) has been defined and requires IANA 1032 registration in the media types registry. See Section 5.1 for 1033 details. 1035 10.
Security Considerations 1037 RTP packets using the payload format defined in this specification 1038 are subject to the security considerations discussed in the RTP 1039 specification [RFC3550], and in any appropriate RTP profile (for 1040 example RFC 3551 [RFC3551] or [RFC4585]). This implies that 1041 confidentiality of the media streams is achieved by encryption; for 1042 example, through the application of SRTP [RFC3711]. Because the data 1043 compression used with this payload format is applied end-to-end, any 1044 encryption needs to be performed after compression. 1046 Note that the appropriate mechanism to ensure confidentiality and 1047 integrity of RTP packets and their payloads is very dependent on the 1048 application and on the transport and signaling protocols employed. 1049 Thus, although SRTP is given as an example above, other possible 1050 choices exist. 1052 Note that end-to-end security with either authentication, integrity 1053 or confidentiality protection will prevent a network element not 1054 within the security context from performing media-aware operations 1055 other than discarding complete packets. To allow any (media-aware) 1056 intermediate network element to perform its operations, it is 1057 required to be a trusted entity which is included in the security 1058 context establishment. 1060 G.711.0 has no known denial-of-service attacks due to decoding, as 1061 data posing as a desired G.711.0 payload will be decoded into 1062 something (as per the decoding algorithm) with a finite amount of 1063 computation. This is due to the decompression algorithm having a 1064 finite worst-case processing path (no infinite computational loops 1065 are possible). 1067 G.711.0 is a variable bit rate (VBR) audio codec.
There have been 1068 recent concerns with VBR speech codecs where a passive observer can 1069 identify phrases from a standard speech corpus by means of the 1070 lengths produced by the encoder even when the payload is encrypted 1072 [IEEE]. In that paper, it was determined that some code excited 1073 linear prediction (CELP) codecs would produce discrete packet lengths 1074 for some phonemes. Furthermore, with the use of appropriately 1075 designed Hidden Markov Models (HMMs), such a system could predict 1076 phrases with unexpected accuracy. One CELP codec studied, SPEEX, had 1077 the property that it produced 21 different packet lengths in its 1078 wideband mode and that these packet lengths probabilistically mapped 1079 to phonemes that an HMM system could be trained on. In that paper it 1080 was determined that a mitigation technique would be to pad the output 1081 of the encoder with random padding lengths to the effect: 1) that 1082 more discrete payload sizes would result, and 2) that the 1083 probabilistic mapping to phonemes would become less clear. As G.711 1084 is not a speech model based codec, neither is G.711.0. A G.711.0 1085 encoding, during talking periods, produces frames of varying 1086 lengths which are not likely to have a strong mapping to phonemes. 1087 Thus G.711.0 is not expected to have this same vulnerability. It 1088 should be noted that "silence" (only one value of G.711 in the entire 1089 G.711 input frame) or "near silence" (only a few G.711 values) is 1090 easily detectable from G.711.0 frame lengths of one or a few octets. 1091 If one desires to mitigate silence/non-silence detection, 1092 statistically variable padding should be added to G.711.0 frames that 1093 resulted in very small G.711.0 frames (less than about 20% of the 1094 symbols of the corresponding G.711 input frame).
Methods of 1095 introducing padding in the G.711.0 payloads have been provided in the 1096 G.711.0 RTP payload definitions in Section 4.2.1 and Section 4.2.2. 1098 11. References 1100 11.1. Normative References 1102 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1103 Requirement Levels", BCP 14, RFC 2119, March 1997. 1105 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 1106 Description Protocol", RFC 4566, July 2006. 1108 [RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and 1109 Registration Procedures", RFC 4288, December 2005. 1111 [RFC4855] Casner, S., "Media Type Registration of RTP Payload 1112 Formats", RFC 4855, February 2007. 1114 [RFC4856] Casner, S., "Media Type Registration of Payload Formats in 1115 the RTP Profile for Audio and Video Conferences", RFC 1116 4856, February 2007. 1118 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1119 Jacobson, "RTP: A Transport Protocol for Real-Time 1120 Applications", STD 64, RFC 3550, July 2003. 1122 [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and 1123 Video Conferences with Minimal Control", STD 65, RFC 3551, 1124 July 2003. 1126 [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, 1127 "Extended RTP Profile for Real-time Transport Control 1128 Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July 1129 2006. 1131 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 1132 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 1133 RFC 3711, March 2004. 1135 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 1136 with Session Description Protocol (SDP)", RFC 3264, June 1137 2002. 1139 [G.711.0] ITU-T G.711.0, "Recommendation ITU-T G.711.0 - Lossless 1140 Compression of G.711 Pulse Code Modulation", September 1141 2009. 1143 [G.711] ITU-T G.711, "Recommendation ITU-T G.711: Pulse Code 1144 Modulation (PCM) of Voice Frequencies", November 1988.
1146 [G.711-AP1] 1147 ITU-T G.711 Appendix 1, "Recommendation G.711 1148 Appendix 1: A high quality low-complexity algorithm for 1149 packet loss concealment with G.711", September 1999. 1151 [G.711-A1] 1152 ITU-T G.711 Amendment 1, "Recommendation ITU-T G.711 1153 Amendment 1 - Amendment 1: New Annex A on Lossless 1154 Encoding of PCM Frames", September 2009. 1156 11.2. Informative References 1158 [RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, 1159 June 1999. 1161 [G.729] ITU-T G.729, "Recommendation ITU-T G.729 - Coding of 1162 speech at 8 kbit/s using conjugate-structure algebraic- 1163 code-excited linear prediction (CS-ACELP)", January 2007. 1165 [G.722] ITU-T G.722, "Recommendation ITU-T G.722 - 7 kHz audio- 1166 coding within 64 kbit/s", November 1988. 1168 [ICASSP] N. Harada, Y. Yamamoto, T. Moriya, Y. Hiwasaki, M. 1169 A. Ramalho, L. Netsch, Y. Stachurski, Miao Lei, H. 1170 Taddei, and Q. Fengyan, "Emerging ITU-T Standard G.711.0 1171 - Lossless Compression of G.711 Pulse Code Modulation", 1172 International Conference on Acoustics Speech and Signal 1173 Processing (ICASSP), 2010, ISBN 978-1-4244-4295-9, 1174 March 2010. 1176 [IEEE] C.V. Wright, L. Ballard, S.E. Coull, F. Monrose, 1177 and G.M. Masson, "Spot Me if You Can: Uncovering Spoken 1178 Phrases in Encrypted VoIP Conversations", IEEE Symposium on 1179 Security and Privacy, 2008, ISBN: 978-0-7695-3168-7, May 1180 2008. 1182 Authors' Addresses 1184 Michael A. Ramalho (editor) 1185 Cisco Systems, Inc. 1186 8000 Hawkins Road 1187 Sarasota, FL 34241 1188 USA 1190 Phone: +1 919 476 2038 1191 Email: mramalho@cisco.com 1193 Paul E. Jones 1194 Cisco Systems, Inc. 1195 7025 Kit Creek Rd. 1196 Research Triangle Park, NC 27709 1197 USA 1199 Phone: +1 919 476 2048 1200 Email: paulej@packetizer.com 1202 Noboru Harada 1203 NTT Communications Science Labs.
1204 3-1 Morinosato-Wakamiya 1205 Atsugi, Kanagawa 243-0198 1206 JAPAN 1208 Phone: +81 46 240 3676 1209 Email: harada.noboru@lab.ntt.co.jp 1210 Muthu Arul Mozhi Perumal 1211 Cisco Systems, Inc. 1212 Cessna Business Park 1213 Sarjapur-Marathahalli Outer Ring Road 1214 Bangalore, Karnataka 560103 1215 India 1217 Phone: +91 9449288768 1218 Email: mperumal@cisco.com 1220 Lei Miao 1221 Huawei Technologies Co. Ltd 1222 Q22-2-A15R, Enviroment Protection Park 1223 No. 156 Beiqing Road 1224 HaiDian District 1225 Beijing 100095 1226 China 1228 Phone: +86 1059728300 1229 Email: lei.miao@huawei.com