idnits 2.17.1 

draft-ietf-payload-g7110-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (August 22, 2014) is 3534 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711.0'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711-AP1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711-A1'


     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                    M. Ramalho, Ed.
3	Internet-Draft                                                  P. Jones
4	Intended status: Standards Track                           Cisco Systems
5	Expires: February 23, 2015                                     N. Harada
6	                                                                     NTT
7	                                                              M. Perumal
8	                                                                Ericsson
9	                                                                 L. Miao
10	                                                     Huawei Technologies
11	                                                         August 22, 2014

13	                     RTP Payload Format for G.711.0
14	                      draft-ietf-payload-g7110-03

16	Abstract

18	   This document specifies the Real-Time Transport Protocol (RTP)
19	   payload format for ITU-T Recommendation G.711.0.  ITU-T Rec. G.711.0
20	   defines a lossless and stateless compression for G.711 packet
21	   payloads typically used in IP networks.  This document also defines a
22	   storage mode format for G.711.0 and a media type registration for the
23	   G.711.0 RTP payload format.

25	Status of This Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on February 23, 2015.

42	Copyright Notice

44	   Copyright (c) 2014 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction
60	   2.  Requirements Language
61	   3.  G.711.0 Codec Background
62	     3.1.  General Information and Use of the ITU-T G.711.0 Codec
63	     3.2.  Key Properties of G.711.0 Design
64	     3.3.  G.711 Input Frames to G.711.0 Output Frames
65	       3.3.1.  Multiple G.711.0 Output Frames per RTP Payload
66	               Considerations
67	   4.  RTP Header and Payload
68	     4.1.  G.711.0 RTP Header
69	     4.2.  G.711.0 RTP Payload
70	       4.2.1.  Single G.711.0 Frame per RTP Payload Example
71	       4.2.2.  G.711.0 RTP Payload Definition
72	       4.2.3.  G.711.0 RTP Payload Decoding Process
73	       4.2.4.  G.711.0 RTP Payload for Multiple Channels
74	   5.  Payload Format Parameters
75	     5.1.  Media Type Registration
76	     5.2.  Mapping to SDP Parameters
77	     5.3.  Offer/Answer Considerations
78	     5.4.  SDP Examples
79	       5.4.1.  SDP Example 1
80	       5.4.2.  SDP Example 2
81	   6.  G.711.0 Storage Mode Conventions and Definition
82	     6.1.  G.711.0 PLC Frame
83	     6.2.  G.711.0 Erasure Frame
84	     6.3.  G.711.0 Storage Mode Definition
85	   7.  Acknowledgements
86	   8.  Contributors
87	   9.  IANA Considerations
88	   10. Security Considerations
89	   11. Congestion Control
90	   12. References
91	     12.1.  Normative References
92	     12.2.  Informative References
93	   Authors' Addresses

95	1.  Introduction

97	   The International Telecommunication Union (ITU-T) Recommendation
98	   G.711.0 [G.711.0] specifies a stateless and lossless compression for
99	   G.711 packet payloads typically used in Voice over IP (VoIP)
100	   networks.  This document specifies the Real-Time Transport Protocol
101	   (RTP) RFC 3550 [RFC3550] payload format and storage modes for this
102	   compression.

104	2.  Requirements Language

106	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
107	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
108	   document are to be interpreted as described in RFC 2119 [RFC2119].

110	3.  G.711.0 Codec Background

112	   ITU-T Recommendation G.711.0 [G.711.0] is a lossless and stateless
113	   compression mechanism for ITU-T Recommendation G.711 [G.711] and thus
114	   is not a "codec" in the sense of "lossy" codecs typically carried by
115	   RTP.  When used end-to-end, ITU-T Rec. G.711.0 is negotiated as
116	   if it were a codec, with the understanding that ITU-T Rec. G.711.0
117	   losslessly encodes the underlying (lossy) G.711 pulse code modulation
118	   (PCM) sample representation of an audio signal.  For this reason,
119	   ITU-T Rec. G.711.0 will be interchangeably referred to in this
120	   document as a "lossless data compression algorithm" or a "codec",
121	   depending on context.  Within this document, individual G.711 PCM
122	   samples will be referred to as "G.711 symbols" or just "symbols" for
123	   brevity.

125	   This section describes the ITU-T Recommendation G.711.0 [G.711.0]
126	   codec, its typical use cases and its key design properties.

128	3.1.  General Information and Use of the ITU-T G.711.0 Codec

130	   ITU-T Recommendation G.711 is the benchmark standard for narrowband
131	   telephony.  It has been successful for many decades because of its
132	   proven voice quality, ubiquity and utility.  A new ITU-T
133	   recommendation, G.711.0, has been established for defining a
134	   stateless and lossless compression for G.711 packet payloads
135	   typically used in VoIP networks.  ITU-T Rec. G.711.0 is also known as
136	   ITU-T Rec. G.711 Annex A [G.711-A1], as ITU-T Rec. G.711 Annex A is
137	   effectively a pointer to ITU-T Rec. G.711.0.  Henceforth in this
138	   document, ITU-T Rec. G.711.0 will simply be referred to as "G.711.0"
139	   and ITU-T Rec. G.711 simply as "G.711".

141	   G.711.0 may be employed end-to-end, in which case the RTP payload
142	   format specification and use are nearly identical to the G.711 RTP
143	   specification found in RFC 3551 [RFC3551].  The only significant
144	   difference for G.711.0 is the required use of a dynamic payload type
145	   (the static PT of 0 or 8 is presently almost always used with G.711
146	   even though dynamic assignment of other payload types is allowed) and
147	   the recommendation not to use Voice Activity Detection (see
148	   Section 4.1).

150	   G.711.0, being both lossless and stateless, may also be employed as a
151	   lossless compression mechanism anywhere between end systems which
152	   have negotiated use of G.711.  Because the only difference between
153	   the G.711 RTP payload format header and the G.711.0 payload format
154	   header is the payload type, a G.711 RTP packet can be losslessly
155	   converted to a G.711.0 RTP packet simply by compressing the G.711
156	   payload (thus creating a G.711.0 payload), changing the payload type
157	   to the dynamic value desired and copying all the remaining G.711 RTP
158	   header fields into the corresponding G.711.0 RTP header.  Conversely,
159	   the corresponding decompression of a G.711.0 RTP packet back to the
160	   original source G.711 RTP packet can be accomplished by losslessly
161	   decompressing the G.711.0 payload back to the original source G.711
162	   payload, changing the payload type back to the payload type of the
163	   original G.711 RTP packet and copying all the remaining G.711.0 RTP
164	   header fields into the corresponding G.711 RTP header.
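The conversion described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `rewrite_rtp_packet` and the `transform` callable are hypothetical, and `transform` merely stands in for a real G.711.0 compressor or decompressor, which is defined by [G.711.0] and not reproduced here. The sketch assumes RTP version 2 and, for brevity, rejects packets that carry RTP padding (P bit set).

```python
import struct
from typing import Callable


def rewrite_rtp_packet(packet: bytes, new_pt: int,
                       transform: Callable[[bytes], bytes]) -> bytes:
    """Copy every RTP header field except PT, then transform the payload.

    `transform` is a placeholder for G.711.0 compression (G.711 to G.711.0)
    or decompression (G.711.0 to G.711); it is an assumption of this sketch.
    """
    if len(packet) < 12:
        raise ValueError("shorter than the fixed 12-octet RTP header")
    if not 0 <= new_pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    first, second = packet[0], packet[1]
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    if first & 0x20:
        raise ValueError("RTP padding (P bit) is not handled in this sketch")

    # Fixed header plus one 32-bit word per CSRC entry.
    header_len = 12 + 4 * (first & 0x0F)
    if first & 0x10:
        # X bit set: a header extension follows; its length field counts
        # 32-bit words after the 4-octet extension header.
        if len(packet) < header_len + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack_from("!H", packet, header_len + 2)
        header_len += 4 + 4 * ext_words
    if len(packet) < header_len:
        raise ValueError("truncated RTP header")

    header = bytearray(packet[:header_len])
    header[1] = (second & 0x80) | new_pt  # keep the marker bit, replace PT
    return bytes(header) + transform(packet[header_len:])
```

Calling the function once with a compressor and the dynamic payload type, and then again with the matching decompressor and the original payload type, is expected to reproduce the source packet octet for octet, since no other header field is touched.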

166	   It is important to note that G.711.0, being both lossless and
167	   stateless, can be employed multiple times (e.g., on multiple,
168	   individual hops or series of hops) on a given flow with no
169	   degradation of quality relative to end-to-end G.711.  Stated another
170	   way, multiple "lossless transcodes" from/to G.711.0/G.711 do not
171	   affect voice quality as typically occurs with lossy transcodes to/
172	   from dissimilar codecs.

174	   Lastly, it is expected that G.711.0 will be used as an archival
175	   format for recorded G.711 streams.  Therefore, a G.711.0 Storage Mode
176	   Format is also included in this document.

178	3.2.  Key Properties of G.711.0 Design

180	   The fundamental design of G.711.0 resulted from the desire to
181	   losslessly encode and compress frames of G.711 symbols independent of
182	   what types of signals those G.711 frames contained.  The primary
183	   G.711.0 use case is for G.711 encoded, zero-mean, acoustic signals
184	   (such as speech and music).

186	   G.711.0 attributes are below:

188	   A1  Compression for zero-mean acoustic signals: G.711.0 was designed
189	         primarily for the compression of G.711 payloads that
190	         contain "speech" or other zero-mean acoustic signals.

192	         G.711.0 obtains greater than 50% average compression in service
193	         provider environments [ICASSP].

195	   A2  Lossless for any G.711 payload: G.711.0 was designed to be
196	         lossless for any valid G.711 payload - even if the payload
197	         consisted of apparently random G.711 symbols (e.g., a modem or
198	         FAX payload).  G.711.0 could be used for "aggregate 64 kbps
199	         G.711 channels" carried over IP without explicit concern if a
200	         subset of these channels happened to be carrying something
201	         other than voice or general audio.  To the extent that a
202	         particular channel carried something other than voice or
203	         general audio, G.711.0 ensured that it was carried losslessly,
204	         if not significantly compressed.

206	   A3  Stateless: Compression of a frame of G.711 symbols was only to be
207	         dependent on that frame and not on any prior frame.  Although
208	         greater compression is usually available by observing a longer
209	         history of past G.711 symbols, it was decided that the
210	         compression design would be stateless to completely eliminate
211	         error propagation common in many lossy codec designs (e.g.,
212	         ITU-T Rec. G.729 [G.729], ITU-T Rec. G.722 [G.722]).  That is,
213	         the decoding process need not be concerned about lost prior
214	         packets because the decompression of a given G.711.0 frame is
215	         not dependent on potentially lost prior G.711.0 frames.  Owing
216	         to this stateless property, the frames input to the G.711.0
217	         encoder may be changed "on-the-fly" (a 5 ms encoding could be
218	         followed by a 20 ms encoding).

220	   A4  Self-describing: This property is defined as the ability to
221	         determine how many source G.711 samples are contained within
222	         the G.711.0 frame solely by information contained within the
223	         G.711.0 frame.  Generally, the number of source G.711 symbols
224	         can be determined by decoding the initial octets of the
225	         compressed G.711.0 frame (these octets are called "prefix
226	         codes" in the standard) [ICASSP].  A G.711.0 decoder need not
227	         know what ptime is, as it is able to decompress the G.711.0
228	         frame presented to it without signaling knowledge.

230	   A5  Accommodate G.711 payload sizes typically used in IP: G.711 input
231	         frames of length typically found in VoIP applications represent
232	         SDP ptimes (see RFC 4566 [RFC4566]) of 5 ms, 10 ms, 20 ms, 30
233	         ms or 40 ms.  Since the dominant sampling frequency for G.711
234	         is 8000 samples per second, G.711.0 was designed to compress
235	         G.711 input frames of 40, 80, 160, 240 or 320 samples.

237	   A6  Bounded expansion: Since attribute A2 above requires G.711.0 to
238	         be lossless for any payload, by definition there exists at
239	         least one potential G.711 payload which must be
240	         "uncompressible".  Since the quantum of compression is an
241	         octet, the minimum expansion of such an uncompressible payload
242	         was designed to be the minimum possible of one octet.  Thus
243	         G.711.0 "compressed" frames can be of length one octet to X+1
244	         octets, where X is the size of the input G.711 frame in octets.
245	         G.711.0 can therefore be viewed as a Variable Bit Rate (VBR)
246	         encoding in which the size of the G.711.0 output frame is a
247	         function of the G.711 symbols input to it.

249	   A7  Algorithmic delay: G.711.0 was designed to have the algorithmic
250	         delay equal to the time represented by the number of samples in
251	         the G.711 input frame (i.e., no "look-ahead").

253	   A8  Low Complexity: Less than 1.0 WMOPS average and low memory
254	         footprint (~5k octets RAM, ~5.7k octets ROM and ~3.6 basic
255	         operations) [ICASSP] [G.711.0].

257	   A9  Both A-law and mu-law supported: G.711 has two operating laws,
258	         A-law and mu-law.  These two laws are also known as PCMA and
259	         PCMU in RTP applications RFC 3551 [RFC3551].

261	   These attributes generally make it trivial to compress a G.711 input
262	   frame consisting of 40, 80, 160, 240 or 320 samples.  After the input
263	   frame is presented to a G.711.0 encoder, a G.711.0 "self-describing"
264	   output frame is produced.  The number of samples contained within
265	   this frame is easily determined at the G.711.0 decoder by virtue of
266	   attribute A4.  The G.711.0 decoder can decode the G.711.0 frame back
267	   to a G.711 frame by using only data within the G.711.0 frame.
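The arithmetic behind attributes A5 and A6 can be sketched as follows. The function names are illustrative only; the 8000 samples-per-second rate, the five permitted frame sizes and the 1 to X+1 octet bound are taken from the attribute list above.

```python
from typing import Tuple

SAMPLE_RATE_HZ = 8000                          # dominant G.711 sampling rate (A5)
VALID_FRAME_SAMPLES = (40, 80, 160, 240, 320)  # input frame sizes (A5)


def samples_for_ptime(ptime_ms: int) -> int:
    """Number of G.711 symbols in one input frame of the given ptime."""
    samples = SAMPLE_RATE_HZ * ptime_ms // 1000
    if samples not in VALID_FRAME_SAMPLES:
        raise ValueError("ptime does not map to a G.711.0 input frame size")
    return samples


def output_frame_bounds(input_octets: int) -> Tuple[int, int]:
    """Smallest and largest G.711.0 output frame, in octets (A6)."""
    if input_octets not in VALID_FRAME_SAMPLES:
        raise ValueError("input frame must be 40, 80, 160, 240 or 320 octets")
    # One G.711 symbol is one octet, so X symbols expand to at most X+1 octets.
    return 1, input_octets + 1
```

For example, a ptime of 20 ms corresponds to 160 symbols, and a 320-symbol input frame yields an output frame of between 1 and 321 octets.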

269	   Lastly we note that losing a G.711.0 encoded packet is identical in
270	   effect to losing a G.711 packet (when using RTP); this is because a
271	   G.711.0 payload, like the corresponding G.711 payload, is stateless.
272	   Thus, it is anticipated that existing G.711 PLC mechanisms will be
273	   employed when a G.711.0 packet is lost and an identical MOS
274	   degradation relative to G.711 loss will be achieved.

276	3.3.  G.711 Input Frames to G.711.0 Output Frames

278	   G.711.0 is a lossless and stateless compression of G.711 frames.  The
279	   following figure depicts this where "A" is the process of G.711.0
280	   encoding and "B" is the process of G.711.0 decoding.

282	        1:1 Mapping from G.711 Input Frame to G.711.0 Output Frame

284	    |--------------------------|  A   |------------------------------|
285	    |    G.711 Input Frame     |----->|     G.711.0 Output Frame     |
286	    |       of X Octets        |      |  containing 1 to X+1 Octets  |
287	    | (where X MUST be 40, 80, |      | (precise value dependent on  |
288	    | 160, 240 or 320 octets)  |<-----| G.711.0 ability to compress) |
289	    |__________________________|  B   |______________________________|

291	                                 Figure 1

293	   Note that the mapping is 1:1 (lossless) in both directions, subject
294	   to two constraints.  The first constraint is that the input frame
295	   provided to the G.711.0 encoder (process "A") has a specific number
296	   of input G.711 symbols consistent with attribute A5 (40, 80, 160, 240
297	   or 320 octets).  The second constraint is that the compression law
298	   used to create the G.711 input frame (A-law or mu-law) must be known,
299	   consistent with attribute A9.

301	   Subject to these two constraints, the input G.711 frame is processed
302	   by the G.711.0 encoder ("A") and produces a "self-describing" G.711.0
303	   output frame, consistent with attribute A4.  Depending on the source
304	   G.711 symbols, the G.711.0 output frame can contain anywhere from 1
305	   to X+1 octets, where X is the number of input G.711 symbols.
306	   Compression is achieved for virtually every zero-mean acoustic signal
307	   encoded by G.711.0.

309	   Since the G.711.0 output frame is "self-describing", a G.711.0
310	   decoder (process "B") can losslessly reproduce the original G.711
311	   input frame with only the knowledge of which companding law was used
312	   (A-law or mu-law).  The G.711.0 frame, being "self-describing",
313	   allows for the G.711.0 decoder ("B") to know precisely how many G.711
314	   symbols to create.

316	   Since G.711.0 was designed with typical G.711 payload lengths as a
317	   design constraint (attribute A5), this lossless encoding can be
318	   performed with knowledge of only the companding law being used.  This
319	   information is anticipated to be signaled in SDP and will be
320	   described later in this document.

322	   If the original inputs were known to be from a zero-mean acoustic
323	   signal coded by G.711, an intelligent G.711.0 encoder could infer the
324	   G.711 companding law in use (via G.711 input signal amplitude
325	   histogram statistics).  Likewise, an intelligent G.711.0 decoder
326	   producing G.711 from the G.711.0 frames could also infer which
327	   encoding law is in use.  Thus G.711.0 could be designed for use in
328	   applications that have limited stream signaling between the G.711
329	   endpoints (i.e., they only know "G.711 at 8k sampling is being used",
330	   but nothing more).  Such usage is not further described in this
331	   document.  Additionally, if the original inputs were known to come
332	   from zero-mean acoustic signals, an intelligent G.711.0 encoder could
333	   tell if the G.711.0 payload had been encrypted - as the symbols would
334	   not have the distribution expected in either companding law and would
335	   appear random.  Such determination is also not further discussed in
336	   this document.

338	   It is easily seen that this process is 1:1 and that G.711.0 based
339	   lossless compression can be employed multiple times, as the original
340	   G.711 input symbols are always reproduced with 100% fidelity.

342	3.3.1.  Multiple G.711.0 Output Frames per RTP Payload Considerations

344	   As a general rule, G.711.0 frames containing more source G.711
345	   symbols (from a given channel) will typically result in higher
346	   compression, but there are exceptions to this rule.  A G.711.0
347	   encoder may choose to encode 20 ms of input G.711 symbols as: 1) a
348	   single 20 ms G.711.0 frame, or 2) as two 10 ms G.711.0 frames, or 3)
349	   any other combination of 5 ms or 10 ms G.711.0 frames - depending on
350	   which encoding resulted in fewer bits.  As an example, an intelligent
351	   encoder might encode 20 ms of G.711 symbols as two 10 ms G.711.0
352	   frames if the first 10 ms was "silence" and two G.711.0 frames took
353	   fewer bits than any other possible encoding combination of G.711.0
354	   frame sizes.
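One way an encoder might search for the smallest framing is sketched below. This is an assumption-laden illustration rather than anything specified by G.711.0: `best_framing` is a hypothetical helper, and `encode_frame` is a caller-supplied stand-in for a real G.711.0 frame encoder. The search is a simple dynamic program over 40-symbol boundaries that tries every permitted frame size ending at each boundary.

```python
from typing import Callable, Dict, List, Tuple

FRAME_SAMPLES = (40, 80, 160, 240, 320)  # permitted G.711.0 input frame sizes


def best_framing(symbols: bytes,
                 encode_frame: Callable[[bytes], bytes]) -> List[bytes]:
    """Split `symbols` into frames whose encodings total the fewest octets.

    `encode_frame` is a placeholder for a G.711.0 encoder; it must accept
    any input of 40, 80, 160, 240 or 320 symbols.
    """
    n = len(symbols)
    if n == 0 or n % 40 != 0:
        raise ValueError("input must be a non-zero multiple of 40 symbols")

    # best[i] holds (total encoded octets, encoded frames) for symbols[:i].
    best: Dict[int, Tuple[int, List[bytes]]] = {0: (0, [])}
    for end in range(40, n + 1, 40):
        candidates = []
        for size in FRAME_SAMPLES:
            start = end - size
            if start in best:
                cost, frames = best[start]
                frame = encode_frame(symbols[start:end])
                candidates.append((cost + len(frame), frames + [frame]))
        best[end] = min(candidates, key=lambda c: c[0])
    return best[n][1]
```

The search calls the encoder several times per 40-symbol boundary, which illustrates the added complexity of looking for alternative framings; an encoder that simply emits one frame per ptime avoids it.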

356	   During the process of G.711.0 standardization it was recognized that
357	   although it is sometimes advantageous to encode integer multiples of
358	   40 G.711 symbols in whatever input symbol format resulted in the most
359	   compression (as per above), the simplest choice is to encode the
360	   entire ptime's worth of input G.711 symbols into one G.711.0 frame
361	   (if the ptime supported it).  This is especially so since the larger
362	   number of source G.711 symbols typically resulted in the highest
363	   compression anyway and there is added complexity in searching for
364	   other possibilities (involving more G.711.0 frames) which were
365	   unlikely to produce a more bit efficient result.

367	   The design of ITU-T Rec. G.711.0 [G.711.0] foresaw the possibility of
368	   multiple G.711.0 input frames in that the decoder was defined to
369	   decode what it refers to as an incoming "bit stream".  For this
370	   specification, the bit stream is the G.711.0 RTP payload itself.
371	   Thus, the decoder will take the G.711.0 RTP payload and will produce
372	   an output frame containing the original G.711 symbols independent of
373	   how many G.711.0 frames were present in it.  Additionally, any number
374	   of 0x00 padding octets placed between the G.711.0 frames will be
375	   silently (and safely) ignored by the G.711.0 decoding process
376	   (Section 4.2.3).

378	   To recap, a G.711.0 encoder may choose to encode incoming G.711
379	   symbols into one or more G.711.0 frames and put the
380	   resultant frame(s) into the G.711.0 RTP payload.  Zero or more 0x00
381	   padding octets may also be included in the G.711.0 RTP payload.  The
382	   G.711.0 decoder, being insensitive to the number of G.711.0 encoded
383	   frames that are contained within it, will decode the G.711.0 RTP
384	   payload into the source G.711 symbols.  Although examples of single
385	   or multiple G.711.0 frame cases will be illustrated in Section 4.2,
386	   the multiple G.711.0 frame cases MUST be supported and no
387	   negotiation (SDP or otherwise) is required for it.

389	4.  RTP Header and Payload

391	   In this section we describe the precise format for G.711.0 frames
392	   carried via RTP.  We begin with an RTP header description relative to
393	   G.711, then provide two G.711.0 payload examples.

395	4.1.  G.711.0 RTP Header

397	   Relative to G.711 RTP headers, the utilization of G.711.0 does not
398	   create any special requirements with respect to the contents of the
399	   RTP packet header.  The only significant difference is that the
400	   payload type (PT) RTP header field will have a value corresponding to
401	   the dynamic payload type assigned to the flow.  This is in contrast
402	   to most current uses of G.711 which typically use the static payload
403	   assignment of PT = 0 (PCMU) or PT = 8 (PCMA) [RFC3551] even though
404	   the negotiation and use of dynamic payload types is allowed for
405	   G.711.

407	   Voice Activity Detection (VAD) SHOULD NOT be used when G.711.0 is
408	   negotiated because G.711.0 obtains high compression during "VAD
409	   silence intervals" and one of the advantages of G.711.0 over G.711
410	   with VAD is the lack of any VAD-induced artifacts in the received
411	   signal.  However, if VAD is employed, the Marker bit (M) MUST be set
412	   in the first packet of a talkspurt (the first packet after a silence
413	   period in which packets have not been transmitted contiguously as per
414	   rules specified in [RFC3551] for G.711 payloads).  This definition,
415	   being consistent with the G.711 RTP VAD use, further allows lossless
416	   transcoding between G.711 RTP packets and G.711.0 RTP packets as
417	   described in Section 3.1.

419	   With this introduction, the RTP packet header fields are defined as
420	   follows:

422	      V - As per [RFC3550]
423	      P - As per [RFC3550]

425	      X - As per [RFC3550]

427	      CC - As per [RFC3550]

429	      M - As per [RFC3550] and [RFC3551]

431	      PT - The assignment of an RTP payload type for the format defined
432	      in this memo is outside the scope of this document.  The RTP
433	      profiles in use currently mandate binding the payload type
434	      dynamically for this payload format.

436	      SN - As per [RFC3550]

438	      timestamp - As per [RFC3550]

440	      SSRC - As per [RFC3550]

442	      CSRC - As per [RFC3550]

444	   Where V (version bits), P (padding bit), X (extension bit), CC (CSRC
445	   count), M (marker bit), PT (payload type), SN (sequence number),
446	   timestamp, SSRC (synchronizing source) and CSRC (contributing
447	   sources) are as defined in [RFC3550] and as typically used with
448	   G.711.  PT (payload type) is as defined in [RFC3551].
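The header fields listed above can be read from a packet as sketched below. The field layout follows the fixed RTP header of [RFC3550]; the names `RtpHeader`, `parse_rtp_header` and `uses_dynamic_pt` are illustrative, and the dynamic range of 96 to 127 is the one given by the RTP/AVP profile [RFC3551]. Header extensions are not interpreted in this sketch.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode V, P, X, CC, M, PT, SN, timestamp, SSRC and the CSRC list."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed 12-octet RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    cc = b0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrc = struct.unpack_from("!%dI" % cc, packet, 12)
    return RtpHeader(b0 >> 6, bool(b0 & 0x20), bool(b0 & 0x10), cc,
                     bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc, csrc)


def uses_dynamic_pt(header: RtpHeader) -> bool:
    """True when PT lies in the dynamic range, as required for G.711.0."""
    return 96 <= header.payload_type <= 127
```

A receiver could use `uses_dynamic_pt` as a quick sanity check before handing the payload to a G.711.0 decoder, since static PT 0 (PCMU) and PT 8 (PCMA) denote uncompressed G.711.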

450	4.2.  G.711.0 RTP Payload

452	   This section defines the G.711.0 RTP payload and illustrates it by
453	   means of two examples.

455	   The first example, in Section 4.2.1, depicts the case when it is
456	   desired to carry only one G.711.0 frame in the RTP payload.  This
457	   case is expected to be the dominant use case and is shown separately
458	   for the purposes of clarity.

460	   The second example, in Section 4.2.2, depicts the general case when
461	   it is desired to carry one or more G.711.0 frames in the RTP payload.
462	   This is the actual definition of the G.711.0 RTP payload.

464	4.2.1.  Single G.711.0 Frame per RTP Payload Example

466	   This example depicts a single G.711.0 frame in the RTP payload.  This
467	   is expected to be the dominant RTP payload case for G.711.0, as the
468	   G.711.0 encoding process supports the SDP packet times (ptime and
469	   maxptime, see [RFC4566]) commonly used when G.711 is transported in
470	   RTP.  Additionally, as mentioned previously, larger G.711.0 frames
471	   generally compress more effectively than a multiplicity of smaller
472	   G.711.0 frames.

474	   The following Figure illustrates the single G.711.0 frame per RTP
475	   payload case.

477	                 Single G.711.0 Frame in RTP Payload Case

479	                 |-------------------|-------------------|
480	                 | One G.711.0 Frame | Zero or more 0x00 |
481	                 |                   |   Padding Octets  |
482	                 |___________________|___________________|

484	                                 Figure 2

486	   Encoding Process: A single G.711.0 frame is inserted into the RTP
487	   payload.  The amount of time represented by the G.711 symbols
488	   compressed in the G.711.0 frame MUST correspond to the ptime signaled
489	   for applications using SDP.  Although generally not desired, padding
490	   in the RTP payload after the G.711.0 frame MAY be created by
491	   placing one or more 0x00 octets after the G.711.0 frame.  Such
492	   padding may be desired based on security considerations (see
493	   Section 10).

495	   Decoding Process: Passing the entire RTP payload to the G.711.0
496	   decoder is sufficient for the G.711.0 decoder to create the source
497	   G.711 symbols.  Any padding inserted after the G.711.0 frame (i.e.,
498	   the 0x00 octets) present in the RTP payload is silently ignored by
499	   the G.711.0 decoding process.  The decoding process is fully
500	   described in Section 4.2.3 below.

502	4.2.2.  G.711.0 RTP Payload Definition

504	   This section defines the G.711.0 RTP payload and illustrates the case
505	   of when one or more G.711.0 frames are to be placed in the payload.
506	   All G.711.0 RTP decoders MUST support the general case described in
507	   this section (rationale presented previously in Section 3.3.1).

509	   Note that since each G.711.0 frame is self-describing (see Attribute
510	   A4 in Section 3.2), the individual G.711.0 frames in the RTP payload
511	   need not represent the same duration of time (i.e., a 5 ms G.711.0
512	   frame could be followed by a 20 ms G.711.0 frame).  Owing to this,
513	   the amount of time represented in the RTP payload MAY be any integer
514	   multiple of 5 ms (as 5 ms is the smallest interval of time that can
515	   be represented in a G.711.0 frame).

517	   The following Figure illustrates the one or more G.711.0 frames per
518	   RTP payload case where the number of G.711.0 frames placed in the RTP
519	   payload is N.  We note that when N is equal to 1, this case is
520	   identical to the previous example.

522	              One or More G.711.0 Frames in RTP Payload Case

524	       |----------|---------|----------|---------|----------------|
525	       | First    | Second  |          | Nth     | Zero or more   |
526	       | G.711.0  | G.711.0 |   ...    | G.711.0 |     0x00       |
527	       | Frame    | Frame   |          | Frame   | Padding Octets |
528	       |__________|_________|__________|_________|________________|

530	                                 Figure 3

532	   We note here that when there are multiple G.711.0 frames, the
533	   individual frames can be, and generally are, of different lengths.
534	   The decoding process in the following section is used to determine
535	   the frame boundaries.

537	   Encoding Process: One or more G.711.0 frames are placed in the RTP
538	   payload simply by concatenating the G.711.0 frames together.  The
539	   amount of time represented by the G.711 symbols compressed in all the
540	   G.711.0 frames in the RTP payload MUST correspond to the ptime
541	   signaled for applications using SDP.  Although not generally desired,
542	   padding in the RTP payload SHOULD be placed after the last G.711.0
543	   frame in the payload and MAY be created by placing one or more 0x00
544	   octets after the last G.711.0 frame.  Such padding may be desired
545	   based on security considerations (see Section 10).
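
As an illustration only, the encoding process above can be sketched in
Python.  The helper name build_payload, its arguments, and the premise
that the caller already holds each compressed G.711.0 frame together
with the number of G.711 symbols it represents are assumptions made
for this sketch; none of it is taken from the ITU-T reference code.

```python
def build_payload(frames, ptime_ms, clock_rate=8000, pad_octets=0):
    """Concatenate G.711.0 frames into one RTP payload (hypothetical helper).

    frames: list of (frame_octets, symbol_count) pairs; each frame is
            assumed to be an already-compressed G.711.0 frame.
    """
    total_symbols = sum(count for _, count in frames)
    # The time spanned by all frames must match the signaled ptime.
    if total_symbols * 1000 != ptime_ms * clock_rate:
        raise ValueError("frames do not span the signaled ptime")
    payload = b"".join(octets for octets, _ in frames)
    # Optional padding: 0x00 octets placed after the last G.711.0 frame.
    return payload + b"\x00" * pad_octets
```

With an 8000 Hz clock, a 40-symbol frame followed by a 160-symbol
frame spans 25 ms, so the sketch accepts that pair only when ptime_ms
is 25.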

547	   Decoding Process: As G.711.0 frames can be of varying length, the
548	   payload decoding process described in the following section is used
549	   to determine where the individual G.711.0 frame boundaries are.  Any
550	   padding octets inserted before or after any G.711.0 frame in the RTP
551	   payload are silently (and safely) ignored by the G.711.0 decoding
552	   process.

554	4.2.3.  G.711.0 RTP Payload Decoding Process

556	   The G.711.0 decoding process is a standard part of G.711.0 bit stream
557	   decoding and is implemented in the ITU-T Rec. G.711.0 reference code.
558	   The decoding process algorithm described in this section is a slight
559	   enhancement of the ITU-T reference code to explicitly accommodate RTP
560	   padding (as described above).

562	   Before describing the decoding, we note here that the largest
563	   possible G.711.0 frame is created whenever the largest number of
564	   G.711 symbols is encoded (320 from Section 3.2, property A5) and
565	   these 320 symbols are "uncompressible" by the G.711.0 encoder.  In
566	   this case (via property A6 in Section 3.2) the G.711.0 output frame
567	   will be 321 octets long.  We also note that the value 0x00 chosen for
568	   the optional padding cannot be the first octet of a valid ITU-T Rec.
569	   G.711.0 frame (see [G.711.0]).  We also note that whenever more than
570	   one G.711.0 frame is contained in the RTP payload, the decoding of
571	   the individual G.711.0 frames will occur multiple times.

573	   For the decoding algorithm below, let N be the number of octets in
574	   the RTP payload (i.e., excluding any RTP padding, but including any
575	   RTP payload padding), let P equal the number of RTP payload octets
576	   processed by the G.711.0 decoding process, let K be the number of
577	   G.711 symbols presently in the output buffer, let Q be the number of
578	   octets contained in the G.711.0 frame being processed and let "!="
579	   represent not equal to.  The keyword "STOP" is used below to indicate
580	   the end of the processing of G.711.0 frames in the RTP payload.  The
581	   algorithm below assumes an output buffer for the decoded G.711 source
582	   symbols of length sufficient to accommodate the expected number of
583	   G.711 symbols and an input buffer of length 321 octets.

585	   G.711.0 RTP Payload Decoding Heuristic:

587	   H1  Initialization of counters: Initialize P, the number of processed
588	         octets counter, to zero.  Initialize K, the counter for how
589	         many G.711 symbols are in the output buffer, to zero.
590	         Initialize N to the number of octets in the RTP payload
591	         (including any RTP payload padding).  Go to H2.

593	   H2  Read internal buffer: Read min{320+1, (N-P)} octets into the
594	         internal buffer, starting at the (P+1)th octet of the RTP
595	         payload.  Note that at this point N-P octets have yet to be
596	         processed and that 320+1 octets is the largest possible
597	         G.711.0 frame.  Also note that with the common zero-based
598	         indexing of a uint8 array of octets, this operation reads
599	         octets from index P through index [P + min{320+1, (N-P)} - 1]
600	         of the RTP payload.  Go to H3.

602	   H3  Analyze the first octet in the internal buffer: If this octet
603	         is 0x00 (a padding octet) go to H4, otherwise go to H5 (process
604	         a G.711.0 frame).

606	   H4  Process padding octet (no G.711 symbols generated): Increment the
607	         processed octets counter by one (set P = P + 1).  If this
608	         increment results in P >= N then STOP (as all RTP payload
609	         octets have been processed), otherwise go to H2.

611	   H5  Process an individual G.711.0 frame (produce G.711 samples in the
612	         output buffer): Pass the internal buffer to the G.711.0
613	         decoder.  The G.711.0 decoder will read the first octet (called
614	         the "prefix code" octet in ITU-T Rec. G.711.0 [G.711.0]) to
615	         determine how many source G.711 samples, M, are contained in
616	         this G.711.0 frame.  The G.711.0 decoder will produce exactly M
617	         G.711 source symbols.  If K = 0, these M symbols will be the
618	         first in the output buffer and are placed at the beginning of
619	         the output buffer.  If K != 0, concatenate these M symbols with
620	         the prior symbols in the output buffer (there are K prior
621	         symbols in the buffer).  Set K = K + M (as there are now this
622	         many G.711 source symbols in the output buffer).  The G.711.0
623	         decoder will have consumed some number of octets, Q, in the
624	         internal buffer to produce the M G.711 symbols.  Increment the
625	         counter of processed payload octets by this quantity (set
626	         P = P + Q).  If this increment results in P >= N then STOP (as
627	         all RTP payload octets have been processed), otherwise go to
628	         H2.

630	   At this point, the output buffer will contain precisely K G.711
631	   source symbols which should correspond to the ptime signaled if SDP
632	   was used and the encoding process was without error.
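
Steps H1 through H5 can be sketched in Python as follows.  This is a
minimal sketch, not the ITU-T reference implementation: the G.711.0
frame decoder itself is assumed to be supplied by the caller as a
function (here called decode_frame, a hypothetical name) that returns
the decoded G.711 symbols and the number of octets it consumed.  The
toy_decode_frame function is a stand-in used only to exercise the
loop; it does not implement G.711.0.

```python
MAX_FRAME_OCTETS = 320 + 1  # largest possible G.711.0 frame


def decode_payload(payload, decode_frame):
    """Steps H1-H5; decode_frame is a caller-supplied (hypothetical)
    G.711.0 frame decoder returning (symbols, octets_consumed)."""
    n = len(payload)        # H1: N, payload octets incl. payload padding
    p = 0                   # H1: P, octets processed so far
    out = bytearray()       # H1: output buffer; K == len(out)
    while p < n:
        # H2: read min{320+1, N-P} octets starting at index P.
        buf = payload[p:p + min(MAX_FRAME_OCTETS, n - p)]
        if buf[0] == 0x00:
            p += 1          # H3/H4: skip one padding octet
            continue
        symbols, q = decode_frame(buf)  # H5: decode one frame
        if q <= 0 or q > len(buf):
            raise ValueError("decoder consumed an invalid octet count")
        out += symbols      # append M symbols after the K prior ones
        p += q              # P = P + Q
    return bytes(out)


def toy_decode_frame(buf):
    """Stand-in for testing only, NOT G.711.0: first octet is a count M,
    followed by M octets copied out unchanged."""
    m = buf[0]
    return bytes(buf[1:1 + m]), 1 + m
```

The check on the consumed octet count is an addition of this sketch;
it keeps a misbehaving decoder from stalling the loop or running past
the internal buffer.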

634	   We also note, as an aside, that the algorithm above (and the ITU-T
635	   G.711.0 reference code) accommodates padding octets (0x00) placed
636	   anywhere between G.711.0 frames in the RTP payload as well as prior
637	   to or after any or all G.711.0 frames.  The ITU-T G.711.0 reference
638	   code does not have Step H3 and H4 as separate steps (i.e., Step H5
639	   immediately follows H2) at the added computational cost of some
640	   additional buffer passing to/from the G.711.0 frame decoder
641	   functions.  That is, the G.711.0 decoder in the reference code
642	   "silently ignores" 0x00 padding octets at the beginning of what it
643	   believes to be a G.711.0 encoded frame boundary.  Thus Step H3 and
644	   Step H4 above are an optimization over the reference code shown for
645	   clarity.

647	   If the decoder is at a playout endpoint location, this G.711 buffer
648	   SHOULD be used in the same manner as a received G.711 RTP payload
649	   would have been used (passed to a playout buffer, to a PLC
650	   implementation, etc.).

652	4.2.4.  G.711.0 RTP Payload for Multiple Channels

654	   In this section we describe the use of multiple "channels" of G.711
655	   data encoded by G.711.0 compression.

657	   The dominant use of G.711 in RTP transport has been for single
658	   channel use cases.  For this case, the above G.711.0 encoding and
659	   decoding process is used.  However, the multiple channel case for
660	   G.711.0 (a frame-based compression) is different from G.711 (a
661	   sample-based encoding) and is described separately here.

663	   RFC 3551 [RFC3551] provides guidelines for encoding audio channels
664	   (Section 4) and for the ordering of the channels within the RTP
665	   payload (Section 4.1).  The ordering guidelines in RFC 3551,
666	   Section 4.1 SHOULD be used unless an application-specific channel
667	   ordering is more appropriate.

669	   An implicit assumption in RFC 3551 is that all the channel data
670	   multiplexed into an RTP payload MUST represent the same physical time
671	   span.  The case for G.711.0 is no different; the underlying G.711
672	   data for all channels in a G.711.0 RTP payload MUST span the same
673	   interval in time (e.g., the same "ptime" for a SDP-specified codec
674	   negotiation).

676	   RFC 3551 provides guidelines for sample-based encodings such as G.711
677	   in Section 4.2.  This guidance is tantamount to interleaving the
678	   individual samples in that they SHOULD be packed in consecutive
679	   octets.

681	   RFC 3551 provides guidelines for frame-based encodings in which the
682	   frames are interleaved.  However, this guidance stems from the
683	   assumption that "the frame size for frame-oriented codecs is a
684	   given".  This assumption is not valid for G.711.0, in that
685	   individual consecutive G.711.0 frames (as per Section 4.2.2) can:

687	      1) represent different time spans (e.g., two 5 ms G.711.0 frames
688	      in lieu of one 10 ms G.711.0 frame), and

690	      2) be of different lengths in octets (and typically are).

692	   Therefore a different, but also simple, concatenation-based approach
693	   is specified in this RFC.

695	   For the multiple channel G.711.0 case, each G.711 channel is
696	   independently encoded into one or more G.711.0 frames defined here as
697	   a "G.711.0 channel superframe".  Each one of these superframes is
698	   identical to the multiple G.711.0 frame case illustrated in Figure 3
699	   of Section 4.2.2 in which each superframe can have one or more
700	   individual G.711.0 frames within it.  Then each G.711.0 channel
701	   superframe is concatenated - in channel order - into a G.711.0 RTP
702	   payload.  Then, if optional G.711.0 padding octets (0x00) are
703	   desired, it is RECOMMENDED that these octets be placed after the
704	   last G.711.0 channel superframe.  As per above, such padding may be
705	   desired based on security considerations (see Section 10).  This is
706	   depicted in Figure 4 below.

708	            Multiple G.711.0 Channel Superframes in RTP Payload

710	           |----------|---------|----------|---------|---------|
711	           | First    | Second  |          | Nth     | Zero    |
712	           | G.711.0  | G.711.0 |   ...    | G.711.0 | or more |
713	           | Channel  | Channel |          | Channel | 0x00    |
714	           | Super-   | Super-  |          | Super-  | Padding |
715	           | Frame    | Frame   |          | Frame   | Octets  |
716	           |__________|_________|__________|_________|_________|

718	                                 Figure 4

720	   We note that although the individual superframes can be of different
721	   lengths in octets (and usually are), the number of G.711 source
722	   symbols represented - in compressed form - in each channel superframe
723	   is identical (since all the channels represent the same time
724	   interval).

726	   The G.711.0 decoder at the receiving end simply decodes the entire
727	   G.711.0 (multiple channel) payload into individual G.711 symbols.  If
728	   M such G.711 symbols result and there were N channels, then the first
729	   M/N G.711 samples would be from the first channel, the second M/N
730	   G.711 samples would be from the second channel, and so on until the
731	   Nth set of G.711 samples are found.  Similarly, if the number of
732	   channels was not known, but the payload "ptime" was known, one could
733	   infer (knowing the sampling rate) how many G.711 symbols each channel
734	   contained; then with this knowledge determine how many channels of
735	   data were contained in the payload.  When SDP is used, the number of
736	   channels is known because the optional "channels" parameter MUST be
737	   present when more than one channel is negotiated (see Section 5.1).
738	   Additionally, when SDP is used, "ptime" is a RECOMMENDED optional
739	   parameter.  If both the "channels" and "ptime" parameters are known,
740	   each can be used to check the other.
741	   Whichever algorithm is used to determine the number of channels, if
742	   the length of the source G.711 symbols in the payload (M) is not an
743	   integer multiple of the number of channels (N), then the packet
744	   SHOULD be discarded.
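
The channel separation and the ptime-based inference described above
can be sketched as follows.  Both function names are hypothetical, and
the sketch assumes the payload has already been decoded into a single
run of G.711 symbols (for example, by the process in Section 4.2.3).

```python
def split_channels(symbols, channels):
    """Split decoded G.711 symbols into per-channel runs (hypothetical
    helper).  Returns None when the packet should be discarded."""
    m = len(symbols)
    if channels < 1 or m % channels != 0:
        return None  # M is not an integer multiple of N
    per_channel = m // channels
    # Channel i occupies the i-th run of M/N consecutive symbols.
    return [symbols[i * per_channel:(i + 1) * per_channel]
            for i in range(channels)]


def infer_channels(total_symbols, ptime_ms, clock_rate=8000):
    """Infer the channel count from ptime; None if inconsistent."""
    per_channel, rem = divmod(ptime_ms * clock_rate, 1000)
    if rem or per_channel == 0 or total_symbols % per_channel:
        return None
    return total_symbols // per_channel
```

When both "channels" and "ptime" are signaled, comparing the signaled
channel count with the value returned by infer_channels gives the
cross-check mentioned above.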

746	   Lastly we note that although any padding for the multiple channel
747	   G.711.0 payload is RECOMMENDED to be placed at the end of the
748	   payload, the G.711.0 decoding algorithm described in Section 4.2.3
749	   will successfully decode the payload in Figure 4 if the 0x00 padding
750	   octet is placed anywhere before or after any individual G.711.0 frame
751	   in the RTP payload.  The number of padding octets introduced at any
752	   G.711.0 frame boundary therefore does not affect the number M of the
753	   source G.711 symbols produced.  Thus the decision for padding MAY be
754	   made on a per-superframe basis.

756	5.  Payload Format Parameters

758	   This section defines the parameters that may be used to configure
759	   optional features in the G.711.0 RTP transmission.

761	   The parameters are defined here as part of the media subtype
762	   registration for the G.711.0 codec.  Mapping of the parameters into
763	   Session Description Protocol (SDP) RFC 4566 [RFC4566] is also
764	   provided for those applications that use SDP.

766	5.1.  Media Type Registration

768	   Type name: audio

770	   Subtype name: G711-0

772	   Required parameters:

774	      clock rate: The RTP timestamp clock rate, which is equal to the
775	      sampling rate.  The typical rate used with G.711 encoding is 8000,
776	      but other rates may be specified.  The default rate is 8000.

778	      complaw: This format specific parameter, specified on the "a=fmtp"
779	      line, indicates the companding law (A-law or mu-law) employed.
780	      This format specific parameter, as per RFC 4566 [RFC4566], is
781	      given unchanged to the media tool using this format.  The case-
782	      insensitive values "complaw=al" and "complaw=mu" are used for
783	      A-law and mu-law, respectively.

785	   Optional parameters:

787	      channels: See RFC 4566 [RFC4566] for definition.  Specifies how
788	      many audio streams are represented in the G.711.0 payload and MUST
789	      be present if the number of channels is greater than one.  This
790	      parameter defaults to 1 if not present (as per RFC 4566) and is
791	      typically a non-zero small-valued positive integer.  It is
792	      expected that implementations that specify multiple channels will
793	      also define a mechanism to map the channels appropriately within
794	      their system design, otherwise the channel order specified in RFC
795	      3551 [RFC3551] Section 4.1 will be assumed (e.g., left, right,
796	      center, ... ).  Similar to the usual interpretation in RFC 3551
797	      [RFC3551], the number of channels SHALL be a non-zero positive
798	      integer.

800	      maxptime: See RFC 4566 [RFC4566] for definition.

802	      ptime: See RFC 4566 [RFC4566] for definition.  The inclusion of
803	      "ptime" is RECOMMENDED and SHOULD be in the SDP unless there is an
804	      application specific reason not to include it (e.g., an
805	      application that has a variable ptime on a packet-by-packet
806	      basis).  For constant ptime applications, it is considered good
807	      form to include "ptime" in the SDP for session diagnostic
808	      purposes.  For the constant ptime multiple channel case described
809	      in Section 4.2.4, the inclusion of "ptime" can provide a desirable
810	      payload check.

812	   Encoding considerations:

814	      This media type is framed binary data (see Section 4.8 in RFC 6838
815	      [RFC6838]) compressed as per ITU-T Rec. G.711.0.

817	   Security considerations:

819	      See Section 10.

821	   Interoperability considerations: none

823	   Published specification:

825	      ITU-T Rec. G.711.0 and RFC XXXX.

827	      [ RFC Editor: please replace XXXX with a reference to this RFC ]

829	   Applications that use this media type:

831	      Although initially conceived for VoIP, the use of G.711.0, like
832	      G.711 before it, may find use within audio and video streaming
833	      and/or conferencing applications for the audio portion of those
834	      applications.

836	   Additional information:

838	   The following applies to stored-file transfer methods:

840	         Magic numbers: #!G7110A\n or #!G7110M\n (for A-law or MU-law
841	         encodings respectively, see Section 6).

843	         File Extensions: None

845	         Macintosh file type code: None

847	         Object identifier or OID: None

849	   Person & email address to contact for further information:

851	      Michael A.  Ramalho <mramalho@cisco.com> or <mar42@cornell.edu>

853	   Intended usage: COMMON

855	   Restrictions on usage:

857	      This media type depends on RTP framing, and hence is only defined
858	      for transfer via RTP [RFC3550].  Transport within other framing
859	      protocols is not defined at this time.

861	   Author: Michael A.  Ramalho

863	   Change controller:

865	      IETF Payload working group delegated from the IESG.

867	5.2.  Mapping to SDP Parameters

869	   The information carried in the media type specification has a
870	   specific mapping to fields in the Session Description Protocol (SDP),
871	   which is commonly used to describe RTP sessions.  When SDP is used to
872	   specify sessions employing G.711.0, the mapping is as follows:

874	   o  The media type ("audio") goes in SDP "m=" as the media name.

876	   o  The media subtype ("G711-0") goes in SDP "a=rtpmap" as the
877	      encoding name.

879	   o  The required parameter "rate" also goes in "a=rtpmap" as the clock
880	      rate.

882	   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
883	      "a=maxptime" attributes, respectively.

885	   o  Remaining parameters go in the SDP "a=fmtp" attribute by copying
886	      them directly from the media type string as a semicolon-separated
887	      list of parameter=value pairs.
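
The mapping above can be sketched as a small Python helper that emits
the SDP attribute lines for one G711-0 payload type.  The function
name and its arguments are hypothetical, and the "m=" line is left to
the caller.

```python
def g7110_sdp_attributes(pt, complaw, rate=8000, channels=1,
                         ptime=None, maxptime=None):
    """Return the SDP "a=" lines for one G711-0 payload type
    (hypothetical helper; the "m=" line is left to the caller)."""
    if complaw.lower() not in ("al", "mu"):
        raise ValueError("complaw must be 'al' or 'mu'")
    rtpmap = f"a=rtpmap:{pt} G711-0/{rate}"
    if channels > 1:
        rtpmap += f"/{channels}"  # channels MUST be present when > 1
    lines = [rtpmap]
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    # Remaining parameters go on the fmtp line as parameter=value pairs.
    lines.append(f"a=fmtp:{pt} complaw={complaw.lower()}")
    return lines
```

The sketch omits the channel count from "a=rtpmap" when it is one,
relying on the default of a single channel.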

889	5.3.  Offer/Answer Considerations

891	   The following considerations apply when using the SDP offer/answer
892	   RFC 3264 [RFC3264] mechanism to negotiate the "channels" attribute.

894	   o  If the offering endpoint specifies a value for the optional
895	      channels parameter greater than one and the answering endpoint
896	      understands the parameter but cannot support the requested
897	      value, the answer MUST contain the optional channels parameter
898	      with the highest value it can support.

900	   o  If the offering endpoint specifies a value for the optional
901	      channels parameter the answer MUST contain the optional channels
902	      parameter unless the only value the answering endpoint can support
903	      is one, in which case the answer MAY contain the optional channels
904	      parameter with value of 1.

906	   o  If the offering endpoint specifies a value for the ptime parameter
907	      that the answering endpoint cannot support, the answer MUST
908	      contain the optional ptime parameter.

910	   o  If the offering endpoint specifies a value for the maxptime
911	      parameter that the answering endpoint cannot support, the answer
912	      MUST contain the optional maxptime parameter.
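
The two rules for the "channels" parameter can be sketched as follows.
The function name is hypothetical, and the sketch assumes that the
offer explicitly carried a channels value and that the answerer's
capability can be summarized by a single maximum channel count.

```python
def answer_channels(offered, max_supported):
    """Pick the "channels" value for an SDP answer (hypothetical helper).

    Returns (value, must_include): must_include is False only when the
    answerer supports a single channel, where the parameter is optional.
    """
    if offered < 1 or max_supported < 1:
        raise ValueError("channel counts must be positive integers")
    # Use the offered value, or the highest supported value if lower.
    value = min(offered, max_supported)
    must_include = max_supported > 1
    return value, must_include
```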

914	5.4.  SDP Examples

916	   The following examples illustrate how to signal G.711.0 via SDP.

918	5.4.1.  SDP Example 1

920	         m=audio RTP/AVP 98
921	         a=rtpmap:98 G711-0/8000
922	         a=fmtp:98 complaw=mu

924	   In the above example the dynamic payload type 98 is mapped to G.711.0
925	   via the "a=rtpmap" parameter.  The mandatory "complaw" is on the
926	   "a=fmtp" parameter line.  Note that neither optional parameter
927	   ("ptime" or "channels") is present, although it is generally good
928	   form to include "ptime" in the SDP for session diagnostic purposes.

930	5.4.2.  SDP Example 2

932	   The following example illustrates an offering endpoint requesting 2
933	   channels, but the answering endpoint can only support (or render) one
934	   channel.

936	   Offer:

938	         m=audio RTP/AVP 98
939	         a=rtpmap:98 G711-0/8000/2
940	         a=ptime:20
941	         a=fmtp:98 complaw=al

943	   Answer:

945	         m=audio RTP/AVP 98
946	         a=rtpmap:98 G711-0/8000/1
947	         a=ptime:20
948	         a=fmtp:98 complaw=al

950	   In this example the offer had an optional channels parameter.  The
951	   answer must have the optional channels parameter also unless the
952	   value in the answer is one.  Here the answer explicitly contains
953	   the channels parameter (had it been omitted, the answer would be
954	   interpreted as one channel).  As mentioned previously, it is
955	   considered good form to include "ptime" in the SDP for session
956	   diagnostic purposes if the session is a constant ptime session.
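
A receiver of such an offer or answer needs the channel count from the
"a=rtpmap" line, defaulting to one when it is absent.  The following
is a minimal parsing sketch; the function name is hypothetical and
only the rtpmap attribute is handled.

```python
def parse_rtpmap(line):
    """Parse 'a=rtpmap:<pt> <name>/<rate>[/<channels>]' (hypothetical
    helper); channels defaults to 1 when absent."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    pt, _, encoding = line[len(prefix):].partition(" ")
    parts = encoding.split("/")
    if len(parts) not in (2, 3):
        raise ValueError("malformed rtpmap encoding")
    channels = int(parts[2]) if len(parts) == 3 else 1
    return int(pt), parts[0], int(parts[1]), channels
```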

958	6.  G.711.0 Storage Mode Conventions and Definition

960	   The G.711.0 storage mode definition in this section is similar to
961	   many other IETF codecs (e.g., iLBC, EVRC-NW) and is essentially a
962	   concatenation of individual G.711.0 frames.

964	   We note that something must be stored for any G.711.0 frames that are
965	   not received at the receiving endpoint, no matter what the cause.  In
966	   this section we describe two mechanisms, a "G.711.0 PLC Frame" and a
967	   "G.711.0 Erasure Frame".  These G.711.0 PLC and G.711.0 Erasure
968	   Frames are described prior to the G.711.0 storage mode definition for
969	   clarity.

971	6.1.  G.711.0 PLC Frame

973	   When G.711 RTP payloads are not received by a rendering endpoint, a
974	   Packet Loss Concealment (PLC) mechanism is typically employed to
975	   "fill in" the missing G.711 symbols with something that is
976	   auditorially pleasing, so that the loss may not be noticed by a
977	   listener.  Such a PLC mechanism for G.711 is specified in ITU-T
978	   Rec. G.711 - Appendix 1 [G.711-AP1].

980	   A natural extension when creating G.711.0 frames for storage
981	   environments is to employ such a PLC mechanism to create G.711
982	   symbols for the span of time in which G.711.0 payloads were not
983	   received - and then to compress the resulting "G.711 PLC symbols" via
984	   G.711.0 compression.  The G.711.0 frame(s) created by such a process
985	   are called "G.711.0 PLC Frames".

987	   Since PLC mechanisms are designed to render missing audio data with
988	   the best fidelity and intelligibility, G.711.0 frames created via
989	   such processing are likely best for most recording situations (such
990	   as voicemail storage) unless there is a requirement not to fabricate
991	   (audio) data not actually received.

993	   After such PLC G.711 symbols have been generated and then encoded by
994	   a G.711.0 encoder, the resulting frames may be stored in G.711.0
995	   frame format.  As a result, there is nothing to specify here - the
996	   G.711.0 PLC Frames are stored as if they were received by the
997	   receiving endpoint.  In other words, PLC-generated G.711.0 frames
998	   appear as "normal" or "ordinary" G.711.0 frames in the storage mode
999	   file.

1001	6.2.  G.711.0 Erasure Frame

1003	   "Erasure Frames", or equivalently "Null Frames", have been designed
1004	   for many frame-based codecs since G.711 was standardized.  These
1005	   null/erasure frames explicitly represent data from incoming audio
1006	   that were either not received by the receiving system or represent
1007	   data that a transmitting system decided not to send.  Transmitting
1008	   systems may choose not to send data for a variety of reasons (e.g.,
1009	   not enough wireless link capacity in radio-based systems) and can
1010	   choose to send a "null frame" in lieu of the actual audio.  It is
1011	   also envisioned that erasure frames would be used in storage mode
1012	   applications for specific archival purposes where there is a
1013	   requirement not to fabricate audio data that was not actually
1014	   received.

1016	   Thus, a G.711.0 erasure frame is a representation of the amount of
1017	   time in G.711.0 frames that were not received or not encoded by the
1018	   transmitting system.

1020	   Prior to defining a G.711.0 erasure frame it is beneficial to note
1021	   what many G.711 RTP systems send when the endpoint is "muted".  When
1022	   muted, many of these systems will send an entire G.711 payload of
1023	   either 0+ or 0- (i.e., one of the two levels closest to "analog zero"
1024	   in either G.711 companding law).  Next we note that a desirable
1025	   property for a G.711.0 erasure frame is for "non G.711.0 Erasure
1026	   Frame aware" endpoints to be able to playback a G.711.0 erasure frame
1027	   with the existing G.711.0 ITU-T reference code.

1029	   A G.711.0 Erasure Frame is defined as any G.711.0 frame for which the
1030	   corresponding G.711 sample values are either the value 0++ or the
1031	   value 0-- for the entirety of the G.711.0 frame.  The levels of 0++
1032	   and 0-- are defined to be the two levels above or below analog zero,
1033	   respectively.  An entire frame of value 0++ or 0-- is expected to be
1034	   extraordinarily rare when the frame was in fact generated by a
1035	   natural signal (on the order of one in 2^{ptime in samples, minus
1036	   one}), as analog inputs such as speech and music are zero-mean and
1037	   are typically acoustically coupled to digital sampling systems.  Note
1038	   that the playback of a G.711.0 frame characterized as an erasure
1039	   frame is auditorially equivalent to a muted signal (a very low value
1040	   constant).

1042	   These G.711.0 erasure frames can be reasonably characterized as null
1043	   or erasure frames while meeting the desired playback goal of being
1044	   decoded by the G.711.0 ITU-T reference code.  Thus, similarly to
1045	   G.711.0 PLC frames, the G.711.0 erasure frames appear as "normal" or
1046	   "ordinary" G.711.0 frames in the storage mode format.

1048	6.3.  G.711.0 Storage Mode Definition

1050	   The storage format is used for storing G.711.0 encoded frames.  The
1051	   format for the G.711.0 storage mode file defined by this RFC is shown
1052	   below.

1054	                        G.711.0 Storage Mode Format

1056	          |---------------------------|----------|--------------|
1057	          |       Magic Number        |          |              |
1058	          |                           |  Version | Concatenated |
1059	          | "#!G7110A\n" (for A-law)  |   Octet  |   G.711.0    |
1060	          |            or             |          |    Frames    |
1061	          | "#!G7110M\n" (for mu-law) |  "0x00"  |              |
1062	          |___________________________|__________|______________|

1064	                                 Figure 5

1066	   The storage mode file consists of a magic number and a version octet
1067	   followed by the individual G.711.0 frames concatenated together.

1069	   The magic number for G.711.0 A-law corresponds to the ASCII character
1070	   string "#!G7110A\n", i.e., "0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x41
1071	   0x0A".  Likewise, the magic number for G.711.0 MU-law corresponds to
1072	   the ASCII character string "#!G7110M\n", i.e., "0x23 0x21 0x47 0x37
1073	   0x31 0x31 0x30 0x4D 0x0A".

1075	   The version number octet allows for the future specification of other
1076	   G.711.0 storage mode formats.  The specification of other storage
1077	   mode formats may be desirable as G.711.0 frames are of variable
1078	   length and a future format may include an indexing methodology that
1079	   would enable playout far into a long G.711.0 recording without the
1080	   necessity of decoding all the G.711.0 frames since the beginning of
1081	   the recording.  Other future formats may include support
1082	   for multiple channels, metadata and the like.  For these reasons it
1083	   was determined that a versioning strategy was desirable for the
1084	   G.711.0 storage mode definition specified by this RFC.  This RFC only
1085	   specifies Version 0 and thus the value of "0x00" MUST be used for the
1086	   storage mode defined by this RFC.

1088	   The G.711.0 codec data frames, including any necessary erasure or PLC
1089	   frames, are stored in consecutive order concatenated together as
1090	   shown in Section 4.2.2.  As the Version 0 storage mode only supports
1091	   a single channel, the RTP payload format supporting multiple channels
1092	   defined in Section 4.2.4 is not supported in this storage mode
1093	   definition.

1095	   To decode the individual G.711.0 frames, the algorithm presented in
1096	   Section 4.2.3 may be used.
1097	   If the version octet is determined not to be zero, the remainder of
1098	   the payload MUST NOT be passed to the G.711.0 decoder, as the ITU-T
1099	   G.711.0 reference decoder can only decode concatenated G.711.0 frames
1100	   and has not been designed to decode elements in yet to be specified
1101	   future storage mode formats.
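
Writing and reading a Version 0 storage mode file can be sketched as
follows.  The function and constant names are hypothetical; the sketch
handles only the header (magic number and version octet) and treats
the concatenated G.711.0 frames as opaque octets to be handed to a
G.711.0 decoder.

```python
MAGIC = {"al": b"#!G7110A\n", "mu": b"#!G7110M\n"}
VERSION_0 = 0x00


def write_storage(complaw, frames):
    """Serialize a Version 0 storage mode file (hypothetical helper)."""
    return MAGIC[complaw] + bytes([VERSION_0]) + b"".join(frames)


def read_storage(data):
    """Return (complaw, concatenated_frames) for a Version 0 file."""
    for complaw, magic in MAGIC.items():
        if data.startswith(magic):
            break
    else:
        raise ValueError("unknown magic number")
    header_len = len(magic) + 1
    if len(data) < header_len:
        raise ValueError("missing version octet")
    if data[len(magic)] != VERSION_0:
        # Unknown version: do not pass the remainder to the decoder.
        raise ValueError("unsupported storage mode version")
    return complaw, data[header_len:]
```

Raising an error on a non-zero version octet is one way to ensure the
remainder of the file never reaches the G.711.0 decoder.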

1103	7.  Acknowledgements

1105	   There have been many people contributing to G.711.0 in the course of
1106	   its development.  The people listed here deserve special mention:
1107	   Takehiro Moriya, Claude Lamblin, Herve Taddei, Simao Campos, Yusuke
1108	   Hiwasaki, Jacek Stachurski, Lorin Netsch, Paul Coverdale, Patrick
1109	   Luthi, Paul Barrett, Jari Hagqvist, Pengjun (Jeff) Huang, John Gibbs,
1110	   Yutaka Kamamoto, and Csaba Kos.  The review and oversight by the IETF
1111	   Payload Working Group chairs Ali Begen and Roni Even during the
1112	   development of this RFC is appreciated.  Additionally, the careful
1113	   review and comments by Richard Barnes are likewise very much
1114	   appreciated.

1116	8.  Contributors

1118	   The authors thank everyone who has contributed to this document.
1119	   The people listed here deserve special mention: Ali Begen, Roni Even,
1120	   and Hadriel Kaplan.

1122	9.  IANA Considerations

1124	   One media type (audio/G711-0) has been defined and requires IANA
1125	   registration in the media types registry.  See Section 5.1 for
1126	   details.

1128	10.  Security Considerations

1130	   RTP packets using the payload format defined in this specification
1131	   are subject to the security considerations discussed in the RTP
1132	   specification [RFC3550], and in any appropriate RTP profile (for
1133	   example RFC 3551 [RFC3551] or [RFC4585]).  This implies that
1134	   confidentiality of the media streams is achieved by encryption; for
1135	   example, through the application of SRTP [RFC3711].  Because the data
1136	   compression used with this payload format is applied end-to-end, any
1137	   encryption needs to be performed after compression.

1139	   Note that the appropriate mechanism to ensure confidentiality and
1140	   integrity of RTP packets and their payloads is very dependent on the
1141	   application and on the transport and signaling protocols employed.
1142	   Thus, although SRTP is given as an example above, other possible
1143	   choices exist.

1145	   Note that end-to-end security with either authentication, integrity
1146	   or confidentiality protection will prevent a network element not
1147	   within the security context from performing media-aware operations
1148	   other than discarding complete packets.  To allow any (media-aware)
1149	   intermediate network element to perform its operations, it is
1150	   required to be a trusted entity which is included in the security
1151	   context establishment.

1153	   G.711.0 has no known denial-of-service attacks due to decoding, as
1154	   data posing as a desired G.711.0 payload will be decoded into
1155	   something (as per the decoding algorithm) with a finite amount of
1156	   computation.  This is due to the decompression algorithm having a
1157	   finite worst-case processing path (no infinite computational loops
1158	   are possible).  We also note that the data read by the G.711.0
1159	   decoder is controlled by the length of the individual encoded G.711.0
1160	   frame(s) contained in the RTP payload.  The decoding algorithm
1161	   specified in Section 4.2.3 above ensures that the G.711.0 decoder
1162	   will not read beyond the length of the internal buffer specified
1163	   (which is in turn specified to be no greater than the largest
1164	   possible G.711.0 frame of 321 octets).  Therefore a G.711.0 payload
1165	   does not carry "active content" that could impose malicious side-
1166	   effects upon the receiver.
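
	   The bounded-read property described above can be illustrated with a
	   short sketch.  This is not the normative algorithm of Section 4.2.3;
	   it only shows the idea of handing the decoder a window of at most
	   321 octets that never extends past the end of the payload.  The
	   names decode_payload and decode_frame are hypothetical, and the
	   decoder interface (returning the decoded symbols and the number of
	   octets consumed) is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

MAX_G7110_FRAME_OCTETS = 321  # largest possible G.711.0 frame, per the text

# Hypothetical decoder interface (an assumption, not an ITU-T or IETF API):
# it takes a buffer of at most 321 octets and returns
# (decoded G.711 symbols, number of octets consumed from the buffer).
FrameDecoder = Callable[[bytes], Tuple[bytes, int]]


def decode_payload(payload: bytes, decode_frame: FrameDecoder) -> List[bytes]:
    """Walk an RTP payload frame by frame without reading past its end."""
    frames: List[bytes] = []
    offset = 0
    while offset < len(payload):
        # The decoder never sees more than 321 octets, and never sees
        # octets beyond the end of the payload.
        window = payload[offset:offset + MAX_G7110_FRAME_OCTETS]
        symbols, consumed = decode_frame(window)
        # Reject impossible results so the loop always advances and
        # therefore finishes after at most len(payload) iterations.
        if consumed < 1 or consumed > len(window):
            raise ValueError("decoder reported an invalid frame length")
        frames.append(symbols)
        offset += consumed
    return frames
```

	   Because every iteration consumes at least one octet, the amount of
	   work in this sketch is bounded by the payload length, which mirrors
	   the finite worst-case processing argument made above.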

1168	   G.711.0 is a variable bit rate (VBR) audio codec.  There have been
1169	   recent concerns with VBR speech codecs where a passive observer can
1170	   identify phrases from a standard speech corpus by means of the
1171	   lengths produced by the encoder even when the payload is encrypted
1172	   [IEEE].  That paper determined that some code-excited linear
1173	   prediction (CELP) codecs produce discrete packet lengths for
1174	   some phonemes and that, with appropriately designed Hidden Markov
1175	   Models (HMMs), a system observing those lengths could predict
1176	   phrases with unexpected accuracy.  One CELP codec studied, SPEEX,
1177	   produced 21 different packet lengths in its wideband mode, and
1178	   these packet lengths mapped probabilistically to phonemes on which
1179	   an HMM system could be trained.  The paper also determined that
1180	   one mitigation technique is to pad the output of the encoder with
1181	   random padding lengths so that 1) more discrete payload sizes
1182	   result and 2) the probabilistic mapping to phonemes becomes less
1183	   clear.  Because G.711 is not a speech-model-based codec, neither
1184	   is G.711.0.  During talking periods, a G.711.0 encoder produces
1185	   frames of varying lengths that are unlikely to map strongly to
1186	   phonemes.

1188	   Thus G.711.0 is not expected to have this same vulnerability.  It
1189	   should be noted that "silence" (only one value of G.711 in the entire
1190	   G.711 input frame) or "near silence" (only a few G.711 values) is
1191	   easily detectable as G.711.0 frame lengths of one or a few octets.
1192	   If one desires to mitigate silence/non-silence detection,
1193	   statistically variable padding should be added to G.711.0 frames that
1194	   are very small (less than about 20% of the symbols of the
1195	   corresponding G.711 input frame).  Methods of introducing padding in
1196	   the G.711.0 payloads are provided in the G.711.0 RTP payload
1197	   definition in Section 4.2.2.
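
	   As a non-normative illustration of the mitigation above, the sketch
	   below pads only those frames that fall under the 20% threshold and
	   draws the padded length at random.  Several details are assumptions
	   made for illustration: the function name pad_small_frame, the use
	   of 0x00 octets appended after the frame as padding (Section 4.2.2
	   defines the actual padding methods), and the uniform distribution
	   of the target length between the threshold and the input frame
	   size.

```python
import secrets

PADDING_OCTET = b"\x00"  # assumption: 0x00 octets serve as padding


def pad_small_frame(frame: bytes, input_symbols: int,
                    threshold_percent: int = 20) -> bytes:
    """Pad a very small G.711.0 frame to a randomly chosen length."""
    if input_symbols <= 0:
        raise ValueError("input_symbols must be positive")
    # Smallest length that is NOT considered "very small" (rounded up).
    low = (input_symbols * threshold_percent + 99) // 100
    high = input_symbols
    if len(frame) >= low:
        return frame  # large enough; its length does not reveal silence
    # Draw the target length from a CSPRNG so an observer cannot
    # predict it; uniform over [low, high] is an illustrative choice.
    target = low + secrets.randbelow(high - low + 1)
    return frame + PADDING_OCTET * (target - len(frame))
```

	   For a 160-symbol input frame, a one-octet "silence" frame would
	   leave this function with a length between 32 and 160 octets, while
	   a frame of 32 octets or more would be returned unchanged.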

1199	11.  Congestion Control

1201	   The G.711 codec is a Constant Bit Rate (CBR) codec which does not
1202	   have a means to regulate the bitrate.  The G.711.0 lossless
1203	   compression algorithm typically compresses the G.711 CBR stream into
1204	   a smaller VBR stream.  However, being lossless, it does not possess
1205	   means of further reducing the bitrate beyond the G.711.0-based
1206	   compression result.  The G.711.0 RTP payloads can be made arbitrarily
1207	   large by means of adding optional padding bytes (subject only to MTU
1208	   limitations).

1210	   Therefore, there are no explicit ways to regulate the bitrate of the
1211	   transmissions outlined in this RTP payload format except by
1212	   modulating the number of optional padding bytes in the RTP payload.
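
	   The effect of padding on the payload bitrate can be worked through
	   with simple arithmetic.  The sketch below is illustrative only: the
	   function name payload_bitrate_bps is hypothetical, the frame sizes
	   in the usage note are example values rather than measurements, and
	   RTP, UDP, and IP header overhead is ignored.

```python
def payload_bitrate_bps(frame_octets: int, padding_octets: int,
                        ptime_ms: int) -> float:
    """Payload bitrate in bit/s for one packet sent every ptime_ms."""
    if ptime_ms <= 0:
        raise ValueError("ptime_ms must be positive")
    payload_octets = frame_octets + padding_octets
    # 8 bits per octet; 1000 ms per second.
    return payload_octets * 8 * 1000 / ptime_ms
```

	   With 20 ms packets, uncompressed G.711 carries 160 octets per
	   packet, so payload_bitrate_bps(160, 0, 20) gives 64000.0 bit/s.  A
	   hypothetical 80-octet G.711.0 frame gives 32000.0 bit/s without
	   padding and 48000.0 bit/s with 40 padding octets, which shows that
	   the padding count is the only adjustable term.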

1214	12.  References

1216	12.1.  Normative References

1218	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1219	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1221	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1222	              Description Protocol", RFC 4566, July 2006.

1224	   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
1225	              Specifications and Registration Procedures", BCP 13, RFC
1226	              6838, January 2013.

1228	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1229	              Jacobson, "RTP: A Transport Protocol for Real-Time
1230	              Applications", STD 64, RFC 3550, July 2003.

1232	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
1233	              Video Conferences with Minimal Control", STD 65, RFC 3551,
1234	              July 2003.

1236	   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
1237	              "Extended RTP Profile for Real-time Transport Control
1238	              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
1239	              2006.

1241	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1242	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
1243	              RFC 3711, March 2004.

1245	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
1246	              with Session Description Protocol (SDP)", RFC 3264, June
1247	              2002.

1249	   [G.711.0]  ITU-T G.711.0, "Recommendation ITU-T G.711.0 - Lossless
1250	              Compression of G.711 Pulse Code Modulation", September
1251	              2009.

1253	   [G.711]    ITU-T G.711, "Recommendation ITU-T G.711: Pulse Code
1254	              Modulation (PCM) of Voice Frequencies", November 1988.

1256	   [G.711-AP1]
1257	              ITU-T G.711 Appendix 1, "Recommendation G.711
1258	              Appendix 1: A high quality low-complexity algorithm for
1259	              packet loss concealment with G.711", September 1999.

1261	   [G.711-A1]
1262	              ITU-T G.711 Amendment 1, "Recommendation ITU-T G.711
1263	              Amendment 1 - Amendment 1: New Annex A on Lossless
1264	              Encoding of PCM Frames", September 2009.

1266	12.2.  Informative References

1268	   [G.729]    ITU-T G.729, "Recommendation ITU-T G.729 - Coding of
1269	              speech at 8 kbit/s using conjugate-structure algebraic-
1270	              code-excited linear prediction (CS-ACELP)", January 2007.

1272	   [G.722]    ITU-T G.722, "Recommendation ITU-T G.722 - 7 kHz
1273	              audio-coding within 64 kbit/s", November 1988.

1275	   [ICASSP]   N. Harada, Y. Yamamoto, T. Moriya, Y. Hiwasaki, M. A.
1276	              Ramalho, L. Netsch, Y. Stachurski, Miao Lei, H. Taddei,
1277	              and Q. Fengyan, "Emerging ITU-T Standard G.711.0 -
1278	              Lossless Compression of G.711 Pulse Code Modulation",
1279	              International Conference on Acoustics Speech and Signal
1280	              Processing (ICASSP), 2010, ISBN 978-1-4244-4295-9,
1281	              March 2010.

1283	   [IEEE]     C.V. Wright, L. Ballard, S.E. Coull, F. Monrose, and
1284	              G.M. Masson, "Spot Me if You Can: Uncovering Spoken
1285	              Phrases in Encrypted VoIP Conversations", IEEE Symposium
1286	              on Security and Privacy, 2008, ISBN 978-0-7695-3168-7,
1287	              May 2008.

1289	Authors' Addresses

1291	   Michael A. Ramalho (editor)
1292	   Cisco Systems, Inc.
1293	   6310 Watercrest Way Unit 203
1294	   Lakewood Ranch, FL  34202
1295	   USA

1297	   Phone: +1 919 476 2038
1298	   Email: mramalho@cisco.com

1300	   Paul E. Jones
1301	   Cisco Systems, Inc.
1302	   7025 Kit Creek Rd.
1303	   Research Triangle Park, NC  27709
1304	   USA

1306	   Phone: +1 919 476 2048
1307	   Email: paulej@packetizer.com

1309	   Noboru Harada
1310	   NTT Communications Science Labs.
1311	   3-1 Morinosato-Wakamiya
1312	   Atsugi, Kanagawa  243-0198
1313	   JAPAN

1315	   Phone: +81 46 240 3676
1316	   Email: harada.noboru@lab.ntt.co.jp

1318	   Muthu Arul Mozhi Perumal
1319	   Ericsson
1320	   Ferns Icon
1321	   Doddanekundi, Mahadevapura
1322	   Bangalore, Karnataka  560037
1323	   India

1325	   Phone: +91 9449288768
1326	   Email: muthu.arul@gmail.com

1327	   Lei Miao
1328	   Huawei Technologies Co. Ltd
1329	   Q22-2-A15R, Environment Protection Park
1330	   No. 156 Beiqing Road
1331	   HaiDian District
1332	   Beijing  100095
1333	   China

1335	   Phone: +86 1059728300
1336	   Email: lei.miao@huawei.com