idnits 2.17.1 

draft-ietf-payload-g7110-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 11, 2015) is 3266 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711.0'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711-AP1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711-A1'


     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                    M. Ramalho, Ed.
3	Internet-Draft                                                  P. Jones
4	Intended status: Standards Track                           Cisco Systems
5	Expires: November 12, 2015                                     N. Harada
6	                                                                     NTT
7	                                                              M. Perumal
8	                                                                Ericsson
9	                                                                 L. Miao
10	                                                     Huawei Technologies
11	                                                            May 11, 2015

13	                     RTP Payload Format for G.711.0
14	                      draft-ietf-payload-g7110-06

16	Abstract

18	   This document specifies the Real-Time Transport Protocol (RTP)
19	   payload format for ITU-T Recommendation G.711.0.  ITU-T Rec. G.711.0
20	   defines a lossless and stateless compression for G.711 packet
21	   payloads typically used in IP networks.  This document also defines a
22	   storage mode format for G.711.0 and a media type registration for the
23	   G.711.0 RTP payload format.

25	Status of This Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on November 12, 2015.

42	Copyright Notice

44	   Copyright (c) 2015 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction
60	   2.  Requirements Language
61	   3.  G.711.0 Codec Background
62	     3.1.  General Information and Use of the ITU-T G.711.0 Codec
63	     3.2.  Key Properties of G.711.0 Design
64	     3.3.  G.711 Input Frames to G.711.0 Output Frames
65	       3.3.1.  Multiple G.711.0 Output Frames per RTP Payload
66	               Considerations
67	   4.  RTP Header and Payload
68	     4.1.  G.711.0 RTP Header
69	     4.2.  G.711.0 RTP Payload
70	       4.2.1.  Single G.711.0 Frame per RTP Payload Example
71	       4.2.2.  G.711.0 RTP Payload Definition
72	         4.2.2.1.  G.711.0 RTP Payload Encoding Process
73	       4.2.3.  G.711.0 RTP Payload Decoding Process
74	       4.2.4.  G.711.0 RTP Payload for Multiple Channels
75	   5.  Payload Format Parameters
76	     5.1.  Media Type Registration
77	     5.2.  Mapping to SDP Parameters
78	     5.3.  Offer/Answer Considerations
79	     5.4.  SDP Examples
80	       5.4.1.  SDP Example 1
81	       5.4.2.  SDP Example 2
82	   6.  G.711.0 Storage Mode Conventions and Definition
83	     6.1.  G.711.0 PLC Frame
84	     6.2.  G.711.0 Erasure Frame
85	     6.3.  G.711.0 Storage Mode Definition
86	   7.  Acknowledgements
87	   8.  Contributors
88	   9.  IANA Considerations
89	   10. Security Considerations
90	   11. Congestion Control
91	   12. References
92	     12.1.  Normative References
93	     12.2.  Informative References
94	   Authors' Addresses

96	1.  Introduction

98	   The International Telecommunication Union (ITU-T) Recommendation
99	   G.711.0 [G.711.0] specifies a stateless and lossless compression for
100	   G.711 packet payloads typically used in Voice over IP (VoIP)
101	   networks.  This document specifies the Real-Time Transport Protocol
102	   (RTP) RFC 3550 [RFC3550] payload format and storage modes for this
103	   compression.

105	2.  Requirements Language

107	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
108	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
109	   document are to be interpreted as described in RFC 2119 [RFC2119].

111	3.  G.711.0 Codec Background

113	   ITU-T Recommendation G.711.0 [G.711.0] is a lossless and stateless
114	   compression mechanism for ITU-T Recommendation G.711 [G.711] and thus
115	   is not a "codec" in the sense of "lossy" codecs typically carried by
116	   RTP.  When used end-to-end, ITU-T Rec. G.711.0 is negotiated as
117	   if it were a codec, with the understanding that ITU-T Rec. G.711.0
118	   losslessly encodes the underlying (lossy) G.711 pulse code modulation
119	   (PCM) sample representation of an audio signal.  For this reason
120	   ITU-T Rec. G.711.0 will be interchangeably referred to in this
121	   document as a "lossless data compression algorithm" or a "codec",
122	   depending on context.  Within this document, individual G.711 PCM
123	   samples will be referred to as "G.711 symbols" or just "symbols" for
124	   brevity.

126	   This section describes the ITU-T Recommendation G.711 [G.711] codec,
127	   its properties, typical use cases and its key design properties.

129	3.1.  General Information and Use of the ITU-T G.711.0 Codec

131	   ITU-T Recommendation G.711 is the benchmark standard for narrowband
132	   telephony.  It has been successful for many decades because of its
133	   proven voice quality, ubiquity and utility.  A new ITU-T
134	   recommendation, G.711.0, has been established for defining a
135	   stateless and lossless compression for G.711 packet payloads
136	   typically used in VoIP networks.  ITU-T Rec. G.711.0 is also known as
137	   ITU-T Rec. G.711 Annex A [G.711-A1], as ITU-T Rec. G.711 Annex A is
138	   effectively a pointer to ITU-T Rec. G.711.0.  Henceforth in this
139	   document, ITU-T Rec. G.711.0 will simply be referred to as "G.711.0"
140	   and ITU-T Rec. G.711 simply as "G.711".

142	   G.711.0 may be employed end-to-end, in which case the RTP payload
143	   format specification and use is nearly identical to the G.711 RTP
144	   specification found in RFC 3551 [RFC3551].  The only significant
145	   difference for G.711.0 is the required use of a dynamic payload type
146	   (the static PT of 0 or 8 is presently almost always used with G.711
147	   even though dynamic assignment of other payload types is allowed) and
148	   the recommendation not to use Voice Activity Detection (see
149	   Section 4.1).

151	   G.711.0, being both lossless and stateless, may also be employed as a
152	   lossless compression mechanism for G.711 payloads anywhere between
153	   end systems which have negotiated use of G.711.  Because the only
154	   significant difference between the G.711 RTP payload format header and the
155	   G.711.0 payload format header defined in this document is the payload
156	   type, a G.711 RTP packet can be losslessly converted to a G.711.0 RTP
157	   packet simply by compressing the G.711 payload (thus creating a
158	   G.711.0 payload), changing the payload type to the dynamic value
159	   desired and copying all the remaining G.711 RTP header fields into
160	   the corresponding G.711.0 RTP header.  In a similar manner, the
161	   corresponding decompression of the G.711.0 RTP packet thus created
162	   back to the original source G.711 RTP packet can be accomplished by
163	   losslessly decompressing the G.711.0 payload back to the original
164	   source G.711 payload, changing the payload type back to the payload
165	   type of the original G.711 RTP packet and copying all the remaining
166	   G.711.0 RTP header fields into the corresponding G.711 RTP header.
167	   As a packet produced by the compression and decompression as
168	   described above is indistinguishable in every detail from the source
169	   G.711 packet, such compression can be made invisible to the end
170	   systems.  Specification of how systems on the path between the end
171	   systems discover each other and negotiate the use of G.711.0
172	   compression as described in this paragraph is outside the scope of
173	   this document.
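
   The header handling described above can be sketched as follows.
   This is a minimal illustration, not a definitive implementation: the
   function names and the "transform" callable (standing in for a
   G.711.0 compressor or decompressor) are hypothetical, and the sketch
   assumes the RTP padding (P) bit is clear.  Only the payload type
   changes; every other header octet is copied unchanged.

```python
import struct
from typing import Callable


def rtp_header_length(packet: bytes) -> int:
    """Length in octets of the RTP header: fixed part, CSRC list and,
    if the X bit is set, the header extension (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    if packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    length = 12 + 4 * (packet[0] & 0x0F)      # fixed header + CSRC list
    if packet[0] & 0x10:                      # X bit: extension follows
        if len(packet) < length + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack_from("!H", packet, length + 2)
        length += 4 + 4 * ext_words
    if len(packet) < length:
        raise ValueError("truncated RTP header")
    return length


def convert_packet(packet: bytes, new_pt: int,
                   transform: Callable[[bytes], bytes]) -> bytes:
    """Rewrite the payload type and pass the payload through
    `transform` (hypothetical compressor or decompressor)."""
    if not 0 <= new_pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    if packet[0] & 0x20:
        raise ValueError("RTP padding is not handled by this sketch")
    hlen = rtp_header_length(packet)
    header = bytearray(packet[:hlen])
    header[1] = (header[1] & 0x80) | new_pt   # keep marker bit, set PT
    return bytes(header) + transform(packet[hlen:])
```

   Applying convert_packet() with a compressor and the dynamic payload
   type, and then again with the matching decompressor and the original
   payload type, yields the source packet octet for octet, provided the
   two callables are exact inverses of each other.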

175	   It is important to note that G.711.0, being both lossless and
176	   stateless, can be employed multiple times (e.g., on multiple,
177	   individual hops or series of hops) on a given flow with no
178	   degradation of quality relative to end-to-end G.711.  Stated another
179	   way, multiple "lossless transcodes" from/to G.711.0/G.711 do not
180	   affect voice quality as typically occurs with lossy transcodes to/
181	   from dissimilar codecs.

183	   Lastly, it is expected that G.711.0 will be used as an archival
184	   format for recorded G.711 streams.  Therefore, a G.711.0 Storage Mode
185	   Format is also included in this document.

187	3.2.  Key Properties of G.711.0 Design

189	   The fundamental design of G.711.0 resulted from the desire to
190	   losslessly encode and compress frames of G.711 symbols independent of
191	   what types of signals those G.711 frames contained.  The primary
192	   G.711.0 use case is for G.711 encoded, zero-mean, acoustic signals
193	   (such as speech and music).

195	   G.711.0 attributes are below:

197	   A1  Compression for zero-mean acoustic signals: G.711.0 was designed
198	         primarily for the compression of G.711 payloads
199	         that contain "speech" or other zero-mean acoustic signals.
200	         G.711.0 obtains greater than 50% average compression in service
201	         provider environments [ICASSP].

203	   A2  Lossless for any G.711 payload: G.711.0 was designed to be
204	         lossless for any valid G.711 payload - even if the payload
205	         consisted of apparently random G.711 symbols (e.g., a modem or
206	         FAX payload).  G.711.0 could be used for "aggregate 64 kbps
207	         G.711 channels" carried over IP without explicit concern if a
208	         subset of these channels happened to be carrying something
209	         other than voice or general audio.  To the extent that a
210	         particular channel carried something other than voice or
211	         general audio, G.711.0 ensured that it was carried losslessly,
212	         if not significantly compressed.

214	   A3  Stateless: Compression of a frame of G.711 symbols was only to be
215	         dependent on that frame and not on any prior frame.  Although
216	         greater compression is usually available by observing a longer
217	         history of past G.711 symbols, it was decided that the
218	         compression design would be stateless to completely eliminate
219	         error propagation common in many lossy codec designs (e.g.,
220	         ITU-T Rec. G.729 [G.729], ITU-T Rec. G.722 [G.722]).  That is,
221	         the decoding process need not be concerned about lost prior
222	         packets because the decompression of a given G.711.0 frame is
223	         not dependent on potentially lost prior G.711.0 frames.  Owing
224	         to this stateless property, the frames input to the G.711.0
225	         encoder may be changed "on-the-fly" (a 5 ms encoding could be
226	         followed by a 20 ms encoding).

228	   A4  Self-describing: This property is defined as the ability to
229	         determine how many source G.711 samples are contained within
230	         the G.711.0 frame solely by information contained within the
231	         G.711.0 frame.  Generally, the number of source G.711 symbols
232	         can be determined by decoding the initial octets of the
233	         compressed G.711.0 frame (these octets are called "prefix
234	         codes" in the standard).  A G.711.0 decoder need not know how
235	         many symbols are contained in the original G.711 frame (e.g.,
236	         parameter ptime in Session Description Protocol, SDP,
237	         [RFC4566]), as it is able to decompress the G.711.0 frame
238	         presented to it without signaling knowledge.

240	   A5  Accommodate G.711 payload sizes typically used in IP: G.711 input
241	         frames of length typically found in VoIP applications represent
242	         SDP ptime values of 5 ms, 10 ms, 20 ms, 30 ms or 40 ms.  Since
243	         the dominant sampling frequency for G.711 is 8000 samples per
244	         second, G.711.0 was designed to compress G.711 input frames of
245	         40, 80, 160, 240 or 320 samples.

247	   A6  Bounded expansion: Since attribute A2 above requires G.711.0 to
248	         be lossless for any payload (which could consist of any
249	         combination of octets with each octet spanning the entire space
250	         of 2^8 values), by definition there exists at least one
251	         potential G.711 payload which must be "uncompressible".  Since
252	         the quantum of compression is an octet, the minimum expansion
253	         of such an uncompressible payload was designed to be the
254	         minimum possible of one octet.  Thus G.711.0 "compressed"
255	         frames can be of length one octet to X+1 octets, where X is the
256	         size of the input G.711 frame in octets.  G.711.0 can therefore
257	         be viewed as a Variable Bit Rate (VBR) encoding in which the
258	         size of the G.711.0 output frame is a function of the G.711
259	         symbols input to it.

261	   A7  Algorithmic delay: G.711.0 was designed to have the algorithmic
262	         delay equal to the time represented by the number of samples in
263	         the G.711 input frame (i.e., no "look-ahead").

265	   A8  Low Complexity: Less than 1.0 Weighted Million Operations Per
266	         Second (WMOPS) average and low memory footprint (~5k octets
267	         RAM, ~5.7k octets ROM and ~3.6k basic operations) [ICASSP]
268	         [G.711.0].

270	   A9  Both A-law and mu-law supported: G.711 has two operating laws,
271	         A-law and mu-law.  These two laws are also known as PCMA and
272	         PCMU in RTP applications RFC 3551 [RFC3551].
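
   The arithmetic behind attributes A5 and A6 can be sketched as
   follows.  The function and constant names are illustrative only; the
   values are those stated above (8000 samples per second, one octet
   per G.711 symbol, and an output frame of 1 to X+1 octets).

```python
from typing import Tuple

SAMPLE_RATE_HZ = 8000                    # dominant G.711 sampling rate
FRAME_DURATIONS_MS = (5, 10, 20, 30, 40)  # ptime values named in A5


def samples_for_ptime(ptime_ms: int) -> int:
    """Number of G.711 symbols (octets) in one input frame."""
    if ptime_ms not in FRAME_DURATIONS_MS:
        raise ValueError("unsupported G.711.0 frame duration")
    return SAMPLE_RATE_HZ * ptime_ms // 1000


def output_size_bounds(input_octets: int) -> Tuple[int, int]:
    """Smallest and largest G.711.0 frame for X input octets (A6)."""
    return 1, input_octets + 1


def worst_case_bitrate_bps(ptime_ms: int) -> int:
    """Payload bit rate if every frame expands by one octet."""
    _, largest = output_size_bounds(samples_for_ptime(ptime_ms))
    return largest * 8 * 1000 // ptime_ms
```

   For a 20 ms frame this gives 160 input octets, an output frame of 1
   to 161 octets, and a worst-case payload rate of 64400 bit/s compared
   with 64000 bit/s for uncompressed G.711.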

274	   These attributes generally make it trivial to compress a G.711 input
275	   frame consisting of 40, 80, 160, 240 or 320 samples.  After the input
276	   frame is presented to a G.711.0 encoder, a G.711.0 "self-describing"
277	   output frame is produced.  The number of samples contained within
278	   this frame is easily determined at the G.711.0 decoder by virtue of
279	   attribute A4.  The G.711.0 decoder can decode the G.711.0 frame back
280	   to a G.711 frame by using only data within the G.711.0 frame.

282	   Lastly we note that losing a G.711.0 encoded packet is identical in
283	   effect to losing a G.711 packet (when using RTP); this is because a
284	   G.711.0 payload, like the corresponding G.711 payload, is stateless.
285	   Thus, it is anticipated that existing G.711 PLC mechanisms will be
286	   employed when a G.711.0 packet is lost and an identical MOS
287	   degradation relative to G.711 loss will be achieved.

289	3.3.  G.711 Input Frames to G.711.0 Output Frames

291	   G.711.0 is a lossless and stateless compression of G.711 frames.  The
292	   following figure depicts this where "A" is the process of G.711.0
293	   encoding and "B" is the process of G.711.0 decoding.

295	        1:1 Mapping from G.711 Input Frame to G.711.0 Output Frame

297	    |--------------------------|  A   |------------------------------|
298	    |    G.711 Input Frame     |----->|     G.711.0 Output Frame     |
299	    |       of X Octets        |      |  containing 1 to X+1 Octets  |
300	    | (where X MUST be 40, 80, |      | (precise value dependent on  |
301	    | 160, 240 or 320 octets)  |<-----| G.711.0 ability to compress) |
302	    |__________________________|  B   |______________________________|

304	                                 Figure 1

306	   Note that the mapping is 1:1 (lossless) in both directions, subject
307	   to two constraints.  The first constraint is that the input frame
308	   provided to the G.711.0 encoder (process "A") has a specific number
309	   of input G.711 symbols consistent with attribute A5 (40, 80, 160, 240
310	   or 320 octets).  The second constraint is that the companding law
311	   used to create the G.711 input frame (A-law or mu-law) must be known,
312	   consistent with attribute A9.

314	   Subject to these two constraints, the input G.711 frame is processed
315	   by the G.711.0 encoder ("process A") and produces a "self-describing"
316	   G.711.0 output frame, consistent with attribute A4.  Depending on the
317	   source G.711 symbols, the G.711.0 output frame can contain anywhere
318	   from 1 to X+1 octets, where X is the number of input G.711 symbols.
319	   Compression is achieved for virtually every zero-mean acoustic signal
320	   encoded by G.711.0.

322	   Since the G.711.0 output frame is "self-describing", a G.711.0
323	   decoder (process "B") can losslessly reproduce the original G.711
324	   input frame with only the knowledge of which companding law was used
325	   (A-law or mu-law).  The first octet of a G.711.0 frame is called the
326	   "Prefix Code" octet; the information within this octet conveys how
327	   many G.711 symbols the decoder is to create from a given G.711.0
328	   input frame (i.e., 0, 40, 80, 160, 240 or 320).  The Prefix Code
329	   value of 0x00 is used to denote zero G.711 source symbols, which
330	   allows the use of 0x00 as a payload padding octet (to be described
331	   later in Section 3.3.1).

333	   Since G.711.0 was designed with typical G.711 payload lengths as a
334	   design constraint (attribute A5), this lossless encoding can be
335	   performed only with knowledge of the companding law being used.  This
336	   information is anticipated to be signaled in SDP and will be
337	   described later in this document.

339	   If the original inputs were known to be from a zero-mean acoustic
340	   signal coded by G.711, an intelligent G.711.0 encoder could infer the
341	   G.711 companding law in use (via G.711 input signal amplitude
342	   histogram statistics).  Likewise, an intelligent G.711.0 decoder
343	   producing G.711 from the G.711.0 frames could also infer which
344	   encoding law is in use.  Thus G.711.0 could be designed for use in
345	   applications that have limited stream signaling between the G.711
346	   endpoints (i.e., they only know "G.711 at 8k sampling is being used",
347	   but nothing more).  Such usage is not further described in this
348	   document.  Additionally, if the original inputs were known to come
349	   from zero-mean acoustic signals, an intelligent G.711.0 encoder could
350	   tell if the G.711.0 payload had been encrypted - as the symbols would
351	   not have the distribution expected in either companding law and would
352	   appear random.  Such determination is also not further discussed in
353	   this document.

355	   It is easily seen that this process is 1:1 and that G.711.0 based
356	   lossless compression can be employed multiple times, as the original
357	   G.711 input symbols are always reproduced with 100% fidelity.

359	3.3.1.  Multiple G.711.0 Output Frames per RTP Payload Considerations

361	   As a general rule, G.711.0 frames containing more source G.711
362	   symbols (from a given channel) will typically result in higher
363	   compression, but there are exceptions to this rule.  A G.711.0
364	   encoder may choose to encode 20 ms of input G.711 symbols as: 1) a
365	   single 20 ms G.711.0 frame, or 2) as two 10 ms G.711.0 frames, or 3)
366	   any other combination of 5 ms or 10 ms G.711.0 frames - depending on
367	   which encoding resulted in fewer bits.  As an example, an intelligent
368	   encoder might encode 20 ms of G.711 symbols as two 10 ms G.711.0
369	   frames if the first 10 ms was "silence" and two G.711.0 frames took
370	   fewer bits than any other possible encoding combination of G.711.0
371	   frame sizes.
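
   One way an encoder might search for the smallest combination of
   frame sizes is sketched below.  This is an illustration under stated
   assumptions, not part of G.711.0 itself: "encode" is a hypothetical
   callable that compresses one G.711 input frame into one G.711.0
   frame, and the search relies only on the stateless property (each
   frame's size depends on that frame's symbols alone).

```python
from typing import Callable, Dict, List, Tuple

ALLOWED_FRAME_SAMPLES = (40, 80, 160, 240, 320)


def best_segmentation(symbols: bytes,
                      encode: Callable[[bytes], bytes]) -> List[bytes]:
    """Split `symbols` into allowed frame sizes so that the encoded
    frames occupy the fewest octets in total."""
    n = len(symbols)
    if n == 0 or n % 40 != 0:
        raise ValueError("input must be a non-zero multiple of 40 symbols")
    # best[i] holds (total octets, frames) for the cheapest encoding
    # of symbols[:i]; built up in steps of 40 symbols (5 ms).
    best: Dict[int, Tuple[int, List[bytes]]] = {0: (0, [])}
    for end in range(40, n + 1, 40):
        candidates = []
        for size in ALLOWED_FRAME_SAMPLES:
            start = end - size
            if start in best:
                frame = encode(symbols[start:end])
                total, frames = best[start]
                candidates.append((total + len(frame), frames + [frame]))
        # Fewest octets wins; ties go to the option with fewer frames.
        best[end] = min(candidates, key=lambda c: (c[0], len(c[1])))
    return best[n][1]
```

   The returned frames can be concatenated in order to form the RTP
   payload.  The search calls the encoder several times per payload,
   which is the added complexity weighed against the compression gain
   in the discussion that follows.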

373	   During the process of G.711.0 standardization it was recognized that
374	   although it is sometimes advantageous to encode integer multiples of
375	   40 G.711 symbols in whatever input symbol format resulted in the most
376	   compression (as per above), the simplest choice is to encode the
377	   entire ptime's worth of input G.711 symbols into one G.711.0 frame
378	   (if the ptime supported it).  This is especially so since the larger
379	   number of source G.711 symbols typically resulted in the highest
380	   compression anyway and there is added complexity in searching for
381	   other possibilities (involving more G.711.0 frames) which were
382	   unlikely to produce a more bit efficient result.

384	   The design of ITU-T Rec. G.711.0 [G.711.0] foresaw the possibility of
385	   multiple G.711.0 input frames in that the decoder was defined to
386	   decode what it refers to as an incoming "bit stream".  For this
387	   specification, the bit stream is the G.711.0 RTP payload itself.
388	   Thus, the decoder will take the G.711.0 RTP payload and will produce
389	   an output frame containing the original G.711 symbols independent of
390	   how many G.711.0 frames were present in it.  Additionally, any number
391	   of 0x00 padding octets placed between the G.711.0 frames will be
392	   silently (and safely) ignored by the G.711.0 decoding process
393	   (Section 4.2.3).

395	   To recap, a G.711.0 encoder may choose to encode incoming G.711
396	   symbols into one or more G.711.0 frames and put the
397	   resultant frame(s) into the G.711.0 RTP payload.  Zero or more 0x00
398	   padding octets may also be included in the G.711.0 RTP payload.  The
399	   G.711.0 decoder, being insensitive to the number of G.711.0 encoded
400	   frames that are contained within it, will decode the G.711.0 RTP
401	   payload into the source G.711 symbols.  Although examples of single
402	   or multiple G.711.0 frame cases will be illustrated in Section 4.2, the
403	   multiple G.711.0 frame cases MUST be supported and no negotiation
404	   (SDP or otherwise) is required for them.

406	4.  RTP Header and Payload

408	   In this section we describe the precise format for G.711.0 frames
409	   carried via RTP.  We begin with RTP header description relative to
410	   G.711, then provide two G.711.0 payload examples.

412	4.1.  G.711.0 RTP Header

414	   Relative to G.711 RTP headers, the utilization of G.711.0 does not
415	   create any special requirements with respect to the contents of the
416	   RTP packet header.  The only significant difference is that the
417	   payload type (PT) RTP header field MUST have a value corresponding to
418	   the dynamic payload type assigned to the flow.  This is in contrast
419	   to most current uses of G.711 which typically use the static payload
420	   assignment of PT = 0 (PCMU) or PT = 8 (PCMA) [RFC3551] even though
421	   the negotiation and use of dynamic payload types is allowed for
422	   G.711.  With the exception of rare PT exhaustion cases, the existing
423	   G.711 PT values of 0 and 8 MUST NOT be used for G.711.0 (helping to
424	   avoid possible payload confusion with G.711 payloads).
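
   A sender or relay might guard against the payload type confusion
   described above with a check along the following lines.  The
   function name is hypothetical, the dynamic range 96-127 is the one
   given in [RFC3551], and the rare PT exhaustion cases mentioned above
   are deliberately not handled by this sketch.

```python
DYNAMIC_PT_RANGE = range(96, 128)        # dynamic payload types
G711_STATIC_PTS = {0: "PCMU", 8: "PCMA"}  # static G.711 assignments


def check_g7110_payload_type(pt: int) -> None:
    """Raise ValueError if `pt` is not acceptable for a G.711.0 flow."""
    if pt in G711_STATIC_PTS:
        raise ValueError(
            "PT %d is the static %s assignment and must not carry "
            "G.711.0" % (pt, G711_STATIC_PTS[pt]))
    if pt not in DYNAMIC_PT_RANGE:
        raise ValueError("PT %d is not a dynamic payload type" % pt)
```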

426	   Voice Activity Detection (VAD) SHOULD NOT be used when G.711.0 is
427	   negotiated because G.711.0 obtains high compression during "VAD
428	   silence intervals" and one of the advantages of G.711.0 over G.711
429	   with VAD is the lack of any VAD-induced artifacts in the received
430	   signal.  However, if VAD is employed, the Marker bit (M) MUST be set
431	   in the first packet of a talkspurt (the first packet after a silence
432	   period in which packets have not been transmitted contiguously as per
433	   rules specified in [RFC3551] for G.711 payloads).  This definition,
434	   being consistent with the G.711 RTP VAD use, further allows lossless
435	   transcoding between G.711 RTP packets and G.711.0 RTP packets as
436	   described in Section 3.1.

438	   With this introduction, the RTP packet header fields are defined as
439	   follows:

441	      V - As per [RFC3550]

443	      P - As per [RFC3550]

445	      X - As per [RFC3550]

447	      CC - As per [RFC3550]

449	      M - As per [RFC3550] and [RFC3551]

451	      PT - The assignment of an RTP payload type for the format defined
452	      in this memo is outside the scope of this document.  The RTP
453	      profiles in use currently mandate binding the payload type
454	      dynamically for this payload format (see [RFC3550], [RFC4585]).

456	      SN - As per [RFC3550]

458	      timestamp - As per [RFC3550]

460	      SSRC - As per [RFC3550]

462	      CSRC - As per [RFC3550]

464	   Where V (version bits), P (padding bit), X (extension bit), CC (CSRC
465	   count), M (marker bit), PT (payload type), SN (sequence number),
466	   timestamp, SSRC (synchronizing source) and CSRC (contributing
467	   sources) are as defined in [RFC3550] and as typically used with
468	   G.711.  PT (payload type) is as defined in [RFC3551].

470	4.2.  G.711.0 RTP Payload

472	   This section defines the G.711.0 RTP payload and illustrates it by
473	   means of two examples.

475	   The first example, in Section 4.2.1, depicts the case when it is
476	   desired to carry only one G.711.0 frame in the RTP payload.  This
477	   case is expected to be the dominant use case and is shown separately
478	   for the purposes of clarity.

480	   The second example, in Section 4.2.2, depicts the general case when
481	   it is desired to carry one or more G.711.0 frames in the RTP payload.
482	   This is the actual definition of the G.711.0 RTP payload.

484	4.2.1.  Single G.711.0 Frame per RTP Payload Example

486	   This example depicts a single G.711.0 frame in the RTP payload.  This
487	   is expected to be the dominant RTP payload case for G.711.0, as the
488	   G.711.0 encoding process supports the SDP packet times (ptime and
489	   maxptime, see [RFC4566]) commonly used when G.711 is transported in
490	   RTP.  Additionally, as mentioned previously, larger G.711.0 frames
491	   generally compress more effectively than a multiplicity of smaller
492	   G.711.0 frames.

494	   The following Figure illustrates the single G.711.0 frame per RTP
495	   payload case.

497	                 Single G.711.0 Frame in RTP Payload Case

499	                 |-------------------|-------------------|
500	                 | One G.711.0 Frame | Zero or more 0x00 |
501	                 |                   |   Padding Octets  |
502	                 |___________________|___________________|

504	                                 Figure 2

506	   Encoding Process: A single G.711.0 frame is inserted into the RTP
507	   payload.  The amount of time represented by the G.711 symbols
508	   compressed in the G.711.0 frame MUST correspond to the ptime signaled
509	   for applications using SDP.  Although generally not desired, padding
510	   in the RTP payload after the G.711.0 frame MAY be created by
511	   placing one or more 0x00 octets after the G.711.0 frame.  Such
512	   padding may be desired based on security considerations (see
513	   Section 10).

515	   Decoding Process: Passing the entire RTP payload to the G.711.0
516	   decoder is sufficient for the G.711.0 decoder to create the source
517	   G.711 symbols.  Any padding inserted after the G.711.0 frame (i.e.,
518	   the 0x00 octets) present in the RTP payload is silently ignored by
519	   the G.711.0 decoding process.  The decoding process is fully
520	   described in Section 4.2.3 below.

522	4.2.2.  G.711.0 RTP Payload Definition

524	   This section defines the G.711.0 RTP payload and illustrates the case
525	   of when one or more G.711.0 frames are to be placed in the payload.
526	   All G.711.0 RTP decoders MUST support the general case described in
527	   this section (rationale presented previously in Section 3.3.1).

529	   Note that since each G.711.0 frame is self-describing (see Attribute
530	   A4 in Section 3.2), the individual G.711.0 frames in the RTP payload
531	   need not represent the same duration of time (i.e., a 5 ms G.711.0
532	   frame could be followed by a 20 ms G.711.0 frame).  Owing to this,
533	   the amount of time represented in the RTP payload MAY be any integer
534	   multiple of 5 ms (as 5 ms is the smallest interval of time that can
535	   be represented in a G.711.0 frame).

537	   The following figure illustrates the case of one or more G.711.0
538	   frames per RTP payload, where the number of G.711.0 frames placed in
539	   the RTP payload is N.  We note that when N is equal to 1, this case
540	   is identical to the previous example.

542	              One or More G.711.0 Frames in RTP Payload Case

544	       |----------|---------|----------|---------|----------------|
545	       | First    | Second  |          | Nth     | Zero or more   |
546	       | G.711.0  | G.711.0 |   ...    | G.711.0 |     0x00       |
547	       | Frame    | Frame   |          | Frame   | Padding Octets |
548	       |__________|_________|__________|_________|________________|

550	                                 Figure 3

552	   We note here that when there are multiple G.711.0 frames, the
553	   individual frames can be, and generally are, of different lengths.
554	   The decoding process described in Section 4.2.3 is used to determine
555	   the frame boundaries.

557	   Encoding Process: One or more G.711.0 frames are placed in the RTP
558	   payload simply by concatenating the G.711.0 frames together.  The
559	   amount of time represented by the G.711 symbols compressed in all the
560	   G.711.0 frames in the RTP payload MUST correspond to the ptime
561	   signaled for applications using SDP.  Although not generally desired,
562	   padding in the RTP payload SHOULD be placed after the last G.711.0
563	   frame in the payload and MAY be created by placing one or more 0x00
564	   octets after the last G.711.0 frame.  Such padding may be desired
565	   based on security considerations (see Section 10).  Additional
566	   encoding process details and considerations are specified later in
567	   Section 4.2.2.1.

569	   Decoding Process: As G.711.0 frames can be of varying length, the
570	   payload decoding process described in Section 4.2.3 is used to
571	   determine where the individual G.711.0 frame boundaries are.  Any
572	   padding octets inserted before or after any G.711.0 frame in the RTP
573	   payload are silently (and safely) ignored by the G.711.0 decoding
574	   process specified in Section 4.2.3.

576	4.2.2.1.  G.711.0 RTP Payload Encoding Process

578	   ITU-T G.711.0 supports five possible input frame lengths: 40, 80,
579	   160, 240, and 320 samples per frame and the rationale for choosing
580	   those lengths was given in the description of property A5 in
581	   Section 3.2.  Assuming 8000 samples per second, these lengths
582	   correspond to input frames representing 5 ms, 10 ms, 20 ms, 30 ms or
583	   40 ms.  So while the standard assumed the input "bit stream"
584	   consisted of G.711 symbols of some integer multiple of 5 ms in
585	   length, it did not specify exactly what frame lengths to use as input
586	   to the G.711.0 encoder itself.  The intent of this section is to
587	   provide some guidance for the selection.

589	   Consider a typical IETF use case of 20 ms (160 octets) of G.711 input
590	   samples represented in a G.711.0 payload and signaled by using the
591	   SDP parameter ptime.  As described in Section 3.3.1, the simplest way
592	   to encode these 160 octets is to pass the entire 160 octets to the
593	   G.711.0 encoder, resulting in precisely one G.711.0 compressed frame,
594	   and put that singular frame into the G.711.0 RTP payload.  However,
595	   neither the ITU-T G.711.0 standard nor this IETF payload format
596	   mandates this.  In fact 20 ms of input G.711 symbols can be encoded
597	   as 1, 2, 3 or 4 G.711.0 frames in any one of six combinations (i.e.,
598	   {20ms}, {10ms:10ms}, {10ms:5ms:5ms}, {5ms:10ms:5ms}, {5ms:5ms:10ms},
599	   {5ms:5ms:5ms:5ms}) and any of these combinations would decompress
600	   into the same source 160 G.711 octets.  As an aside, we note that the
601	   first octet of any G.711.0 frame will be the prefix code octet and
602	   information in this octet determines how many G.711 symbols are
603	   represented in the G.711.0 frame.
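
   The count of six combinations can be checked mechanically.  The
   following is a minimal sketch, assuming 8000 samples per second so
   that the five permitted input frame lengths correspond to 5, 10, 20,
   30 and 40 ms.  The function name frame_sequences is illustrative
   only and is not part of G.711.0 or of this payload format.

```python
# Frame durations (ms) for 40, 80, 160, 240 and 320 samples at 8000 Hz.
FRAME_DURATIONS_MS = (5, 10, 20, 30, 40)


def frame_sequences(total_ms):
    """Return every ordered sequence of frame durations summing to total_ms."""
    if total_ms == 0:
        return [()]
    sequences = []
    for duration in FRAME_DURATIONS_MS:
        if duration <= total_ms:
            # Fix the first frame, then enumerate the remainder recursively.
            for rest in frame_sequences(total_ms - duration):
                sequences.append((duration,) + rest)
    return sequences


if __name__ == "__main__":
    for seq in frame_sequences(20):
        print(":".join(f"{d}ms" for d in seq))
```

   An encoder that searches for the best compression would encode each
   candidate sequence and keep the smallest result; the enumeration
   grows quickly with ptime, so such a search is a design trade-off
   rather than a requirement.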

605	   Notwithstanding the above, we expect one of two encodings to be used
606	   by implementers: the simplest possible (one 160 byte input to the
607	   G.711.0 encoder which usually results in the highest compression) or
608	   the combination of possible input frames to a G.711.0 encoder that
609	   resulted in the highest compression for the payload.  The explicit
610	   mention of this issue in this IETF document was deemed important
611	   because the ITU-T G.711.0 standard is silent on this issue and there
612	   is a desire for this issue to be documented in a formal Standards
613	   Developing Organization (SDO) document (i.e., here).

615	4.2.3.  G.711.0 RTP Payload Decoding Process

617	   The G.711.0 decoding process is a standard part of G.711.0 bit stream
618	   decoding and is implemented in the ITU-T Rec. G.711.0 reference code.
619	   The decoding process algorithm described in this section is a slight
620	   enhancement of the ITU-T reference code to explicitly accommodate RTP
621	   padding (as described above).

623	   Before describing the decoding, we note here that the largest
624	   possible G.711.0 frame is created whenever the largest number of
625	   G.711 symbols is encoded (320 from Section 3.2, property A5) and
626	   these 320 symbols are "uncompressible" by the G.711.0 encoder.  In
627	   this case (via property A6 in Section 3.2) the G.711.0 output frame
628	   will be 321 octets long.  We also note that the value 0x00 chosen for
629	   the optional padding cannot be the first octet of a valid ITU-T Rec.
630	   G.711.0 frame (see [G.711.0]).  We also note that whenever more than
631	   one G.711.0 frame is contained in the RTP payload, the decoding of
632	   the individual G.711.0 frames will occur multiple times.

634	   For the decoding algorithm below, let N be the number of octets in
635	   the RTP payload (i.e., excluding any RTP padding, but including any
636	   RTP payload padding), let P equal the number of RTP payload octets
637	   processed by the G.711.0 decoding process, let K be the number of
638	   G.711 symbols presently in the output buffer, let Q be the number of
639	   octets contained in the G.711.0 frame being processed and let "!="
640	   represent not equal to.  The keyword "STOP" is used below to indicate
641	   the end of the processing of G.711.0 frames in the RTP payload.  The
642	   algorithm below assumes an output buffer for the decoded G.711 source
643	   symbols of length sufficient to accommodate the expected number of
644	   G.711 symbols and an input buffer of length 321 octets.

646	   G.711.0 RTP Payload Decoding Heuristic:

648	   H1  Initialization of counters: Initialize P, the number of processed
649	         octets counter, to zero.  Initialize K, the counter for how
650	         many G.711 symbols are in the output buffer, to zero.
651	         Initialize N to the number of octets in the RTP payload
652	         (including any RTP payload padding).  Go to H2.

654	   H2  Read internal buffer: Read min{320+1, (N-P)} octets into the
655	         internal buffer from the (P+1)th octet of the RTP payload.  We
656	         note at this point, N-P octets have yet to be processed and
657	         that 320+1 octets is the largest possible G.711.0 frame.  Also
658	         note that in the common case of zero-based array indexing of a
659	         uint8 array of octets, this operation will read octets from
660	         index P through index P + min{320+1, (N-P)} - 1 of the RTP
661	         payload.  Go to H3.

663	   H3  Analyze the first octet in the internal buffer: If this octet is
664	         0x00 (a padding octet) go to H4, otherwise go to H5 (process a
665	         G.711.0 frame).

667	   H4  Process padding octet (no G.711 symbols generated): Increment the
668	         processed octets counter by one (set P = P + 1).  If this
669	         increment results in P >= N then STOP (as all RTP payload
670	         octets have been processed), otherwise go to H2.

672	   H5  Process an individual G.711.0 frame (produce G.711 samples in the
673	         output frame): Pass the internal buffer to the G.711.0 decoder.
674	         The G.711.0 decoder will read the first octet (called the
675	         "prefix code" octet in ITU-T Rec. G.711.0 [G.711.0]) to
676	         determine the number of source G.711 samples, M, contained in
677	         this G.711.0 frame.  The G.711.0 decoder will produce exactly M
678	         G.711 source symbols (M can only have values of 0, 40, 80, 160,
679	         240 or 320).  If K = 0, these M symbols will be the first in
680	         the output buffer and are placed at the beginning of the output
681	         buffer.  If K != 0, concatenate these M symbols with the prior
682	         symbols in the output buffer (there are K prior symbols in the
683	         buffer).  Set K = K + M (as there are now this many G.711
684	         source symbols in the output buffer).  The G.711.0 decoder will
685	         have consumed some number of octets, Q, in the internal buffer
686	         to produce the M G.711 symbols.  Increment the processed
687	         octets counter by this quantity (set P = P + Q).  If this
688	         increment results in P >= N then STOP (as all RTP payload
689	         octets have been processed), otherwise go to H2.

692	   At this point, the output buffer will contain precisely K G.711
693	   source symbols which should correspond to the ptime signaled if SDP
694	   was used and the encoding process was without error.  If ptime was
695	   signaled via SDP and the number of G.711 symbols in the output buffer
696	   is other than what corresponds to ptime, the packet MUST be discarded
697	   unless other system design knowledge allows for otherwise (e.g.,
698	   occasional 5 ms clock slips causing one more or one less G.711.0
699	   frame than nominal to be in the payload).  Lastly, due to the buffer
700	   reads in H2 being bounded (to 321 octets or less), N being bounded to
701	   the size of the G.711.0 RTP payload, and M being bounded to the
702	   number of source G.711 symbols, there is no buffer overrun risk.
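
   Steps H1 through H5 can be sketched as a short loop.  This is an
   illustration under stated assumptions, not the ITU-T reference code:
   decode_frame is a hypothetical caller-supplied function that takes a
   buffer starting at a frame boundary and returns the decoded G.711
   symbols together with Q, the number of octets it consumed.  The
   toy_decode_frame helper is NOT G.711.0; it is a stand-in format
   used only to exercise the loop.

```python
MAX_FRAME_OCTETS = 320 + 1  # largest possible G.711.0 frame


def decode_payload(payload, decode_frame):
    """Walk an RTP payload following steps H1-H5 and return the symbols.

    decode_frame(buf) is a hypothetical callable returning (symbols, q),
    where q is the number of octets of buf consumed by one frame.
    """
    n = len(payload)   # H1: N, octets in the payload (padding included)
    p = 0              # H1: P, octets processed so far
    out = bytearray()  # output buffer; K corresponds to len(out)
    while p < n:
        buf = payload[p:p + min(MAX_FRAME_OCTETS, n - p)]  # H2
        if buf[0] == 0x00:  # H3: padding octet found
            p += 1          # H4: skip it, no symbols produced
            continue
        symbols, q = decode_frame(buf)  # H5: decode one frame
        if q <= 0 or q > len(buf):
            raise ValueError("framing error: invalid consumed-octet count")
        out += symbols  # append the M new symbols after the K prior ones
        p += q
    return bytes(out)


def toy_decode_frame(buf):
    """Stand-in decoder (NOT G.711.0): the first octet is a count of raw
    symbols that follow it in the buffer."""
    count = buf[0]
    return buf[1:1 + count], 1 + count


if __name__ == "__main__":
    payload = bytes([2, 0xAA, 0xBB, 0, 0, 1, 0xCC, 0])
    print(decode_payload(payload, toy_decode_frame).hex())
```

   The sketch treats an empty payload as producing no symbols and
   raises an error when the decoder reports an impossible Q; how a real
   implementation reports framing errors is a design choice.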

704	   We also note, as an aside, that the algorithm above (and the ITU-T
705	   G.711.0 reference code) accommodates padding octets (0x00) placed
706	   anywhere between G.711.0 frames in the RTP payload as well as prior
707	   to or after any or all G.711.0 frames.  The ITU-T G.711.0 reference
708	   code does not have Step H3 and H4 as separate steps (i.e., Step H5
709	   immediately follows H2) at the added computational cost of some
710	   additional buffer passing to/from the G.711.0 frame decoder
711	   functions.  That is the G.711.0 decoder in the reference code
712	   "silently ignores" 0x00 padding octets at the beginning of what it
713	   believes to be a G.711.0 encoded frame boundary.  Thus Step H3 and
714	   Step H4 above are an optimization over the reference code shown for
715	   clarity.

717	   If the decoder is at a playout endpoint location, this G.711 buffer
718	   SHOULD be used in the same manner as a received G.711 RTP payload
719	   would have been used (passed to a playout buffer, to a PLC
720	   implementation, etc.).

722	   We explicitly note that a framing error condition will result
723	   whenever the buffer sent to a G.711.0 decoder does not begin with a
724	   valid first G.711.0 frame octet (i.e., a valid G.711.0 prefix code or
725	   a 0x00 padding octet).  The expected result is that the decoder will
726	   not produce the desired/correct G.711 source symbols.  However, as
727	   already noted, the output returned by the G.711.0 decoder will be
728	   bounded (to less than 321 octets per G.711.0 decode request) and if
729	   the number of the (presumed) G.711 symbols produced is known to be in
730	   error, the decoded output MUST be discarded.

732	4.2.4.  G.711.0 RTP Payload for Multiple Channels

734	   In this section we describe the use of multiple "channels" of G.711
735	   data encoded by G.711.0 compression.

737	   The dominant use of G.711 in RTP transport has been for single
738	   channel use cases.  For this case, the above G.711.0 encoding and
739	   decoding process is used.  However, the multiple channel case for
740	   G.711.0 (a frame-based compression) is different from G.711 (a
741	   sample-based encoding) and is described separately here.

743	   RFC 3551 [RFC3551] provides guidelines for encoding audio channels
744	   (Section 4) and for the ordering of the channels within the RTP
745	   payload (Section 4.1).  The ordering guidelines in RFC 3551,
746	   Section 4.1 SHOULD be used unless an application-specific channel
747	   ordering is more appropriate.

749	   An implicit assumption in RFC 3551 is that all the channel data
750	   multiplexed into a RTP payload MUST represent the same physical time
751	   span.  The case for G.711.0 is no different; the underlying G.711
752	   data for all channels in a G.711.0 RTP payload MUST span the same
753	   interval in time (e.g., the same "ptime" for a SDP-specified codec
754	   negotiation).

756	   RFC 3551 provides guidelines for sample-based encodings such as G.711
757	   in Section 4.2.  This guidance is tantamount to interleaving the
758	   individual samples in that they SHOULD be packed in consecutive
759	   octets.

761	   RFC 3551 provides guidelines for frame-based encodings in which the
762	   frames are interleaved.  However, this guidance stems from the
763	   assumption that "the frame size for frame-oriented codecs is a
764	   given".  However, this assumption is not valid for G.711.0 in that
765	   individual consecutive G.711.0 frames (as per Section 4.2.2) can:

767	      1) represent different time spans (e.g., two 5 ms G.711.0 frames
768	      in lieu of one 10 ms G.711.0 frame), and

770	      2) be of different lengths in octets (and typically are).

772	   Therefore a different, but also simple, concatenation-based approach
773	   is specified in this RFC.

775	   For the multiple channel G.711.0 case, each G.711 channel is
776	   independently encoded into one or more G.711.0 frames defined here as
777	   a "G.711.0 channel superframe".  Each one of these superframes is
778	   identical to the multiple G.711.0 frame case illustrated in Figure 3
779	   of Section 4.2.2 in which each superframe can have one or more
780	   individual G.711.0 frames within it.  Then each G.711.0 channel
781	   superframe is concatenated - in channel order - into a G.711.0 RTP
782	   payload.  Then, if optional G.711.0 padding octets (0x00) are
783	   desired, it is RECOMMENDED that these octets are placed after the
784	   last G.711.0 channel superframe.  As per above, such padding may be
785	   desired based on security considerations (see Section 10).  This is
786	   depicted in the following Figure 4 below.

788	            Multiple G.711.0 Channel Superframes in RTP Payload

790	           |----------|---------|----------|---------|---------|
791	           | First    | Second  |          | Nth     | Zero    |
792	           | G.711.0  | G.711.0 |   ...    | G.711.0 | or more |
793	           | Channel  | Channel |          | Channel | 0x00    |
794	           | Super-   | Super-  |          | Super-  | Padding |
795	           | Frame    | Frame   |          | Frame   | Octets  |
796	           |__________|_________|__________|_________|_________|

798	                                 Figure 4

800	   We note that although the individual superframes can be of different
801	   lengths in octets (and usually are), the number of G.711 source
802	   symbols represented - in compressed form - in each channel superframe
803	   is identical (since all the channels represent the same time
804	   interval).

806	   The G.711.0 decoder at the receiving end simply decodes the entire
807	   G.711.0 (multiple channel) payload into individual G.711 symbols.  If
808	   M such G.711 symbols result and there were N channels, then the first
809	   M/N G.711 samples would be from the first channel, the second M/N
810	   G.711 samples would be from the second channel, and so on until the
811	   Nth set of G.711 samples are found.  Similarly, if the number of
812	   channels was not known, but the payload "ptime" was known, one could
813	   infer (knowing the sampling rate) how many G.711 symbols each channel
814	   contained; then with this knowledge determine how many channels of
815	   data were contained in the payload.  When SDP is used, the number of
816	   channels is known because the optional channels parameter is a MUST
817	   when more than one channel is negotiated (see Section 5.1).
818	   Additionally, when SDP is used the parameter ptime is a RECOMMENDED
819	   optional parameter.  We note that if both the channels and ptime
820	   parameters are known, each can serve as a check on the other.
821	   Whichever algorithm is used to determine the number of channels, if
822	   the length of the source G.711 symbols in the payload (M) is not an
823	   integer multiple of the number of channels (N), then the packet
824	   SHOULD be discarded.
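
   The channel separation and the ptime cross-check described above can
   be sketched as follows.  The function names are illustrative only.
   Returning None stands in for "discard the packet"; the default rate
   of 8000 is taken from the clock rate default in Section 5.1.

```python
def split_channels(symbols, num_channels):
    """Split M decoded G.711 symbols into num_channels equal runs.

    Returns a list of per-channel byte strings in channel order, or None
    when M is not an integer multiple of the channel count.
    """
    if num_channels < 1:
        raise ValueError("number of channels must be a positive integer")
    if len(symbols) % num_channels != 0:
        return None
    per_channel = len(symbols) // num_channels
    return [symbols[i * per_channel:(i + 1) * per_channel]
            for i in range(num_channels)]


def channels_from_ptime(num_symbols, ptime_ms, rate_hz=8000):
    """Infer the channel count from ptime, or None if inconsistent."""
    per_channel = rate_hz * ptime_ms // 1000
    if per_channel == 0 or num_symbols % per_channel != 0:
        return None
    return num_symbols // per_channel


if __name__ == "__main__":
    decoded = bytes(320)  # e.g. 2 channels of 20 ms at 8000 Hz
    print(channels_from_ptime(len(decoded), 20))
    print([len(c) for c in split_channels(decoded, 2)])
```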

826	   Lastly we note that although any padding for the multiple channel
827	   G.711.0 payload is RECOMMENDED to be placed at the end of the
828	   payload, the G.711.0 decoding algorithm described in Section 4.2.3
829	   will successfully decode the payload in Figure 4 if the 0x00 padding
830	   octet is placed anywhere before or after any individual G.711.0 frame
831	   in the RTP payload.  The number of padding octets introduced at any
832	   G.711.0 frame boundary therefore does not affect the number M of the
833	   source G.711 symbols produced.  Thus the decision for padding MAY be
834	   made on a per-superframe basis.

836	5.  Payload Format Parameters

838	   This section defines the parameters that may be used to configure
839	   optional features in the G.711.0 RTP transmission.

841	   The parameters defined here are a part of the media subtype
842	   registration for the G.711.0 codec.  Mapping of the parameters into
843	   Session Description Protocol (SDP) RFC 4566 [RFC4566] is also
844	   provided for those applications that use SDP.

846	5.1.  Media Type Registration

848	   Type name: audio

850	   Subtype name: G711-0
851	   Required parameters:

853	      clock rate: The RTP timestamp clock rate, which is equal to the
854	      sampling rate.  The typical rate used with G.711 encoding is 8000,
855	      but other rates may be specified.  The default rate is 8000.

857	      complaw: This format specific parameter, specified on the
858	      "a=fmtp:" line, indicates the companding law (A-law or mu-law)
859	      employed.  This format specific parameter, as per RFC 4566
860	      [RFC4566], is given unchanged to the media tool using this format.
861	      The case-insensitive values "complaw=al" and "complaw=mu" are used
862	      for A-law and mu-law, respectively.

864	   Optional parameters:

866	      channels: See RFC 4566 [RFC4566] for definition.  Specifies how
867	      many audio streams are represented in the G.711.0 payload and MUST
868	      be present if the number of channels is greater than one.  This
869	      parameter defaults to 1 if not present (as per RFC 4566) and is
870	      typically a non-zero small-valued positive integer.  It is
871	      expected that implementations that specify multiple channels will
872	      also define a mechanism to map the channels appropriately within
873	      their system design, otherwise the channel order specified in RFC
874	      3551 [RFC3551] Section 4.1 will be assumed (e.g., left, right,
875	      center, ... ).  Similar to the usual interpretation in RFC 3551
876	      [RFC3551], the number of channels SHALL be a non-zero positive
877	      integer.

879	      maxptime: See RFC 4566 [RFC4566] for definition.

881	      ptime: See RFC 4566 [RFC4566] for definition.  The inclusion of
882	      "ptime" is RECOMMENDED and SHOULD be in the SDP unless there is an
883	      application specific reason not to include it (e.g., an
884	      application that has a variable ptime on a packet-by-packet
885	      basis).  For constant ptime applications, it is considered good
886	      form to include "ptime" in the SDP for session diagnostic
887	      purposes.  For the constant ptime multiple channel case described
888	      in Section 4.2.4, the inclusion of "ptime" can provide a desirable
889	      payload check.

891	   Encoding considerations:

893	      This media type is framed binary data (see Section 4.8 in RFC 6838
894	      [RFC6838]) compressed as per ITU-T Rec. G.711.0.

896	   Security considerations:

898	      See Section 10.

900	   Interoperability considerations: none

902	   Published specification:

904	      ITU-T Rec. G.711.0 and RFC XXXX.

906	      [ RFC Editor: please replace XXXX with a reference to this RFC ]

908	   Applications that use this media type:

910	      Although initially conceived for VoIP, the use of G.711.0, like
911	      G.711 before it, may find use within audio and video streaming
912	      and/or conferencing applications for the audio portion of those
913	      applications.

915	   Additional information:

917	   The following applies to stored-file transfer methods:

919	         Magic numbers: #!G7110A\n or #!G7110M\n (for A-law or MU-law
920	         encodings respectively, see Section 6).

922	         File Extensions: None

924	         Macintosh file type code: None

926	         Object identifier or OID: None

928	   Person & email address to contact for further information:

930	      Michael A.  Ramalho <mramalho@cisco.com> or <mar42@cornell.edu>

932	   Intended usage: COMMON

934	   Restrictions on usage:

936	      This media type depends on RTP framing, and hence is only defined
937	      for transfer via RTP [RFC3550].  Transport within other framing
938	      protocols is not defined at this time.

940	   Author: Michael A.  Ramalho

942	   Change controller:

944	      IETF Payload working group delegated from the IESG.

946	5.2.  Mapping to SDP Parameters

948	   The information carried in the media type specification has a
949	   specific mapping to fields in the Session Description Protocol (SDP),
950	   which is commonly used to describe a RTP session.  When SDP is used
951	   to specify sessions employing G.711.0, the mapping is as follows:

953	   o  The media type ("audio") goes in SDP "m=" as the media name.

955	   o  The media subtype ("G711-0") goes in SDP "a=rtpmap" as the
956	      encoding name.

958	   o  The required parameter "rate" also goes in "a=rtpmap" as the clock
959	      rate.

961	   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
962	      "a=maxptime" attributes, respectively.

964	   o  Remaining parameters go in the SDP "a=fmtp" attribute by copying
965	      them directly from the media type string as a semicolon-separated
966	      list of parameter=value pairs.
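
   The mapping above can be sketched as a small helper that emits the
   SDP attribute lines for one payload type.  This is an illustrative
   sketch only; the function name sdp_attributes and its argument names
   are assumptions, and the "m=" line is left to the caller.

```python
def sdp_attributes(pt, complaw, rate=8000, channels=1,
                   ptime=None, maxptime=None):
    """Return the SDP "a=" lines describing a G711-0 payload type."""
    if complaw.lower() not in ("al", "mu"):
        raise ValueError('complaw must be "al" or "mu"')
    if channels < 1:
        raise ValueError("channels must be a positive integer")
    # Subtype and clock rate go in a=rtpmap; channels is appended only
    # when more than one channel is signaled.
    rtpmap = f"a=rtpmap:{pt} G711-0/{rate}"
    if channels > 1:
        rtpmap += f"/{channels}"
    lines = [rtpmap]
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    # Remaining parameters are copied into a=fmtp as parameter=value.
    lines.append(f"a=fmtp:{pt} complaw={complaw}")
    return lines


if __name__ == "__main__":
    print("\n".join(sdp_attributes(98, "al", channels=2, ptime=20)))
```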

968	5.3.  Offer/Answer Considerations

970	   The following considerations apply when using the SDP offer/answer
971	   RFC 3264 [RFC3264] mechanism to negotiate the "channels" attribute.

973	   o  If the offering endpoint specifies a value for the optional
974	      channels parameter greater than one and the answering endpoint
975	      both understands the parameter and cannot support the value
976	      requested, the answer MUST contain the optional channels parameter
977	      with the highest value it can support.

979	   o  If the offering endpoint specifies a value for the optional
980	      channels parameter the answer MUST contain the optional channels
981	      parameter unless the only value the answering endpoint can support
982	      is one, in which case the answer MAY contain the optional channels
983	      parameter with value of 1.

985	   o  If the offering endpoint specifies a value for the ptime parameter
986	      that the answering endpoint cannot support, the answer MUST
987	      contain the optional ptime parameter.

989	   o  If the offering endpoint specifies a value for the maxptime
990	      parameter that the answering endpoint cannot support, the answer
991	      MUST contain the optional maxptime parameter.

993	5.4.  SDP Examples

995	   The following examples illustrate how to signal G.711.0 via SDP.

997	5.4.1.  SDP Example 1

999	         m=audio RTP/AVP 98
1000	         a=rtpmap:98 G711-0/8000
1001	         a=fmtp:98 complaw=mu

1003	   In the above example the dynamic payload type 98 is mapped to G.711.0
1004	   via the "a=rtpmap" parameter.  The mandatory "complaw" is on the
1005	   "a=fmtp" parameter line.  Note that neither of the optional
1006	   parameters "ptime" and "channels" is present, although it is
1007	   generally good form to include "ptime" in the SDP for diagnostic
1008	   purposes if the session is a constant ptime session.

1010	5.4.2.  SDP Example 2

1012	   The following example illustrates an offering endpoint requesting 2
1013	   channels, but the answering endpoint can only support (or render) one
1014	   channel.

1016	   Offer:

1018	         m=audio RTP/AVP 98
1019	         a=rtpmap:98 G711-0/8000/2
1020	         a=ptime:20
1021	         a=fmtp:98 complaw=al

1023	   Answer:

1025	         m=audio RTP/AVP 98
1026	         a=rtpmap:98 G711-0/8000/1
1027	         a=ptime:20
1028	         a=fmtp:98 complaw=al

1030	   In this example the offer had an optional channels parameter.  The
1031	   answer must have the optional channels parameter also unless the
1032	   value in the answer is one.  Shown here is when the answer explicitly
1033	   contains the channels parameter (it need not have and it would be
1034	   interpreted as one channel).  As mentioned previously, it is
1035	   considered good form to include "ptime" in the SDP for session
1036	   diagnostic purposes if the session is a constant ptime session.

1038	6.  G.711.0 Storage Mode Conventions and Definition

1040	   The G.711.0 storage mode definition in this section is similar to
1041	   many other IETF codecs (e.g., iLBC, EVRC-NW) and is essentially a
1042	   concatenation of individual G.711.0 frames.

1044	   We note that something must be stored for any G.711.0 frames that are
1045	   not received at the receiving endpoint, no matter what the cause.  In
1046	   this section we describe two mechanisms, a "G.711.0 PLC Frame" and a
1047	   "G.711.0 Erasure Frame".  These G.711.0 PLC and G.711.0 Erasure
1048	   Frames are described prior to the G.711.0 storage mode definition for
1049	   clarity.

1051	6.1.  G.711.0 PLC Frame

1053	   When G.711 RTP payloads are not received by a rendering endpoint, a
1054	   Packet Loss Concealment (PLC) mechanism is typically employed to
1055	   "fill in" the missing G.711 symbols with something that is
1056	   auditorially pleasing, so that the loss may go unnoticed by a
1057	   listener.  Such a PLC mechanism for G.711 is specified in ITU-T Rec.
1058	   G.711 - Appendix 1 [G.711-AP1].

1060	   A natural extension when creating G.711.0 frames for storage
1061	   environments is to employ such a PLC mechanism to create G.711
1062	   symbols for the span of time in which G.711.0 payloads were not
1063	   received - and then to compress the resulting "G.711 PLC symbols" via
1064	   G.711.0 compression.  The G.711.0 frame(s) created by such a process
1065	   are called "G.711.0 PLC Frames".

1067	   Since PLC mechanisms are designed to render missing audio data with
1068	   the best fidelity and intelligibility, G.711.0 frames created via
1069	   such processing are likely best for most recording situations (such
1070	   as voicemail storage) unless there is a requirement not to fabricate
1071	   (audio) data not actually received.

1073	   After such PLC G.711 symbols have been generated and then encoded by
1074	   a G.711.0 encoder, the resulting frames may be stored in G.711.0
1075	   frame format.  As a result, there is nothing to specify here - the
1076	   G.711.0 PLC Frames are stored as if they were received by the
1077	   receiving endpoint.  In other words, PLC-generated G.711.0 frames
1078	   appear as "normal" or "ordinary" G.711.0 frames in the storage mode
1079	   file.

1081	6.2.  G.711.0 Erasure Frame

1083	   "Erasure Frames", or equivalently "Null Frames", have been designed
1084	   for many frame-based codecs since G.711 was standardized.  These
1085	   null/erasure frames explicitly represent data from incoming audio
1086	   that were either not received by the receiving system or represent
1087	   data that a transmitting system decided not to send.  Transmitting
1088	   systems may choose not to send data for a variety of reasons (e.g.,
1089	   not enough wireless link capacity in radio-based systems) and can
1090	   choose to send a "null frame" in lieu of the actual audio.  It is
1091	   also envisioned that erasure frames would be used in storage mode
1092	   applications for specific archival purposes where there is a
1093	   requirement not to fabricate audio data that was not actually
1094	   received.

1096	   Thus, a G.711.0 erasure frame is a representation of the amount of
1097	   time in G.711.0 frames that were not received or not encoded by the
1098	   transmitting system.

1100	   Prior to defining a G.711.0 erasure frame it is beneficial to note
1101	   what many G.711 RTP systems send when the endpoint is "muted".  When
1102	   muted, many of these systems will send an entire G.711 payload of
1103	   either 0+ or 0- (i.e., one of the two levels closest to "analog zero"
1104	   in either G.711 companding law).  Next we note that a desirable
1105	   property for a G.711.0 erasure frame is for "non G.711.0 Erasure
1106	   Frame aware" endpoints to be able to playback a G.711.0 erasure frame
1107	   with the existing G.711.0 ITU-T reference code.

1109	   A G.711.0 Erasure Frame is defined as any G.711.0 frame for which the
1110	   corresponding G.711 sample values are either the value 0++ or the
1111	   value 0-- for the entirety of the G.711.0 frame.  The levels of 0++
1112	   and 0-- are defined to be the two levels above or below analog zero,
1113	   respectively.  An entire frame of value 0++ or 0-- is expected to be
1114	   extraordinarily rare when the frame was in fact generated by a
1115	   natural signal, as analog inputs such as speech and music are zero-
1116	   mean and are typically acoustically coupled to digital sampling
1117	   systems.  Note that the playback of a G.711.0 frame characterized as
1118	   an erasure frame is auditorially equivalent to a muted signal (a very
1119	   low value constant).

1121	   These G.711.0 erasure frames can be reasonably characterized as null
1122	   or erasure frames while meeting the desired playback goal of being
1123	   decoded by the G.711.0 ITU-T reference code.  Thus, similarly to
1124	   G.711.0 PLC frames, the G.711.0 erasure frames appear as "normal" or
1125	   "ordinary" G.711.0 frames in the storage mode format.

1127	6.3.  G.711.0 Storage Mode Definition

1129	   The storage format is used for storing G.711.0 encoded frames.  The
1130	   format for the G.711.0 storage mode file defined by this RFC is shown
1131	   below.

1133	                        G.711.0 Storage Mode Format

1135	          |---------------------------|----------|--------------|
1136	          |       Magic Number        |          |              |
1137	          |                           |  Version | Concatenated |
1138	          | "#!G7110A\n" (for A-law)  |   Octet  |   G.711.0    |
1139	          |            or             |          |    Frames    |
1140	          | "#!G7110M\n" (for mu-law) |  "0x00"  |              |
1141	          |___________________________|__________|______________|

1143	                                 Figure 5

1145	   The storage mode file consists of a magic number and a version octet
1146	   followed by the individual G.711.0 frames concatenated together.

1148	   The magic number for G.711.0 A-law corresponds to the ASCII character
1149	   string "#!G7110A\n", i.e., "0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x41
1150	   0x0A".  Likewise, the magic number for G.711.0 MU-law corresponds to
1151	   the ASCII character string "#!G7110M\n", i.e., "0x23 0x21 0x47 0x37
1152	   0x31 0x31 0x30 0x4D 0x0A".

1154	   The version number octet allows for the future specification of other
1155	   G.711.0 storage mode formats.  The specification of other storage
1156	   mode formats may be desirable as G.711.0 frames are of variable
1157	   length and a future format may include an indexing methodology that
1158	   would enable playout far into a long G.711.0 recording without the
1159	   necessity of decoding all the G.711.0 frames since the beginning of
1160	   the recording.  Other future formats may include support
1161	   for multiple channels, metadata and the like.  For these reasons it
1162	   was determined that a versioning strategy was desirable for the
1163	   G.711.0 storage mode definition specified by this RFC.  This RFC only
1164	   specifies Version 0 and thus the value of "0x00" MUST be used for the
1165	   storage mode defined by this RFC.

1167	   The G.711.0 codec data frames, including any necessary erasure or PLC
1168	   frames, are stored in consecutive order concatenated together as
1169	   shown in Section 4.2.2.  As the Version 0 storage mode only supports
1170	   a single channel, the RTP payload format supporting multiple channels
1171	   defined in Section 4.2.4 is not supported in this storage mode
1172	   definition.

1174	   The algorithm presented in Section 4.2.2 may be used to decode the
1175	   individual G.711.0 frames.  If the version octet is determined not
1176	   to be zero, the remainder of the payload MUST NOT be passed to the
1177	   G.711.0 decoder, as the ITU-T G.711.0 reference decoder can only
1178	   decode concatenated G.711.0 frames and has not been designed to
1179	   decode elements in yet-to-be-specified future storage mode formats.
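
   As an informal illustration only (not part of the format definition),
   the Version 0 layout described above can be sketched in Python.  The
   function names and the "A"/"mu" labels are hypothetical, and the
   sketch treats the G.711.0 frames as opaque octet strings; splitting
   the concatenated frames still requires a G.711.0 decoder, which is
   not shown.

```python
import io

MAGIC_A_LAW = b"#!G7110A\n"   # 0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x41 0x0A
MAGIC_MU_LAW = b"#!G7110M\n"  # 0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x4D 0x0A
VERSION_0 = 0x00


def write_storage_file(stream, law, frames):
    """Write magic number, version octet, then the concatenated frames.

    `law` is "A" or "mu" (labels chosen for this sketch only).
    `frames` is an iterable of already-encoded G.711.0 frames (bytes).
    """
    if law == "A":
        stream.write(MAGIC_A_LAW)
    elif law == "mu":
        stream.write(MAGIC_MU_LAW)
    else:
        raise ValueError("law must be 'A' or 'mu'")
    stream.write(bytes([VERSION_0]))
    for frame in frames:
        stream.write(frame)


def read_storage_header(stream):
    """Return (law, concatenated_frames); raise ValueError on a bad header."""
    magic = stream.read(len(MAGIC_A_LAW))
    if magic == MAGIC_A_LAW:
        law = "A"
    elif magic == MAGIC_MU_LAW:
        law = "mu"
    else:
        raise ValueError("not a G.711.0 storage mode file")
    version = stream.read(1)
    if version != bytes([VERSION_0]):
        # Non-zero (or missing) version octet: the remainder must not be
        # handed to the G.711.0 decoder.
        raise ValueError("unsupported storage mode version")
    return law, stream.read()
```

   A reader built this way rejects the file before any octet reaches
   the decoder when the version octet is not 0x00, which mirrors the
   MUST NOT requirement above.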

1182	7.  Acknowledgements

1184	   There have been many people contributing to G.711.0 in the course of
1185	   its development.  The people listed here deserve special mention:
1186	   Takehiro Moriya, Claude Lamblin, Herve Taddei, Simao Campos, Yusuke
1187	   Hiwasaki, Jacek Stachurski, Lorin Netsch, Paul Coverdale, Patrick
1188	   Luthi, Paul Barrett, Jari Hagqvist, Pengjun (Jeff) Huang, John Gibbs,
1189	   Yutaka Kamamoto, and Csaba Kos.  The review and oversight by the IETF
1190	   Payload Working Group chairs Ali Begen and Roni Even during the
1191	   development of this RFC is appreciated.  Additionally, the careful
1192	   review by Richard Barnes and extensive review by David Black and the
1193	   rest of the IESG is likewise very much appreciated.

1195	8.  Contributors

1197	   The authors thank everyone who has contributed to this document.
1198	   The people listed here deserve special mention: Ali Begen, Roni Even,
1199	   and Hadriel Kaplan.

1201	9.  IANA Considerations

1203	   One media type (audio/G711-0) has been defined and requires IANA
1204	   registration in the media types registry.  See Section 5.1 for
1205	   details.

1207	10.  Security Considerations

1209	   RTP packets using the payload format defined in this specification
1210	   are subject to the security considerations discussed in the RTP
1211	   specification [RFC3550], and in any appropriate RTP profile (for
1212	   example RFC 3551 [RFC3551] or [RFC4585]).  This implies that
1213	   confidentiality of the media streams is achieved by encryption; for
1214	   example, through the application of SRTP [RFC3711].  Because the data
1215	   compression used with this payload format is applied end-to-end, any
1216	   encryption needs to be performed after compression.

1218	   Note that the appropriate mechanism to ensure confidentiality and
1219	   integrity of RTP packets and their payloads is very dependent on the
1220	   application and on the transport and signaling protocols employed.
1221	   Thus, although SRTP is given as an example above, other possible
1222	   choices exist.

1224	   Note that end-to-end security with either authentication, integrity
1225	   or confidentiality protection will prevent a network element not
1226	   within the security context from performing media-aware operations
1227	   other than discarding complete packets.  To allow any (media-aware)
1228	   intermediate network element to perform its operations, it is
1229	   required to be a trusted entity which is included in the security
1230	   context establishment.

1232	   G.711.0 has no known denial-of-service attacks due to decoding, as
1233	   data posing as a desired G.711.0 payload will be decoded into
1234	   something (as per the decoding algorithm) with a finite amount of
1235	   computation.  This is due to the decompression algorithm having a
1236	   finite worst-case processing path (no infinite computational loops
1237	   are possible).  We also note that the data read by the G.711.0
1238	   decoder is controlled by the length of the individual encoded G.711.0
1239	   frame(s) contained in the RTP payload.  The decoding algorithm
1240	   specified in Section 4.2.3 above ensures that the G.711.0 decoder
1241	   will not read beyond the length of the internal buffer specified
1242	   (which is in turn specified to be no greater than the largest
1243	   possible G.711.0 frame of 321 octets).  Therefore a G.711.0 payload
1244	   does not carry "active content" that could impose malicious side-
1245	   effects upon the receiver.

1247	   G.711.0 is a variable bit rate (VBR) audio codec.  There have been
1248	   recent concerns with VBR speech codecs where a passive observer can
1249	   identify phrases from a standard speech corpus by means of the
1250	   lengths produced by the encoder even when the payload is encrypted
1251	   [IEEE].  That paper determined that some code-excited linear
1252	   prediction (CELP) codecs produce discrete packet lengths for some
1253	   phonemes and that, with appropriately designed Hidden Markov Models
1254	   (HMMs), an observer could predict phrases with unexpected accuracy.
1255	   One CELP codec studied, SPEEX, had the property that it produced 21
1256	   different packet lengths in its wideband mode and that these packet
1257	   lengths probabilistically mapped to phonemes that an HMM system could
1258	   be trained on.  The paper also determined that one mitigation
1259	   technique is to pad the output of the encoder with random padding
1260	   lengths so that 1) more discrete payload sizes result and 2) the
1261	   probabilistic mapping to phonemes becomes less clear.  Because G.711
1262	   is not a speech-model-based codec, neither is G.711.0.  A G.711.0
1263	   encoding, during talking periods, produces frames of varying
1264	   lengths that are not likely to have a strong mapping to phonemes.
1265	   Thus, G.711.0 is not expected to have this same vulnerability.  It
1266	   should be noted, however, that "silence" (only one G.711 value in the
1267	   entire G.711 input frame) or "near silence" (only a few G.711 values)
1268	   is easily detectable, as the resulting G.711.0 frame lengths are one
1269	   or a few octets.
1270	   If one desires to mitigate for silence/non-silence detection,
1271	   statistically variable padding should be added to G.711.0 frames that
1272	   resulted in very small G.711.0 frames (less than about 20% of the
1273	   symbols of the corresponding G.711 input frame).  Methods of
1274	   introducing padding in the G.711.0 payloads have been provided in the
1275	   G.711.0 RTP payload definition in Section 4.2.2.
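
   The silence/non-silence mitigation described above might be sketched
   as follows.  This is an illustration under stated assumptions, not a
   recommended implementation: the function name, the 16-octet upper
   bound, and the use of Python's "secrets" module are all choices made
   for this example, and the padding octet value of 0x00 is assumed here
   and should be confirmed against Section 4.2.2.

```python
import secrets

SMALL_FRAME_RATIO = 0.20  # "less than about 20%" from the text above
MAX_PAD_OCTETS = 16       # hypothetical upper bound, for illustration only
PAD_OCTET = b"\x00"       # assumed padding octet value (see Section 4.2.2)


def padding_for_frame(encoded_len, input_symbols, max_pad=MAX_PAD_OCTETS):
    """Return random-length padding for a very small frame, else b"".

    encoded_len   -- length in octets of the encoded G.711.0 frame
    input_symbols -- number of G.711 symbols in the corresponding input frame
    """
    if encoded_len < SMALL_FRAME_RATIO * input_symbols:
        # Draw a length uniformly from 1..max_pad so that frames produced
        # from silence no longer share one tell-tale size.
        return PAD_OCTET * (1 + secrets.randbelow(max_pad))
    return b""
```

   For example, a 1-octet frame produced from a 160-symbol input frame
   would receive between 1 and 16 padding octets, while a 100-octet
   frame would receive none.  Whether a uniform distribution is adequate
   depends on the threat model.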

1277	11.  Congestion Control

1279	   The G.711 codec is a Constant Bit Rate (CBR) codec which does not
1280	   have a means to regulate the bitrate.  The G.711.0 lossless
1281	   compression algorithm typically compresses the G.711 CBR stream into
1282	   a lower bandwidth VBR stream.  However, being lossless, it does not
1283	   possess means of further reducing the bitrate beyond the
1284	   G.711.0-based compression result.  The G.711.0 RTP payloads can be
1285	   made arbitrarily large by means of adding optional padding bytes
1286	   (subject only to MTU limitations).

1288	   Therefore, there are no explicit ways to regulate the bitrate of the
1289	   transmissions outlined in this RTP Payload format except by means of
1290	   modulating the number of optional padding bytes in the RTP payload.
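
   The effect of padding on the payload bitrate is simple arithmetic and
   can be sketched as below.  The function name is hypothetical, the
   calculation assumes one G.711.0 frame per RTP packet, and it counts
   payload octets only (RTP, UDP, and IP headers are excluded).

```python
G711_SAMPLE_RATE_HZ = 8000  # G.711 carries 8000 one-octet samples per second


def payload_bitrate_bps(frame_octets, pad_octets, samples_per_frame):
    """Payload bitrate in bit/s for one G.711.0 frame plus padding per packet."""
    payload_bits = (frame_octets + pad_octets) * 8
    # Each packet covers samples_per_frame / 8000 seconds of audio.
    return payload_bits * G711_SAMPLE_RATE_HZ / samples_per_frame
```

   With 160 samples per frame, a 160-octet payload corresponds to the
   64000 bit/s of uncompressed G.711, whereas a 40-octet frame with 10
   padding octets corresponds to 20000 bit/s.  Padding can only raise
   the rate above the compression result; it cannot lower it.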

1292	12.  References

1294	12.1.  Normative References

1296	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1297	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1299	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1300	              Description Protocol", RFC 4566, July 2006.

1302	   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
1303	              Specifications and Registration Procedures", BCP 13, RFC
1304	              6838, January 2013.

1306	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1307	              Jacobson, "RTP: A Transport Protocol for Real-Time
1308	              Applications", STD 64, RFC 3550, July 2003.

1310	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
1311	              Video Conferences with Minimal Control", STD 65, RFC 3551,
1312	              July 2003.

1314	   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
1315	              "Extended RTP Profile for Real-time Transport Control
1316	              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
1317	              2006.

1319	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1320	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
1321	              RFC 3711, March 2004.

1323	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
1324	              with Session Description Protocol (SDP)", RFC 3264, June
1325	              2002.

1327	   [G.711.0]  ITU-T G.711.0, "Recommendation ITU-T G.711.0 - Lossless
1328	              Compression of G.711 Pulse Code Modulation", September
1329	              2009.

1331	   [G.711]    ITU-T G.711, "Recommendation ITU-T G.711: Pulse Code
1332	              Modulation (PCM) of Voice Frequencies", November 1988.

1334	   [G.711-AP1]
1335	              ITU-T G.711 Appendix 1, "Recommendation G.711
1336	              Appendix 1: A high quality low-complexity algorithm for
1337	              packet loss concealment with G.711", September 1999.

1339	   [G.711-A1]
1340	              ITU-T G.711 Amendment 1, "Recommendation ITU-T G.711
1341	              Amendment 1 - Amendment 1: New Annex A on Lossless
1342	              Encoding of PCM Frames", September 2009.

1344	12.2.  Informative References

1346	   [G.729]    ITU-T G.729, "Recommendation ITU-T G.729 - Coding of
1347	              speech at 8 kbit/s using conjugate-structure algebraic-
1348	              code-excited linear prediction (CS-ACELP)", January 2007.

1350	   [G.722]    ITU-T G.722, "Recommendation ITU-T G.722 - 7 kHz audio-
1351	              coding within 64 kbit/s", November 1988.

1353	   [ICASSP]   N. Harada, Y. Yamamoto, T. Moriya, Y. Hiwasaki, M. A.
1354	              Ramalho, L. Netsch, Y. Stachurski, Miao Lei, H. Taddei,
1355	              and Q. Fengyan, "Emerging ITU-T Standard G.711.0 -
1356	              Lossless Compression of G.711 Pulse Code Modulation",
1357	              International Conference on Acoustics Speech and Signal
1358	              Processing (ICASSP), ISBN 978-1-4244-4295-9, March 2010.

1361	   [IEEE]     C.V. Wright, L. Ballard, S.E. Coull, F. Monrose, and G.M.
1362	              Masson, "Spot Me if You Can: Uncovering Spoken Phrases in
1363	              Encrypted VoIP Conversations", IEEE Symposium on Security
1364	              and Privacy, ISBN 978-0-7695-3168-7, May 2008.

1367	Authors' Addresses

1369	   Michael A. Ramalho (editor)
1370	   Cisco Systems, Inc.
1371	   6310 Watercrest Way Unit 203
1372	   Lakewood Ranch, FL  34202
1373	   USA

1375	   Phone: +1 919 476 2038
1376	   Email: mramalho@cisco.com

1378	   Paul E. Jones
1379	   Cisco Systems, Inc.
1380	   7025 Kit Creek Rd.
1381	   Research Triangle Park, NC  27709
1382	   USA

1384	   Phone: +1 919 476 2048
1385	   Email: paulej@packetizer.com

1387	   Noboru Harada
1388	   NTT Communications Science Labs.
1389	   3-1 Morinosato-Wakamiya
1390	   Atsugi, Kanagawa  243-0198
1391	   JAPAN

1393	   Phone: +81 46 240 3676
1394	   Email: harada.noboru@lab.ntt.co.jp

1396	   Muthu Arul Mozhi Perumal
1397	   Ericsson
1398	   Ferns Icon
1399	   Doddanekundi, Mahadevapura
1400	   Bangalore, Karnataka  560037
1401	   India

1403	   Phone: +91 9449288768
1404	   Email: muthu.arul@gmail.com

1405	   Lei Miao
1406	   Huawei Technologies Co. Ltd
1407	   Q22-2-A15R, Enviroment Protection Park
1408	   No. 156 Beiqing Road
1409	   HaiDian District
1410	   Beijing  100095
1411	   China

1413	   Phone: +86 1059728300
1414	   Email: lei.miao@huawei.com