idnits 2.17.1 

draft-ietf-payload-g7110-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 11, 2013) is 3782 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC4855' is defined on line 1111, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2629' is defined on line 1158, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711.0'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711-AP1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'G.711-A1'

  -- Obsolete informational reference (is this intentional?): RFC 2629
     (Obsoleted by RFC 7749)


     Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                    M. Ramalho, Ed.
3	Internet-Draft                                                  P. Jones
4	Intended status: Standards Track                           Cisco Systems
5	Expires: June 14, 2014                                         N. Harada
6	                                                                     NTT
7	                                                              M. Perumal
8	                                                           Cisco Systems
9	                                                                 L. Miao
10	                                                     Huawei Technologies
11	                                                       December 11, 2013

13	                     RTP Payload Format for G.711.0
14	                      draft-ietf-payload-g7110-01

16	Abstract

18	   This document specifies the Real-Time Transport Protocol (RTP)
19	   payload format for ITU-T Recommendation G.711.0.  ITU-T Rec. G.711.0
20	   defines a lossless and stateless compression for G.711 packet
21	   payloads typically used in IP networks.  This document also defines a
22	   storage mode format for G.711.0 and a media type registration for the
23	   G.711.0 RTP payload format.

25	Status of This Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on June 14, 2014.

42	Copyright Notice

44	   Copyright (c) 2013 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
60	   2.  Requirements Language . . . . . . . . . . . . . . . . . . . .   3
61	   3.  G.711.0 Codec Background  . . . . . . . . . . . . . . . . . .   3
62	     3.1.  General Information and Use of the ITU-T G.711.0 Codec  .   3
63	     3.2.  Key Properties of G.711.0 Design  . . . . . . . . . . . .   4
64	     3.3.  G.711 Input Frames to G.711.0 Output Frames . . . . . . .   6
65	   4.  RTP Header and Payload  . . . . . . . . . . . . . . . . . . .   8
66	     4.1.  G.711.0 RTP Header  . . . . . . . . . . . . . . . . . . .   8
67	     4.2.  G.711.0 RTP Payload . . . . . . . . . . . . . . . . . . .   9
68	       4.2.1.  Single G.711.0 Frame per RTP Payload Example  . . . .   9
69	       4.2.2.  Multiple G.711.0 Frames per RTP Payload Example . . .  10
70	       4.2.3.  G.711.0 RTP Payload Decoding Process  . . . . . . . .  12
71	       4.2.4.  G.711.0 RTP Payload for Multiple Channels . . . . . .  13
72	   5.  Payload Format Parameters . . . . . . . . . . . . . . . . . .  15
73	     5.1.  Media Type Registration . . . . . . . . . . . . . . . . .  16
74	     5.2.  Mapping to SDP Parameters . . . . . . . . . . . . . . . .  17
75	     5.3.  Offer/Answer Considerations . . . . . . . . . . . . . . .  18
76	     5.4.  SDP Examples  . . . . . . . . . . . . . . . . . . . . . .  18
77	       5.4.1.  SDP Example 1 . . . . . . . . . . . . . . . . . . . .  18
78	       5.4.2.  SDP Example 2 . . . . . . . . . . . . . . . . . . . .  19
79	   6.  G.711.0 Storage Mode Conventions and Definition . . . . . . .  19
80	     6.1.  G.711.0 PLC Frame . . . . . . . . . . . . . . . . . . . .  20
81	     6.2.  G.711.0 Erasure Frame . . . . . . . . . . . . . . . . . .  20
82	     6.3.  G.711.0 Storage Mode Definition . . . . . . . . . . . . .  21
83	   7.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  22
84	   8.  Contributors  . . . . . . . . . . . . . . . . . . . . . . . .  23
85	   9.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  23
86	   10. Security Considerations . . . . . . . . . . . . . . . . . . .  23
87	   11. References  . . . . . . . . . . . . . . . . . . . . . . . . .  24
88	     11.1.  Normative References . . . . . . . . . . . . . . . . . .  24
89	     11.2.  Informative References . . . . . . . . . . . . . . . . .  25
90	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  26

92	1.  Introduction

94	   The International Telecommunication Union (ITU-T) Recommendation
95	   G.711.0 [G.711.0] specifies a stateless and lossless compression for
96	   G.711 packet payloads typically used in Voice over IP (VoIP)
97	   networks.  This document specifies the Real-Time Transport Protocol
98	   (RTP) RFC 3550 [RFC3550] payload format and storage modes for this
99	   compression.

101	2.  Requirements Language

103	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
104	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
105	   document are to be interpreted as described in RFC 2119 [RFC2119].

107	3.  G.711.0 Codec Background

109	   ITU-T Recommendation G.711.0 [G.711.0] is a lossless and stateless
110	   compression mechanism for ITU-T Recommendation G.711 [G.711] and thus
111	   is not a "codec" in the sense of "lossy" codecs typically carried by
112	   RTP.  When negotiated end-to-end, ITU-T Rec. G.711.0 is negotiated as
113	   if it were a codec, with the understanding that ITU-T Rec. G.711.0
114	   losslessly encodes the underlying (lossy) G.711 pulse code modulation
115	   (PCM) sample representation of an audio signal.  For this reason,
116	   ITU-T Rec. G.711.0 will be interchangeably referred to in this
117	   document as a "lossless data compression algorithm" or a "codec",
118	   depending on context.  Within this document, individual G.711 PCM
119	   samples will be referred to as "G.711 symbols" or just "symbols" for
120	   brevity.

122	   This section describes the ITU-T Recommendation G.711.0 [G.711.0]
123	   codec, its properties, typical use cases and key design properties.

125	3.1.  General Information and Use of the ITU-T G.711.0 Codec

127	   ITU-T Recommendation G.711 is the benchmark standard for narrowband
128	   telephony.  It has been successful for many decades because of its
129	   proven voice quality, ubiquity and utility.  A new ITU-T
130	   recommendation, G.711.0, has been established for defining a
131	   stateless and lossless compression for G.711 packet payloads
132	   typically used in VoIP networks.  ITU-T Rec. G.711.0 is also known as
133	   ITU-T Rec. G.711 Annex A [G.711-A1], as ITU-T Rec. G.711 Annex A is
134	   effectively a pointer to ITU-T Rec. G.711.0.  Henceforth in this
135	   document, ITU-T Rec. G.711.0 will simply be referred to as "G.711.0"
136	   and ITU-T Rec. G.711 simply as "G.711".

138	   G.711.0 may be employed end-to-end, in which case the RTP payload
139	   format specification and use are nearly identical to the G.711 RTP
140	   specification found in RFC 3551 [RFC3551].  The only significant
141	   difference for G.711.0 is the use of a dynamic payload type (the
142	   static PTs 0 and 8 are virtually always used with G.711) and the
143	   recommendation not to use Voice Activity Detection (see Section 4.1).

145	   G.711.0, being both lossless and stateless, may also be employed as a
146	   lossless compression mechanism anywhere in between end systems which
147	   have negotiated use of G.711.  Because the only significant difference
148	   between the G.711 RTP payload format header and the G.711.0 payload
149	   format header is the payload type, a G.711 RTP packet can be losslessly
150	   converted to a G.711.0 RTP packet simply by compressing the G.711
151	   payload (thus creating a G.711.0 payload), changing the payload type
152	   to the dynamic value desired and copying all the remaining G.711 RTP
153	   header fields into the corresponding G.711.0 RTP header.  Conversely,
154	   the corresponding decompression of a G.711.0 RTP packet back to the
155	   original source G.711 RTP packet can be accomplished by losslessly
156	   decompressing the G.711.0 payload back to the original source G.711
157	   payload, changing the payload type back to the payload type of the
158	   original G.711 RTP packet and copying all the remaining G.711.0 RTP
159	   header fields into the corresponding G.711 RTP header.
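   The header-preserving conversion described above can be sketched as
   follows.  This is a minimal illustration under stated assumptions:
   the `transform` callable is hypothetical and stands in for a real
   G.711.0 encoder or decoder (for example, one built from the ITU-T
   reference code, which is not shown), and RTP padding (P bit set) is
   simply rejected rather than handled.  Only the 7-bit PT field is
   rewritten; every other header octet, including the M bit, CSRCs and
   any header extension, is copied unchanged.

```python
from typing import Callable

RTP_FIXED_HEADER_LEN = 12  # octets, per RFC 3550 Section 5.1


def rtp_header_length(packet: bytes) -> int:
    """Return the RTP header length: fixed part, CSRC list and extension."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    length = RTP_FIXED_HEADER_LEN + 4 * (packet[0] & 0x0F)  # CC CSRC entries
    if packet[0] & 0x10:  # X bit: a header extension follows the CSRC list
        if len(packet) < length + 4:
            raise ValueError("truncated header extension")
        ext_words = int.from_bytes(packet[length + 2:length + 4], "big")
        length += 4 + 4 * ext_words
    if len(packet) < length:
        raise ValueError("truncated RTP header")
    return length


def convert_packet(packet: bytes, new_pt: int,
                   transform: Callable[[bytes], bytes]) -> bytes:
    """Swap the payload type and transform the payload; copy all else."""
    if not 0 <= new_pt <= 0x7F:
        raise ValueError("PT is a 7-bit field")
    hdr_len = rtp_header_length(packet)
    if packet[0] & 0x20:  # P bit: RTP padding is not handled in this sketch
        raise ValueError("RTP padding is not supported by this sketch")
    header = bytearray(packet[:hdr_len])
    header[1] = (header[1] & 0x80) | new_pt  # keep the M bit, replace PT
    return bytes(header) + transform(packet[hdr_len:])
```

   For G.711 to G.711.0, `transform` would be a G.711.0 encoder and
   `new_pt` the negotiated dynamic PT; for the reverse direction, a
   G.711.0 decoder and the original static PT (0 or 8), which the
   converter has to remember.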

161	   Of particular note is that G.711.0, being both lossless and
162	   stateless, can be employed multiple times (e.g., on multiple,
163	   individual hops or series of hops) on a given flow with no
164	   degradation of quality relative to end-to-end G.711.  Stated another
165	   way, multiple "lossless transcodes" from/to G.711.0/G.711 do not
166	   affect voice quality as typically occurs with lossy transcodes to/
167	   from dissimilar codecs.

169	   Lastly, it is expected that G.711.0 will be used as an archival
170	   format for recorded G.711 streams.  Therefore, a G.711.0 Storage Mode
171	   Format is also included in this document.

173	3.2.  Key Properties of G.711.0 Design

175	   The fundamental design of G.711.0 resulted from the desire to
176	   losslessly encode and compress frames of G.711 symbols independent of
177	   what types of signals those G.711 frames contained.  The primary
178	   G.711.0 use case is for G.711 encoded, zero-mean, acoustic signals
179	   (such as speech and music).

181	   The G.711.0 attributes are as follows:

183	   A1  Compression for zero-mean acoustic signals: G.711.0 was designed
184	         primarily for the compression of G.711 payloads that contain
185	         "speech" or other zero-mean acoustic signals.
186	         G.711.0 obtains greater than 50% average compression in service
187	         provider environments [ICASSP].

189	   A2  Lossless for any G.711 payload: G.711.0 was designed to be
190	         lossless for any valid G.711 payload - even if the payload
191	         consisted of apparently random G.711 symbols (e.g., a modem or
192	         FAX payload).  G.711.0 could be used for "aggregate 64 kbps
193	         G.711 channels" carried over IP without explicit concern about
194	         whether a subset of these channels happened to be carrying
195	         something other than voice or general audio.  To the extent
196	         that a particular channel carried something other than voice or
197	         general audio, G.711.0 ensures that it is carried losslessly,
198	         if not significantly compressed.

200	   A3  Stateless: Compression of a frame of G.711 symbols was to depend
201	         only on that frame and not on any prior frame.  Although
202	         greater compression is usually available by observing a longer
203	         history of past G.711 symbols, it was decided that the
204	         compression design would be stateless to completely eliminate
205	         error propagation common in many lossy codec designs (e.g.,
206	         ITU-T Rec. G.729 [G.729], ITU-T Rec. G.722 [G.722]).  That is,
207	         the decoding process need not be concerned about lost prior
208	         packets because the decompression of a given G.711.0 frame is
209	         not dependent on potentially lost prior G.711.0 frames.  Owing
210	         to this stateless property, the frames input to the G.711.0
211	         encoder may be changed "on-the-fly" (a 5 ms encoding could be
212	         followed by a 20 ms encoding).

214	   A4  Self-describing: This property is defined as the ability to
215	         determine how many source G.711 samples are contained within
216	         the G.711.0 frame solely by information contained within the
217	         G.711.0 frame.  Generally, the number of source G.711 symbols
218	         can be determined by decoding the initial octets of the
219	         compressed G.711.0 frame (these octets are called "prefix
220	         codes" in the standard) [ICASSP].  A G.711.0 decoder need not
221	         know the ptime, as it is able to decompress the G.711.0
222	         frame presented to it without signaling knowledge.

224	   A5  Accommodate G.711 payload sizes typically used in IP: G.711 input
225	         frames of length typically found in VoIP applications represent
226	         SDP ptimes (see RFC 4566 [RFC4566]) of 5 ms, 10 ms, 20 ms, 30
227	         ms or 40 ms.  Since the dominant sampling frequency for G.711
228	         is 8000 samples per second, G.711.0 was designed to compress
229	         G.711 input frames of 40, 80, 160, 240 or 320 samples.

231	   A6  Bounded expansion: Since attribute A2 above requires G.711.0 to
232	         be lossless for any payload, by definition there exists at
233	         least one potential G.711 payload which must be
234	         "uncompressible".  Since the quantum of compression is an
235	         octet, the expansion of such an uncompressible payload was
236	         designed to be the minimum possible: one octet.  Thus
237	         G.711.0 "compressed" frames can be of length one octet to X+1
238	         octets, where X is the size of the input G.711 frame in octets.
239	         G.711.0 can therefore be viewed as a Variable Bit Rate (VBR)
240	         encoding in which the size of the G.711.0 output frame is a
241	         function of the G.711 symbols input to it.

243	   A7  Algorithmic delay: G.711.0 was designed to have the algorithmic
244	         delay equal to the time represented by the number of samples in
245	         the G.711 input frame (i.e., no "look-ahead").

247	   A8  Low Complexity: Less than 1.0 WMOPS average and low memory
248	         footprint (~5k octets RAM, ~5.7k octets ROM and ~3.6 basic
249	         operations) [ICASSP] [G.711.0].

251	   A9  Both A-law and Mu-law supported: G.711 has two operating laws,
252	         A-law and Mu-law.  These two laws are also known as PCMA and
253	         PCMU in RTP applications [RFC3551].
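   The arithmetic behind attributes A5 and A6 can be checked with a
   short sketch.  The function and constant names below are illustrative
   only and do not come from the ITU-T reference code; the sketch
   assumes the 8000 samples per second rate stated in A5, with one
   octet per G.711 symbol.

```python
from typing import Tuple

SAMPLE_RATE_HZ = 8000                 # dominant G.711 sampling rate (A5)
PTIMES_MS = (5, 10, 20, 30, 40)       # frame durations covered by A5


def symbols_for_ptime(ptime_ms: int) -> int:
    """G.711 symbols (one octet each) in one G.711.0 input frame (A5)."""
    if ptime_ms not in PTIMES_MS:
        raise ValueError(f"G.711.0 input frames cover {PTIMES_MS} ms only")
    return SAMPLE_RATE_HZ * ptime_ms // 1000


def output_size_bounds(x_octets: int) -> Tuple[int, int]:
    """Inclusive (min, max) length of the G.711.0 output frame (A6)."""
    if x_octets not in {symbols_for_ptime(p) for p in PTIMES_MS}:
        raise ValueError("X must be 40, 80, 160, 240 or 320 octets")
    return 1, x_octets + 1
```

   The largest upper bound, 321 octets for a 40 ms input frame, is the
   same figure used for the decoder input buffer in Section 4.2.3.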

255	   These attributes generally make it trivial to compress a G.711 input
256	   frame consisting of 40, 80, 160, 240 or 320 samples.  After the input
257	   frame is presented to a G.711.0 encoder, a G.711.0 "self-describing"
258	   output frame is produced.  The number of samples contained within
259	   this frame is easily determined at the G.711.0 decoder by virtue of
260	   attribute A4.  The G.711.0 decoder can decode the G.711.0 frame back
261	   to a G.711 frame by using only data within the G.711.0 frame.

263	   Lastly, we note that losing a G.711.0 encoded packet is identical in
264	   effect to losing a G.711 packet (when using RTP); this is because a
265	   G.711.0 payload, like the corresponding G.711 payload, is stateless.
266	   Thus, it is anticipated that existing G.711 packet loss concealment
267	   (PLC) mechanisms will be employed when a G.711.0 packet is lost and
268	   that the MOS degradation will be identical to that of G.711 loss.

270	3.3.  G.711 Input Frames to G.711.0 Output Frames

272	   G.711.0 is a lossless and stateless compression of G.711 frames.  The
273	   following figure depicts this where "A" is the process of G.711.0
274	   encoding and "B" is the process of G.711.0 decoding.

276	        1:1 Mapping from G.711 Input Frame to G.711.0 Output Frame

278	    |--------------------------|  A   |------------------------------|
279	    |    G.711 Input Frame     |----->|     G.711.0 Output Frame     |
280	    |       of X Octets        |      |  containing 1 to X+1 Octets  |
281	    | (where X MUST be 40, 80, |      | (precise value dependent on  |
282	    | 160, 240 or 320 octets)  |<-----| G.711.0 ability to compress) |
283	    |__________________________|  B   |______________________________|

285	                                 Figure 1

287	   Note that the mapping is 1:1 (lossless) in both directions, subject
288	   to two constraints.  The first constraint is that the input frame
289	   provided to the G.711.0 encoder (process "A") has a specific number
290	   of input G.711 symbols consistent with attribute A5 (40, 80, 160, 240
291	   or 320 octets).  The second constraint is that the compression law
292	   used to create the G.711 input frame (A-law or Mu-law) must be known,
293	   consistent with attribute A9.

295	   Subject to these two constraints, the input G.711 frame is processed
296	   by the G.711.0 encoder ("A") and produces a "self-describing" G.711.0
297	   output frame, consistent with attribute A4.  Depending on the source
298	   G.711 symbols, the G.711.0 output frame can contain anywhere from 1
299	   to X+1 octets, where X is the number of input G.711 symbols.
300	   In practice, G.711.0 achieves compression for virtually every
301	   zero-mean acoustic signal it encodes.

303	   Since the G.711.0 output frame is "self-describing", a G.711.0
304	   decoder (process "B") can losslessly reproduce the original G.711
305	   input frame with only the knowledge of which companding law was used
306	   (A-law or Mu-law).  The G.711.0 frame, being "self-describing",
307	   allows for the G.711.0 decoder ("B") to know precisely how many G.711
308	   symbols to create.

310	   Since G.711.0 was designed with typical G.711 payload lengths as a
311	   design constraint (attribute A5), this lossless encoding can be
312	   performed with knowledge of only the companding law being used.  This
313	   information is anticipated to be signaled in SDP and will be
314	   described later in this document.

316	   If the original inputs were known to be from a zero-mean acoustic
317	   signal coded by G.711, an intelligent G.711.0 encoder could infer the
318	   G.711 companding law in use (via G.711 input signal amplitude
319	   histogram statistics).  Likewise, an intelligent G.711.0 decoder
320	   producing G.711 from the G.711.0 frames could also infer which
321	   encoding law is in use.  Thus, G.711.0 could be used in
322	   applications that have limited stream signaling between the G.711
323	   endpoints (i.e., they only know "G.711 at 8k sampling is being used",
324	   but nothing more).  Such usage is not further described in this
325	   document.  Additionally, if the original inputs were known to come
326	   from zero-mean acoustic signals, an intelligent G.711.0 encoder could
327	   tell if the G.711 payload had been encrypted, as the symbols would
328	   not have the distribution expected in either companding law and would
329	   appear random.  Such determination is also not further discussed in
330	   this document.

332	   It is easily seen that this process is 1:1 and that G.711.0 based
333	   lossless compression can be employed multiple times, as the original
334	   G.711 input symbols are always reproduced with 100% fidelity.

336	   As a general rule, G.711.0 frames containing more source G.711
337	   symbols from a given channel result in higher compression, but
338	   there are exceptions.  For example, an intelligent G.711.0
339	   encoder may choose to encode 20 ms of G.711 as two individual 10 ms
340	   G.711.0 frames if a higher overall compression will result (this
341	   might occur if the first 10 ms was "silence" and two 10 ms G.711.0
342	   frames contained fewer octets than one 20 ms G.711.0 frame).  For
343	   this reason, we will explicitly allow multiple G.711.0 encoded frames
344	   in the G.711.0 RTP payload in Section 4.2.2 below even though the
345	   usual case is anticipated to be only one G.711.0 frame per RTP
346	   payload.
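   One way an "intelligent" encoder could make the split decision
   described above is an exhaustive dynamic-programming search over the
   allowed frame sizes.  This is only a sketch under assumptions:
   `encoded_size` is a hypothetical hook that would run a real G.711.0
   encoder on one frame and return its length in octets, and G.711.0
   itself does not prescribe any such search.  The chosen frames always
   cover exactly the input, so the signaled ptime is preserved.

```python
from typing import Callable, List, Sequence, Tuple

FRAME_SIZES = (40, 80, 160, 240, 320)  # symbols per G.711.0 frame (A5)


def best_split(symbols: Sequence[int],
               encoded_size: Callable[[Sequence[int]], int]
               ) -> Tuple[int, List[int]]:
    """Choose frame sizes covering `symbols` with the fewest encoded octets.

    `encoded_size` is a hypothetical hook that would run a G.711.0 encoder
    on one frame and return the length of the resulting G.711.0 frame.
    """
    n = len(symbols)
    if n % 40:
        raise ValueError("input length must be a multiple of 40 symbols")
    best = [0] + [None] * n   # best[i]: fewest octets to encode symbols[:i]
    last = [0] * (n + 1)      # size of the final frame in that encoding
    for end in range(40, n + 1, 40):
        for size in FRAME_SIZES:
            start = end - size
            if start < 0:
                break  # FRAME_SIZES is ascending, so larger sizes fail too
            cost = best[start] + encoded_size(symbols[start:end])
            if best[end] is None or cost < best[end]:
                best[end], last[end] = cost, size
    sizes, pos = [], n
    while pos > 0:            # walk back through the chosen frame sizes
        sizes.append(last[pos])
        pos -= last[pos]
    return best[n], sizes[::-1]
```

   With a toy size model in which an all-zero ("silent") frame shrinks
   to one octet and any other frame expands by one octet, 80 silent
   symbols followed by 80 non-silent ones are best sent as two 80-symbol
   (10 ms) frames totalling 82 octets, rather than one 161-octet frame.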

348	4.  RTP Header and Payload

350	   In this section we describe the precise format for G.711.0 frames
351	   carried via RTP.  We begin with a description of the RTP header
352	   relative to G.711, then provide two G.711.0 payload examples.

354	4.1.  G.711.0 RTP Header

356	   Relative to G.711 RTP headers, the utilization of G.711.0 does not
357	   create any special requirements with respect to the contents of the
358	   RTP packet header.  The only significant difference is that the
359	   payload type (PT) RTP header field will have a value corresponding to
360	   the dynamic payload type assigned to the flow (whereas G.711 PCMU
361	   typically has a static PT = 0 and G.711 PCMA typically has a static
362	   PT = 8 [RFC3551]).

364	   Voice Activity Detection (VAD) SHOULD NOT be used when G.711.0 is
365	   negotiated because G.711.0 obtains high compression during "VAD
366	   silence intervals" and one of the advantages of G.711.0 over G.711
367	   with VAD is the lack of any VAD-inducing artifacts in the received
368	   signal.  However, if VAD is employed, the Marker bit (M) MUST be set
369	   in the first packet of a talkspurt (the first packet after a silence
370	   period in which packets have not been transmitted contiguously as per
371	   rules specified in [RFC3550] for G.711 payloads).  This definition,
372	   being consistent with the G.711 RTP VAD use, further allows lossless
373	   transcoding between G.711 RTP packets and G.711.0 RTP packets as
374	   described in Section 3.1.

376	   With this introduction, the RTP packet header fields are defined as
377	   follows:

379	      V - As per [RFC3550]

381	      P - As per [RFC3550]

383	      X - As per [RFC3550]

385	      CC - As per [RFC3550]

387	      M - As per [RFC3550]

389	      PT - Dynamic PT assigned, consistent with the media type
390	      registration for G.711.0 defined in Section 5.1.

392	      SN - As per [RFC3550]

394	      timestamp - As per [RFC3550]

396	      SSRC - As per [RFC3550]

398	      CSRC - As per [RFC3550]

400	   Where V (version bits), P (padding bit), X (extension bit), CC (CSRC
401	   count), M (marker bit), SN (sequence number), timestamp, SSRC
402	   (synchronizing source) and CSRC (contributing sources) are as
403	   defined in [RFC3550] and as typically used with G.711.  PT (payload
404	   type) is as defined in [RFC3550], with a dynamic value as above.
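   The fields listed above occupy the 12-octet RTP fixed header,
   followed by CC 32-bit CSRC identifiers (RFC 3550, Section 5.1).  The
   parser below is a minimal sketch with illustrative names; it ignores
   any header extension and RTP padding, and only notes in comments how
   M and PT are used for G.711.0.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RtpHeader:
    version: int        # V: 2 bits, always 2
    padding: bool       # P
    extension: bool     # X
    csrc_count: int     # CC: 4 bits
    marker: bool        # M: first packet of a talkspurt if VAD is used
    payload_type: int   # PT: 7 bits, a dynamic value (96-127) for G.711.0
    sequence: int       # SN: 16 bits
    timestamp: int      # 32 bits
    ssrc: int           # 32 bits
    csrcs: List[int]    # CC contributing source identifiers


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header and CSRC list (extension not parsed)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc]))
    return RtpHeader(version=b0 >> 6, padding=bool(b0 & 0x20),
                     extension=bool(b0 & 0x10), csrc_count=cc,
                     marker=bool(b1 & 0x80), payload_type=b1 & 0x7F,
                     sequence=seq, timestamp=ts, ssrc=ssrc, csrcs=csrcs)
```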

406	4.2.  G.711.0 RTP Payload

408	   In this section we provide two examples for carrying G.711.0 frames
409	   in RTP payloads.  The first example is used when it is desired to
410	   carry only one G.711.0 frame in the RTP payload.  This example is a
411	   subset of the second and shown separately for clarity.

413	4.2.1.  Single G.711.0 Frame per RTP Payload Example

414	   This example depicts a single G.711.0 frame in the RTP payload.  This
415	   is expected to be the dominant RTP payload case for G.711.0, as the
416	   G.711.0 encoding process supports the SDP packet times (ptime and
417	   maxptime, see [RFC4566]) commonly used when G.711 is transported in
418	   RTP.  Additionally, as mentioned previously, larger G.711.0 frames
419	   generally compress more effectively than a multiplicity of smaller
420	   G.711.0 frames.

422	   The following Figure illustrates the single G.711.0 frame per RTP
423	   payload case.

425	                 Single G.711.0 Frame in RTP Payload Case

427	                 |-------------------|-------------------|
428	                 | One G.711.0 Frame | Zero or more 0x00 |
429	                 |                   |   Padding Octets  |
430	                 |___________________|___________________|

432	                                 Figure 2

434	   Encoding Process: A single G.711.0 frame is inserted into the RTP
435	   payload.  The amount of time represented by the G.711 symbols
436	   compressed in the G.711.0 frame MUST correspond to the ptime signaled
437	   for applications using SDP.  Although generally not desired, padding
438	   in the RTP payload after the G.711.0 frame MAY be created by
439	   placing one or more 0x00 octets after the G.711.0 frame.  Such
440	   padding may be desired based on security considerations (see
441	   Section 10).

443	   Decoding Process: Passing the entire RTP payload to the G.711.0
444	   decoder is sufficient for the G.711.0 decoder to create the source
445	   G.711 symbols.  Any padding inserted after the G.711.0 frame (i.e.,
446	   the 0x00 octets) present in the RTP payload is silently ignored by
447	   the G.711.0 decoding process.  The decoding process is fully
448	   described in Section 4.2.3 below.

450	4.2.2.  Multiple G.711.0 Frames per RTP Payload Example

452	   This example depicts the case where multiple G.711.0 frames are
453	   desired in the RTP payload.

455	   As described in Section 3.3, an "intelligent G.711.0 encoder" can
456	   decide to encode, for example, 20 ms of G.711 symbols as two 10 ms
457	   G.711.0 frames because a greater compression is attained for that
458	   particular 20 ms segment.  The "smart encoding" of such inputs is
459	   accommodated by the ability to have multiple G.711.0 frames in the
460	   RTP payload.

462	   Note that since each G.711.0 frame is self-describing (see Attribute
463	   A4 in Section 3.2), the individual G.711.0 frames in the RTP payload
464	   need not represent the same duration of time (i.e., a 5 ms G.711.0
465	   frame could be followed by a 20 ms G.711.0 frame).  Owing to this,
466	   the amount of time represented in the RTP payload MAY be any integer
467	   multiple of 5 ms (as 5 ms is the smallest interval of time that can
468	   be represented in a G.711.0 frame).

470	   The following Figure illustrates the multiple G.711.0 frame per RTP
471	   payload case where the number of G.711.0 frames placed in the RTP
472	   payload is N.

474	                Multiple G.711.0 Frames in RTP Payload Case

476	       |----------|---------|----------|---------|----------------|
477	       | First    | Second  |          | Nth     | Zero or more   |
478	       | G.711.0  | G.711.0 |   ...    | G.711.0 |     0x00       |
479	       | Frame    | Frame   |          | Frame   | Padding Octets |
480	       |__________|_________|__________|_________|________________|

482	                                 Figure 3

484	   We note here that the individual G.711.0 frames can be, and generally
485	   are, of different lengths.  The decoding process in the following
486	   section is used to determine the frame boundaries.

488	   Encoding Process: One or more G.711.0 frames are placed in the RTP
489	   payload simply by concatenating the G.711.0 frames together.  The
490	   amount of time represented by the G.711 symbols compressed in all the
491	   G.711.0 frames in the RTP payload MUST correspond to the ptime
492	   signaled for applications using SDP.  Although not generally desired,
493	   padding in the RTP payload SHOULD be placed after the last G.711.0
494	   frame in the payload and MAY be created by placing one or more 0x00
495	   octets after the last G.711.0 frame.  Such padding may be desired
496	   based on security considerations (see Section 10).
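   The encoding rules for both payload cases can be sketched as below;
   a single-frame payload is simply the one-element case.  Pairing each
   encoded frame with its G.711 symbol count is an assumption of this
   sketch (an encoder knows how many symbols it compressed into each
   frame), used only to check that the frames cover exactly the
   signaled ptime.  The check that no frame begins with 0x00 relies on
   the property, noted in Section 4.2.3, that 0x00 is never the first
   octet of a valid G.711.0 frame.

```python
from typing import Sequence, Tuple

SAMPLES_PER_MS = 8  # 8000 samples per second


def build_payload(frames: Sequence[Tuple[bytes, int]], ptime_ms: int,
                  padding: int = 0) -> bytes:
    """Concatenate G.711.0 frames and append optional 0x00 padding octets.

    `frames` holds (encoded_frame, g711_symbol_count) pairs; the symbol
    counts are used here only to validate the total duration.
    """
    total = sum(count for _, count in frames)
    if total != ptime_ms * SAMPLES_PER_MS:
        raise ValueError("frames must cover exactly the signaled ptime")
    for frame, _ in frames:
        if not frame or frame[0] == 0x00:
            raise ValueError("a G.711.0 frame never starts with 0x00")
    if padding < 0:
        raise ValueError("padding must be non-negative")
    # Frames are simply concatenated; padding, if any, follows the last one.
    return b"".join(frame for frame, _ in frames) + b"\x00" * padding
```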

498	   Decoding Process: As G.711.0 frames can be of varying length, the
499	   payload decoding process described in the following section is used
500	   to determine where the individual G.711.0 frame boundaries are.

502	4.2.3.  G.711.0 RTP Payload Decoding Process

504	   The G.711.0 decoding process is a standard part of G.711.0 bit stream
505	   decoding and is implemented in the ITU-T Rec. G.711.0 reference code.
506	   The decoding process heuristic described in this section is a slight
507	   enhancement of the ITU-T reference code to explicitly accommodate RTP
508	   padding (as described above).

510	   Before describing the decoding, we note here that the largest
511	   possible G.711.0 frame is created whenever the largest number of
512	   G.711 symbols is encoded (320 from Section 3.2, property A5) and
513	   these 320 symbols are "uncompressible" by the G.711.0 encoder.  In
514	   this case (via property A6 in Section 3.2) the G.711.0 output frame
515	   will be 321 octets long.  We also note that the value 0x00 chosen for
516	   the optional padding cannot be the first octet of a valid ITU-T Rec.
517	   G.711.0 frame (see [G.711.0]).  We also note that whenever more than
518	   one G.711.0 frame is contained in the RTP payload, the decoding of
519	   the individual G.711.0 frames will occur multiple times.

521	   For the decoding heuristic below, let N be the number of octets in
522	   the RTP payload (i.e., excluding any RTP padding, but including any
523	   RTP payload padding), let P equal the number of RTP payload octets
524	   processed by the G.711.0 decoding process, let K be the number of
525	   G.711 symbols presently in the output buffer, let Q be the number of
526	   octets contained in the G.711.0 frame being processed and let "!="
527	   represent not equal to.  The keyword "STOP" is used below to indicate
528	   the end of the processing of G.711.0 frames in the RTP payload.  The
529	   heuristic below assumes an output buffer for the decoded G.711 source
530	   symbols of length sufficient to accommodate the expected number of
531	   G.711 symbols and an input buffer of length 321 octets.
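   Using these variables, the heuristic below can be sketched as a
   simple loop.  This is a minimal sketch, not the ITU-T reference code:
   `decode_frame` is a hypothetical stand-in for a G.711.0 frame decoder
   that returns the decoded G.711 symbols and the number of octets (Q)
   it consumed.  The check on Q is an addition of this sketch that
   prevents an endless loop if a decoder misbehaves.

```python
from typing import Callable, Tuple

MAX_FRAME_OCTETS = 320 + 1  # largest possible G.711.0 frame


def decode_payload(payload: bytes,
                   decode_frame: Callable[[bytes], Tuple[bytes, int]]
                   ) -> bytes:
    """Decode every G.711.0 frame in an RTP payload, skipping 0x00 padding.

    `decode_frame` is hypothetical: given a buffer that starts with a
    G.711.0 frame, it returns (g711_symbols, octets_consumed).
    """
    n = len(payload)      # H1: N = payload octets (RTP padding removed)
    p = 0                 # H1: P = payload octets processed so far
    out = bytearray()     # H1: K is len(out), initially zero
    while p < n:
        buf = payload[p:p + min(MAX_FRAME_OCTETS, n - p)]   # H2
        if buf[0] == 0x00:  # H3/H4: padding octet, no symbols produced
            p += 1
            continue
        symbols, q = decode_frame(buf)                       # H5
        if q < 1 or q > len(buf):
            raise ValueError("decoder consumed an invalid number of octets")
        out += symbols    # append the M new symbols; K = K + M
        p += q            # P = P + Q
    return bytes(out)
```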

533	   G.711.0 RTP Payload Decoding Heuristic:

535	   H1  Initialization: Initialize the number of processed octets to zero
536	         (P = 0).  Initialize the counter for how many G.711 symbols are
537	         in the output buffer to zero (K = 0).  Initialize N to the
538	         number of octets in the RTP payload.  Go to H2.

540	   H2  Read internal buffer: Read min{320+1, (N-P)} octets into the
541	         internal buffer, starting at octet (P+1) of the RTP payload.
542	         Note that at this point N-P octets have yet to be processed
543	         and that 320+1 octets is the largest possible G.711.0 frame.
544	         Go to H3.

546	   H3  Analyze the first octet in the internal buffer: If this octet
547	         is 0x00 (a padding octet), go to H4; otherwise, go to H5
548	         (process a G.711.0 frame).

550	   H4  Process padding octet (no G.711 symbols generated): Increment the
551	         processed octets counter by one (set P = P + 1).  If this
552	         increment results in P >= N, then STOP (as all RTP payload
553	         octets have been processed); otherwise, go to H2.

555	   H5  Process an individual G.711.0 frame (produce G.711 samples in the
556	         output buffer): Pass the internal buffer to the G.711.0 decoder.
557	         The G.711.0 decoder will read the first octet (called the
558	         "prefix code" octet in ITU-T Rec. G.711.0 [G.711.0]) to
559	         determine the number M of source G.711 samples contained in
560	         this G.711.0 frame.  The G.711.0 decoder will produce exactly M
561	         G.711 source symbols.  If K = 0, these M symbols will be the
562	         first in the output buffer and are placed at the beginning of
563	         the output buffer.  If K != 0, concatenate these M symbols with
564	         the prior symbols in the output buffer (there are K prior
565	         symbols in the buffer).  Set K = K + M (as there are now this
566	         many G.711 source symbols in the output buffer).  The G.711.0
567	         decoder will have consumed some number of octets, Q, in the
568	         internal buffer to produce the M G.711 symbols.  Increment the
569	         processed octets counter by this quantity (set P = P + Q).
570	         If this increment results in P >= N, then STOP (as all RTP
571	         payload octets have been processed); otherwise, go to H2.

574	   When STOP is reached, the output buffer will contain precisely K
575	   G.711 source symbols, which should correspond to the signaled ptime
576	   if SDP was used and the encoding process was without error.

578	   We also note, as an aside, that the heuristic above (and the ITU-T
579	   G.711.0 reference code) accommodates padding octets (0x00) placed
580	   anywhere in between G.711.0 frames in the RTP payload as well as
581	   prior to or after any or all G.711.0 frames.  The ITU-T G.711.0
582	   reference code does not have Steps H3 and H4 as separate steps (i.e.,
583	   Step H5 immediately follows H2), at the added computational cost of
584	   some additional buffer passing to/from the G.711.0 frame decoder
585	   functions.  That is, the G.711.0 decoder in the reference code
586	   "silently ignores" 0x00 padding octets at the beginning of what it
587	   believes to be a G.711.0 encoded frame boundary.  Thus, Steps H3 and
588	   H4 above are an optimization over the reference code and are shown
589	   for clarity.

591	   If the decoder is at a playout endpoint location, this G.711 buffer
592	   SHOULD be used in the same manner as a received G.711 RTP payload
593	   would have been used (passed to a playout buffer, to a PLC
594	   implementation, etc.).
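The following Python sketch mirrors Steps H1 through H5 above and is illustrative only.  It does not implement the G.711.0 frame decoder itself, which is specified in [G.711.0]; instead it assumes a caller-supplied function that decodes one frame from the front of the internal buffer and reports the number of octets, Q, that it consumed.  The names decode_payload and toy_decode_frame are hypothetical, and the toy frame layout (a count octet followed by raw G.711 octets) exists only for testing; it is NOT the real G.711.0 bitstream.

```python
from typing import Callable, List, Tuple

MAX_FRAME_OCTETS = 320 + 1  # largest possible G.711.0 frame (property A6)
PAD = 0x00                  # optional G.711.0 padding octet

# A frame decoder takes the internal buffer and returns
# (decoded G.711 symbols, number of octets Q it consumed).
FrameDecoder = Callable[[bytes], Tuple[List[int], int]]


def decode_payload(payload: bytes, decode_frame: FrameDecoder) -> List[int]:
    """Steps H1-H5: decode every G.711.0 frame in one RTP payload."""
    n = len(payload)        # H1: N octets in the RTP payload
    p = 0                   # H1: P octets processed so far
    out: List[int] = []     # output buffer; K == len(out)
    while p < n:
        # H2: read min{321, N-P} octets starting at octet P+1 (index p).
        buf = payload[p:p + min(MAX_FRAME_OCTETS, n - p)]
        if buf[0] == PAD:
            p += 1          # H3/H4: skip a single padding octet
            continue
        symbols, q = decode_frame(buf)  # H5: decode one G.711.0 frame
        if q <= 0:
            raise ValueError("frame decoder consumed no octets")
        out.extend(symbols)  # concatenate M symbols (K = K + M)
        p += q               # P = P + Q
    return out               # STOP: P >= N


def toy_decode_frame(buf: bytes) -> Tuple[List[int], int]:
    """HYPOTHETICAL stand-in, NOT real G.711.0: the first octet is the
    sample count M, followed by M raw G.711 octets."""
    m = buf[0]
    return list(buf[1:1 + m]), 1 + m
```

With this toy layout, a payload of padding, a two-sample frame, padding, a one-sample frame and trailing padding decodes to the three G.711 symbols, showing that padding at any frame boundary is skipped.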

596	4.2.4.  G.711.0 RTP Payload for Multiple Channels
597	   In this section we describe the use of multiple "channels" of G.711
598	   data encoded by G.711.0 compression.

600	   The dominant use of G.711 in RTP transport has been for single
601	   channel use cases.  For this case, the above G.711.0 encoding and
602	   decoding process is used.  However, the multiple channel case for
603	   G.711.0 (a frame-based compression) is different from G.711 (a
604	   sample-based encoding) and is described separately here.

606	   RFC 3551 [RFC3551] provides guidelines for encoding audio channels
607	   (Section 4) and for the ordering of the channels within the RTP
608	   payload (Section 4.1).  The ordering guidelines in RFC 3551,
609	   Section 4.1 SHOULD be used unless an application-specific channel
610	   ordering is more appropriate.

612	   An implicit assumption in RFC 3551 is that all the channel data
613	   multiplexed into an RTP payload MUST represent the same physical time
614	   span.  The case for G.711.0 is no different; the underlying G.711
615	   data for all channels in a G.711.0 RTP payload MUST span the same
616	   interval in time (e.g., the same "ptime" for an SDP-specified codec
617	   negotiation).

619	   RFC 3551 provides guidelines for sample-based encodings such as G.711
620	   in Section 4.2.  This guidance is tantamount to interleaving the
621	   individual samples in that they SHOULD be packed in consecutive
622	   octets.

624	   RFC 3551 provides guidelines for frame-based encodings in which the
625	   frames are interleaved.  However, this guidance stems from the
626	   assumption that "the frame size for frame-oriented codecs is a
627	   given".  This assumption is not valid for G.711.0 in that
628	   individual consecutive G.711.0 frames (as per Section 4.2.2) can:

630	      1) represent different time spans (e.g., two 5 ms G.711.0 frames
631	      in lieu of one 10 ms G.711.0 frame), and

633	      2) be of different lengths in octets (and typically are).

635	   Therefore a different, but also simple, concatenation-based approach
636	   is specified in this RFC.

638	   For the multiple channel G.711.0 case, each G.711 channel is
639	   independently encoded into one or more G.711.0 frames defined here as
640	   a "G.711.0 channel superframe".  Each one of these superframes is
641	   identical to the multiple G.711.0 frame case illustrated in Figure 3
642	   of Section 4.2.2 in which each superframe can have one or more
643	   individual G.711.0 frames within it.  Then each G.711.0 channel
644	   superframe is concatenated - in channel order - into a G.711.0 RTP
645	   payload.  Then, if optional G.711.0 padding octets (0x00) are
646	   desired, it is RECOMMENDED that these octets be placed after the
647	   last G.711.0 channel superframe.  As per above, such padding may be
648	   desired based on security considerations (see Section 10).  This is
649	   depicted in Figure 4.

651	            Multiple G.711.0 Channel Superframes in RTP Payload

653	           |----------|---------|----------|---------|---------|
654	           | First    | Second  |          | Nth     | Zero    |
655	           | G.711.0  | G.711.0 |   ...    | G.711.0 | or more |
656	           | Channel  | Channel |          | Channel | 0x00    |
657	           | Super-   | Super-  |          | Super-  | Padding |
658	           | Frame    | Frame   |          | Frame   | Octets  |
659	           |__________|_________|__________|_________|_________|

661	                                 Figure 4
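A minimal sketch of the sender side of Figure 4, assuming each channel has already been G.711.0-encoded into its superframe (one or more concatenated G.711.0 frames).  The function name build_multichannel_payload and its padding argument are hypothetical; the check on the first octet relies only on the fact, noted in Section 4.2.3, that a valid G.711.0 frame never begins with 0x00.

```python
from typing import Sequence


def build_multichannel_payload(superframes: Sequence[bytes], padding: int = 0) -> bytes:
    """Concatenate per-channel G.711.0 superframes in channel order and
    append optional 0x00 padding octets after the last superframe."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    for channel, superframe in enumerate(superframes, start=1):
        # A valid G.711.0 frame never starts with 0x00, so an empty or
        # 0x00-led superframe indicates an encoding error.
        if not superframe or superframe[0] == 0x00:
            raise ValueError(f"invalid superframe for channel {channel}")
    return b"".join(superframes) + bytes(padding)
```

Placing all padding after the last superframe follows the RECOMMENDED layout; the decoding heuristic would also tolerate padding at other frame boundaries.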

663	   The G.711.0 decoder at the receiving end simply decodes the entire
664	   G.711.0 (multiple channel) payload into individual G.711 symbols.  If
665	   M such G.711 symbols result and there were N channels, then the first
666	   M/N G.711 samples would be from the first channel, the second M/N
667	   G.711 samples would be from the second channel, and so on until the
668	   Nth set of G.711 samples are found.  Similarly, if the number of
669	   channels was not known, but the payload "ptime" was known, one could
670	   infer (knowing the sampling rate) how many G.711 symbols each channel
671	   contained; then with this knowledge determine how many channels of
672	   data were contained in the payload.  When SDP is used, the number of
673	   channels is known because the optional channels parameter MUST be
674	   present when more than one channel is negotiated (see Section 5.1).
675	   Additionally, when SDP is used, the inclusion of the ptime parameter
676	   is RECOMMENDED.  If both the channels and ptime parameters are known,
677	   each can serve as a check on the other.
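The receiver-side arithmetic described above can be sketched as follows, assuming the full payload has already been decoded into M G.711 symbols and that ptime is given in milliseconds.  The function names split_channels and infer_channels are hypothetical.

```python
from typing import List


def split_channels(symbols: List[int], channels: int) -> List[List[int]]:
    """Split M decoded G.711 symbols into N consecutive, equal channel blocks."""
    if channels < 1 or len(symbols) % channels != 0:
        raise ValueError("symbol count is not a multiple of the channel count")
    per_channel = len(symbols) // channels  # M/N symbols per channel
    return [symbols[i * per_channel:(i + 1) * per_channel] for i in range(channels)]


def infer_channels(num_symbols: int, ptime_ms: int, rate: int = 8000) -> int:
    """Infer N from M, the packet time in ms, and the sampling rate in Hz."""
    if (rate * ptime_ms) % 1000 != 0:
        raise ValueError("ptime does not span a whole number of samples")
    per_channel = rate * ptime_ms // 1000
    if per_channel == 0 or num_symbols % per_channel != 0:
        raise ValueError("symbol count is inconsistent with ptime")
    return num_symbols // per_channel
```

For example, 320 symbols with a 20 ms ptime at 8000 Hz imply two channels of 160 symbols each; when the channels parameter is also signaled, comparing it with this inferred value provides the consistency check mentioned above.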

679	   Lastly we note that although any padding for the multiple channel
680	   G.711.0 payload is RECOMMENDED to be placed at the end of the
681	   payload, the G.711.0 decoding heuristic described in Section 4.2.3
682	   will successfully decode the payload in Figure 4 if the 0x00 padding
683	   octet is placed anywhere before or after any individual G.711.0 frame
684	   in the RTP payload.  The number of padding octets introduced at any
685	   G.711.0 frame boundary therefore does not affect the number M of the
686	   source G.711 symbols produced.  Thus the decision for padding MAY be
687	   made on a per-superframe basis.

689	5.  Payload Format Parameters
690	   This section defines the parameters that may be used to configure
691	   optional features in the G.711.0 RTP transmission.

693	   The parameters defined here are part of the media subtype
694	   registration for the G.711.0 codec.  A mapping of the parameters into
695	   the Session Description Protocol (SDP) RFC 4566 [RFC4566] is also
696	   provided for those applications that use SDP.

698	5.1.  Media Type Registration

700	   Type name: audio

702	   Subtype name: G7110

704	   Required Parameters:

706	      rate: The RTP timestamp clock rate, which is equal to the sampling
707	      rate.  The typical rate used with G.711 encoding is 8000, but
708	      other rates may be specified.  The default rate is 8000.

710	      complaw: Indicates the companding law (A-law or mu-law) employed.
711	      The case-insensitive values are "al" or "mu" for A-law and mu-law,
712	      respectively.

714	   Optional parameters:

716	      channels: See RFC 4566 [RFC4566] for definition.  Specifies how
717	      many audio streams are represented in the G.711.0 payload and MUST
718	      be present if the number of channels is greater than one.  This
719	      parameter defaults to 1 if not present (as per RFC 4566) and is
720	      typically a non-zero small-valued positive integer.  It is
721	      expected that implementations that specify multiple channels will
722	      also define a mechanism to map the channels appropriately within
723	      their system design, otherwise the channel order specified in RFC
724	      3551 [RFC3551] Section 4.1 will be assumed (e.g., left, right,
725	      center, ... ).

727	      maxptime: See RFC 4566 [RFC4566] for definition.

729	      ptime: See RFC 4566 [RFC4566] for definition.  The inclusion of
730	      "ptime" is RECOMMENDED and SHOULD be in the SDP unless there is an
731	      application specific reason not to include it (e.g., an
732	      application that has a variable ptime on a packet-by-packet
733	      basis).  For constant ptime applications, it is considered good
734	      form to include "ptime" in the SDP for session diagnostic
735	      purposes.  For the constant ptime multiple channel case described
736	      in Section 4.2.4, the inclusion of "ptime" can provide a desirable
737	      payload check.

739	   Encoding considerations:

741	      This media type is framed binary data (see Section 4.8 in RFC 4288
742	      [RFC4288]) compressed as per ITU-T Rec. G.711.0.

744	   Security considerations:

746	      This media type does not carry active content.  It does transfer
747	      compressed data.  See Section 4 of RFC 4856 [RFC4856].

749	   Interoperability considerations: none

751	   Published specification:

753	      ITU-T Rec. G.711.0 and RFC QQQQ.

755	      [ RFC Editor: please replace QQQQ with a reference to this RFC ]

757	   Applications that use this media type:

759	      Audio and video streaming and conferencing tools.

761	   Additional information: none

763	   Person & email address to contact for further information:

765	      Michael Ramalho <mramalho@cisco.com> or <mar42@cornell.edu>

767	   Intended usage: COMMON

769	   Restrictions on usage:

771	      This media type depends on RTP framing, and hence is only defined
772	      for transfer via RTP [RFC3550].  Transport within other framing
773	      protocols is not defined at this time.

775	   Author: Michael Ramalho

777	   Change controller:

779	      IETF Audio/Video Transport working group delegated from the IESG.

781	5.2.  Mapping to SDP Parameters

783	   The information carried in the media type specification has a
784	   specific mapping to fields in the Session Description Protocol (SDP),
785	   which is commonly used to describe RTP sessions.  When SDP is used to
786	   specify sessions employing G.711.0, the mapping is as follows:

788	   o  The media type ("audio") goes in SDP "m=" as the media name.

790	   o  The media subtype ("G7110") goes in SDP "a=rtpmap" as the encoding
791	      name.

793	   o  The required parameter "rate" also goes in "a=rtpmap" as the clock
794	      rate.

796	   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
797	      "a=maxptime" attributes, respectively.

799	   o  Remaining parameters go in the SDP "a=fmtp" attribute by copying
800	      them directly from the media type string as a semicolon-separated
801	      list of parameter=value pairs.
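The mapping rules above can be sketched as a small helper that emits SDP lines for one G.711.0 payload type.  The function name g7110_sdp_lines is hypothetical, and the port number is a caller-supplied example value (an "m=" line requires one); appending the channel count to "a=rtpmap" only when it exceeds one follows the convention used in the examples of Section 5.4.

```python
from typing import List, Optional


def g7110_sdp_lines(payload_type: int, port: int, complaw: str,
                    rate: int = 8000, channels: int = 1,
                    ptime: Optional[int] = None,
                    maxptime: Optional[int] = None) -> List[str]:
    """Map audio/G7110 media type parameters onto SDP lines."""
    law = complaw.lower()  # complaw values are case-insensitive
    if law not in ("al", "mu"):
        raise ValueError("complaw must be 'al' or 'mu'")
    if channels < 1:
        raise ValueError("channels must be at least 1")
    # Subtype and clock rate go in a=rtpmap; the channel count is added
    # as the encoding parameter only when it is greater than one.
    encoding = f"G7110/{rate}" + (f"/{channels}" if channels > 1 else "")
    lines = [f"m=audio {port} RTP/AVP {payload_type}",
             f"a=rtpmap:{payload_type} {encoding}"]
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    # Remaining parameters go in a=fmtp as parameter=value pairs.
    lines.append(f"a=fmtp:{payload_type} complaw={law}")
    return lines
```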

803	5.3.  Offer/Answer Considerations

805	   The following considerations apply when using the SDP offer/answer
806	   model [RFC3264] to negotiate "channels", "ptime", and "maxptime".

808	   o  If the offering endpoint specifies a value for the optional
809	      channels parameter greater than one and the answering endpoint
810	      both understands the parameter and cannot support the requested
811	      value, the answer MUST contain the optional channels parameter
812	      with the highest value it can support.

814	   o  If the offering endpoint specifies a value for the optional
815	      channels parameter, the answer MUST contain the optional channels
816	      parameter unless the only value the answering endpoint can support
817	      is one, in which case the answer MAY contain the optional channels
818	      parameter with value of 1.

820	   o  If the offering endpoint specifies a value for the ptime parameter
821	      that the answering endpoint cannot support, the answer MUST
822	      contain the optional ptime parameter.

824	   o  If the offering endpoint specifies a value for the maxptime
825	      parameter that the answering endpoint cannot support, the answer
826	      MUST contain the optional maxptime parameter.
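A minimal sketch of how an answerer might choose the "channels" value under the first two rules above.  The function name answer_channels and the include_when_mono flag are hypothetical; echoing the offered value when it is supported is an assumption, since the rules only constrain the case in which the offered value cannot be supported.

```python
from typing import Optional


def answer_channels(offered: Optional[int], max_supported: int,
                    include_when_mono: bool = True) -> Optional[int]:
    """Return the channels value for the answer, or None to omit it.

    offered: channels value in the offer (None if the offer omitted it).
    max_supported: highest channel count the answerer supports (>= 1).
    include_when_mono: state channels=1 explicitly when only one channel
    is supported (permitted, but not required).
    """
    if max_supported < 1:
        raise ValueError("max_supported must be at least 1")
    if offered is None:
        return None  # offer implied a single channel
    if offered < 1:
        raise ValueError("offered channels must be at least 1")
    if max_supported == 1 and not include_when_mono:
        return None  # only one channel supported: parameter MAY be omitted
    # Assumption: echo the offered value if supported, otherwise answer
    # with the highest value the answerer can support.
    return min(offered, max_supported)
```

Applied to SDP Example 2 (two channels offered, one supported), this returns 1, or None when the answerer chooses to omit the parameter.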

828	5.4.  SDP Examples

830	   The following examples illustrate how to signal G.711.0 via SDP.

832	5.4.1.  SDP Example 1

834	         m=audio 49170 RTP/AVP 98
835	         a=rtpmap:98 G7110/8000
836	         a=fmtp:98 complaw=mu

838	   In the above example the dynamic payload type 98 is mapped to G.711.0
839	   via the "a=rtpmap" parameter.  The mandatory "complaw" is on the
840	   "a=fmtp" parameter line.  Neither optional parameter ("ptime" or
841	   "channels") is present, although it is generally good form to
842	   include "ptime" in the SDP for session diagnostic purposes.

844	5.4.2.  SDP Example 2

846	   The following example illustrates an offering endpoint requesting 2
847	   channels, but the answering endpoint can only support (or render) one
848	   channel.

850	   Offer:

852	         m=audio 49170 RTP/AVP 98
853	         a=rtpmap:98 G7110/8000/2
854	         a=ptime:20
855	         a=fmtp:98 complaw=al

857	   Answer:

859	         m=audio 49172 RTP/AVP 98
860	         a=rtpmap:98 G7110/8000/1
861	         a=ptime:20
862	         a=fmtp:98 complaw=al

864	   In this example, the offer included the optional channels
865	   parameter.  The answer must also include the channels parameter
866	   unless its value is one.  Shown here is the case in which the answer
867	   explicitly contains the channels parameter (it could have been
868	   omitted, in which case one channel would be assumed).  As mentioned
869	   previously, it is considered good form to include "ptime" in the SDP
870	   for diagnostic purposes if the session is a constant ptime session.

872	6.  G.711.0 Storage Mode Conventions and Definition

874	   The G.711.0 storage mode definition in this section is similar to
875	   many other IETF codecs (e.g., iLBC, EVRC-NW) and is essentially a
876	   concatenation of individual G.711.0 frames.

878	   We note that something must be stored for any G.711.0 frames that were
879	   not received at the receiving endpoint, no matter what the cause.  In
880	   this section we describe two mechanisms, a "G.711.0 PLC Frame" and a
881	   "G.711.0 Erasure Frame".  These G.711.0 PLC and G.711.0 Erasure
882	   Frames are described prior to the G.711.0 storage mode definition for
883	   clarity.

885	6.1.  G.711.0 PLC Frame

887	   When G.711 RTP payloads are not received by a rendering endpoint, a
888	   Packet Loss Concealment (PLC) mechanism is typically employed to
889	   "fill in" the missing G.711 symbols with something that is
890	   auditorially pleasing, so that the loss may not be noticed by a
891	   listener.  Such a PLC mechanism for G.711 is specified in ITU-T Rec.
892	   G.711 Appendix 1 [G.711-AP1].
892	   [G.711-AP1].

894	   A natural extension when creating G.711.0 frames for storage
895	   environments is to employ such a PLC mechanism to create G.711
896	   symbols for the span of time in which G.711.0 payloads were not
897	   received - and then to compress the resulting "G.711 PLC symbols" via
898	   G.711.0 compression.  The G.711.0 frame(s) created by such a process
899	   are called "G.711.0 PLC Frames".

901	   Since PLC mechanisms are designed to render missing audio data with
902	   the best fidelity and intelligibility, G.711.0 frames created via
903	   such processing are likely best for most recording situations (such as
904	   voicemail storage) unless there is a requirement not to fabricate
905	   (audio) data not actually received.

907	   After such PLC G.711 symbols have been generated and then encoded by
908	   a G.711.0 encoder, the resulting frames may be stored in G.711.0
909	   frame format.  As a result, there is nothing to specify here - the
910	   G.711.0 PLC Frames are stored as if they were received by the
911	   receiving endpoint.  In other words, PLC-generated G.711.0 frames
912	   appear as "normal" or "ordinary" G.711.0 frames in the storage mode
913	   file.

915	6.2.  G.711.0 Erasure Frame

917	   "Erasure Frames", or equivalently "Null Frames", have been designed
918	   for many frame-based codecs since G.711 was standardized.  These
919	   null/erasure frames explicitly represent data from incoming audio that
920	   were either not received by the receiving system or represent data
921	   that a transmitting system decided not to send.  Transmitting systems
922	   may choose not to send data for a variety of reasons (e.g., not
923	   enough wireless link capacity in radio-based systems) and can choose
924	   to send a "null frame" in lieu of the actual audio.  It is also
925	   envisioned that erasure frames would be used in storage mode
926	   applications for specific archival purposes where there is a
927	   requirement not to fabricate audio data that was not actually
928	   received.

930	   Thus, a G.711.0 erasure frame is a representation of the amount of
931	   time in G.711.0 frames that were not received or not encoded by the
932	   transmitting system.

934	   Prior to defining a G.711.0 erasure frame it is beneficial to note
935	   what many G.711 RTP systems send when the endpoint is "muted".  When
936	   muted, many of these systems will send an entire G.711 payload of
937	   either 0+ or 0- (i.e., one of the two levels closest to "analog zero"
938	   in either G.711 companding law).  Next we note that a desirable
939	   property for a G.711.0 erasure frame is for "non G.711.0 Erasure
940	   Frame aware" endpoints to be able to playback a G.711.0 erasure frame
941	   with the existing G.711.0 ITU-T reference code.

943	   A G.711.0 Erasure Frame is defined as any G.711.0 frame for which the
944	   corresponding G.711 sample values are either the value 0++ or the
945	   value 0-- for the entirety of the G.711.0 frame.  The levels 0++ and
946	   0-- are the quantization levels two steps above and below analog
947	   zero, respectively.  An entire frame of value 0++ or 0-- is expected
948	   to be extraordinarily rare when the frame was in fact generated by a
949	   natural signal (on the order of one in 2^(S-1), where S is the
950	   number of samples in the frame), as analog inputs such as speech and
951	   music are zero-mean and are typically acoustically coupled to
952	   digital sampling systems.  Note that the playback of a G.711.0 frame
953	   characterized as an erasure frame is auditorially equivalent to a
954	   muted signal (a very low-valued constant).

956	   These G.711.0 erasure frames can be reasonably characterized as null
957	   or erasure frames while meeting the desired playback goal of being
958	   decoded by the G.711.0 ITU-T reference code.  Thus, similarly to
959	   G.711.0 PLC frames, the G.711.0 erasure frames appear as "normal" or
960	   "ordinary" G.711.0 frames in the storage mode format.
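A minimal sketch of the erasure frame test, operating on the G.711 samples before G.711.0 encoding (or after decoding).  The codeword values below are an ASSUMPTION derived from the G.711 codeword layout (the codewords adjacent to the 0+/0- codewords 0xFF/0x7F for mu-law and 0xD5/0x55 for A-law); this document does not state them.  The names ERASURE_CODEWORDS, make_erasure_frame and is_erasure_frame are hypothetical.

```python
# ASSUMED (0++, 0--) codewords per companding law, derived from the
# G.711 codeword layout rather than stated in this document.
ERASURE_CODEWORDS = {
    "mu": (0xFE, 0x7E),
    "al": (0xD4, 0x54),
}


def make_erasure_frame(num_samples: int, complaw: str, positive: bool = True) -> bytes:
    """G.711 input for an erasure frame: every sample 0++ (or every 0--)."""
    zpp, zmm = ERASURE_CODEWORDS[complaw.lower()]
    return bytes([zpp if positive else zmm]) * num_samples


def is_erasure_frame(samples: bytes, complaw: str) -> bool:
    """True if every G.711 sample is 0++, or every sample is 0--."""
    zpp, zmm = ERASURE_CODEWORDS[complaw.lower()]
    if not samples:
        return False
    first = samples[0]
    # A mix of 0++ and 0-- does not qualify; the whole frame must be uniform.
    return first in (zpp, zmm) and all(s == first for s in samples)
```

A frame produced by make_erasure_frame can be passed to an unmodified G.711.0 encoder, which is what allows erasure frames to remain ordinary G.711.0 frames in the storage mode format.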

962	6.3.  G.711.0 Storage Mode Definition

964	   The storage format is used for storing G.711.0 encoded frames.  The
965	   format for the G.711.0 storage mode file defined by this RFC is shown
966	   below.

968	                        G.711.0 Storage Mode Format

970	          |---------------------------|----------|--------------|
971	          |       Magic Number        |          |              |
972	          |                           |  Version | Concatenated |
973	          | "#!G7110A\n" (for A-law)  |   Octet  |   G.711.0    |
974	          |            or             |          |    Frames    |
975	          | "#!G7110M\n" (for Mu-law) |  "0x00"  |              |
976	          |___________________________|__________|______________|

978	                                 Figure 5

980	   The storage mode file consists of a magic number and a version octet
981	   followed by the individual G.711.0 frames concatenated together.

983	   The magic number for G.711.0 A-law corresponds to the ASCII character
984	   string "#!G7110A\n", i.e., "0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x41
985	   0x0A".  Likewise, the magic number for G.711.0 mu-law corresponds to
986	   the ASCII character string "#!G7110M\n", i.e., "0x23 0x21 0x47 0x37
987	   0x31 0x31 0x30 0x4D 0x0A".

989	   The version number octet allows for the future specification of other
990	   G.711.0 storage mode formats.  The specification of other storage
991	   mode formats may be desirable, as G.711.0 frames are of variable
992	   length and a future format may include an indexing methodology that
993	   would enable playout far into a long G.711.0 recording without the
994	   necessity of decoding all the G.711.0 frames since the beginning of
995	   the recording.  Other future format specifications may include support
996	   for multiple channels, metadata and the like.  For these reasons it
997	   was determined that a versioning strategy was desirable for the
998	   G.711.0 storage mode definition specified by this RFC.  This RFC only
999	   specifies Version 0 and thus the value of "0x00" must be used for the
1000	   storage mode defined by this RFC.

1002	   The G.711.0 codec data frames, including any necessary erasure or PLC
1003	   frames, are stored in consecutive order concatenated together as
1004	   shown in Section 4.2.2.

1006	   The individual G.711.0 frames following the version octet may be
1007	   decoded using the heuristic presented in Section 4.2.3.
1008	   If the version octet is determined not to be zero, the remainder of
1009	   the file MUST NOT be passed to the G.711.0 decoder, as the ITU-T
1010	   G.711.0 reference decoder can only decode concatenated G.711.0 frames
1011	   and has not been designed to decode elements in yet to be specified
1012	   future storage mode formats.
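A minimal sketch of writing and parsing the Version 0 storage mode header shown in Figure 5.  The function names write_storage_file and read_storage_header are hypothetical; the parser returns the concatenated frames, which would then be handed to a frame-by-frame decoder, and it refuses any version other than 0x00 as required above.

```python
from typing import Iterable, Tuple

MAGIC = {"al": b"#!G7110A\n", "mu": b"#!G7110M\n"}
VERSION_0 = 0x00  # the only storage mode version defined here


def write_storage_file(complaw: str, frames: Iterable[bytes]) -> bytes:
    """Magic number, version octet, then the concatenated G.711.0 frames."""
    return MAGIC[complaw.lower()] + bytes([VERSION_0]) + b"".join(frames)


def read_storage_header(data: bytes) -> Tuple[str, bytes]:
    """Return (complaw, concatenated G.711.0 frames); reject other versions."""
    for law, magic in MAGIC.items():
        if data.startswith(magic):
            break
    else:
        raise ValueError("not a G.711.0 storage mode file")
    if len(data) < len(magic) + 1:
        raise ValueError("missing version octet")
    version = data[len(magic)]
    if version != VERSION_0:
        # The remainder MUST NOT be passed to the G.711.0 decoder.
        raise ValueError(f"unsupported storage mode version {version}")
    return law, data[len(magic) + 1:]
```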

1014	7.  Acknowledgements

1016	   There have been many people contributing to G.711.0 in the course of
1017	   its development.  The people listed here deserve special mention:
1018	   Takehiro Moriya, Claude Lamblin, Herve Taddei, Simao Campos, Yusuke
1019	   Hiwasaki, Jacek Stachurski, Lorin Netsch, Paul Coverdale, Patrick
1020	   Luthi, Paul Barrett, Jari Hagqvist, Pengjun (Jeff) Huang, John Gibbs,
1021	   Yutaka Kamamoto, and Csaba Kos.

1023	8.  Contributors

1025	   The authors thank everyone who has contributed to this document.
1026	   The people listed here deserve special mention: Ali Begen, Roni Even,
1027	   and Hadriel Kaplan.

1029	9.  IANA Considerations

1031	   One media type (audio/G7110) has been defined and requires IANA
1032	   registration in the media types registry.  See Section 5.1 for
1033	   details.

1035	10.  Security Considerations

1037	   RTP packets using the payload format defined in this specification
1038	   are subject to the security considerations discussed in the RTP
1039	   specification [RFC3550], and in any appropriate RTP profile (for
1040	   example RFC 3551 [RFC3551] or [RFC4585]).  This implies that
1041	   confidentiality of the media streams is achieved by encryption; for
1042	   example, through the application of SRTP [RFC3711].  Because the data
1043	   compression used with this payload format is applied end-to-end, any
1044	   encryption needs to be performed after compression.

1046	   Note that the appropriate mechanism to ensure confidentiality and
1047	   integrity of RTP packets and their payloads is very dependent on the
1048	   application and on the transport and signaling protocols employed.
1049	   Thus, although SRTP is given as an example above, other possible
1050	   choices exist.

1052	   Note that end-to-end security with either authentication, integrity
1053	   or confidentiality protection will prevent a network element not
1054	   within the security context from performing media-aware operations
1055	   other than discarding complete packets.  To allow any (media-aware)
1056	   intermediate network element to perform its operations, it is
1057	   required to be a trusted entity which is included in the security
1058	   context establishment.

1060	   G.711.0 has no known denial-of-service attacks due to decoding, as
1061	   data posing as a desired G.711.0 payload will be decoded into
1062	   something (as per the decoding algorithm) with a finite amount of
1063	   computation.  This is due to the decompression algorithm having a
1064	   finite worst-case processing path (no infinite computational loops
1065	   are possible).

1067	   G.711.0 is a variable bit rate (VBR) audio codec.  There have been
1068	   recent concerns with VBR speech codecs where a passive observer can
1069	   identify phrases from a standard speech corpus by means of the
1070	   lengths produced by the encoder even when the payload is encrypted
1071	   [IEEE].  That paper found that some code-excited linear prediction
1072	   (CELP) codecs produce discrete packet lengths for some phonemes and
1073	   that a system using appropriately designed Hidden Markov Models
1074	   (HMMs) could predict phrases with unexpected accuracy.  One CELP
1075	   codec studied, SPEEX, produced 21 different packet lengths in its
1076	   wideband mode, and these packet lengths probabilistically mapped to
1077	   phonemes on which an HMM system could be trained.  In this paper it
1080	   was determined that a mitigation technique would be to pad the output
1081	   of the encoder with random padding lengths to the effect: 1) that
1082	   more discrete payload sizes would result, and 2) that the
1083	   probabilistic mapping to phonemes would become less clear.  As G.711
1084	   is not a speech model based codec, neither is G.711.0.  A G.711.0
1085	   encoding, during talking periods, produces frames of varying frame
1086	   lengths which are not likely to have a strong mapping to phonemes.
1087	   Thus G.711.0 is not expected to have this same vulnerability.  It
1088	   should be noted that "silence" (only one value of G.711 in the entire
1089	   G.711 input frame) or "near silence" (only a few G.711 values) is
1090	   easily detectable from G.711.0 frame lengths of one or a few octets.
1091	   If one desires to mitigate for silence/non-silence detection,
1092	   statistically variable padding should be added to G.711.0 frames that
1093	   resulted in very small G.711.0 frames (less than about 20% of the
1094	   symbols of the corresponding G.711 input frame).  Methods of
1095	   introducing padding in the G.711.0 payloads have been provided in the
1096	   G.711.0 RTP payload definitions in Section 4.2.1 and Section 4.2.2.
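A minimal sketch of the silence-masking mitigation suggested above, assuming the padding is appended after the frame as 0x00 octets (which the decoding heuristic of Section 4.2.3 skips).  The function name pad_small_frame, the uniform distribution, and the max_pad bound are hypothetical choices; only the roughly 20% threshold comes from the text.

```python
import secrets


def pad_small_frame(frame: bytes, num_samples: int, max_pad: int = 16,
                    threshold: float = 0.2) -> bytes:
    """Append a random number (0..max_pad) of 0x00 padding octets when the
    G.711.0 frame is shorter than threshold * num_samples octets.

    num_samples: G.711 symbols (one octet each) in the input frame.
    """
    if max_pad < 0:
        raise ValueError("max_pad must be non-negative")
    if len(frame) < threshold * num_samples:
        return frame + bytes(secrets.randbelow(max_pad + 1))
    return frame
```

Larger frames are returned unchanged, so the added overhead is confined to the silent or near-silent frames whose lengths would otherwise be most revealing.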

1098	11.  References

1100	11.1.  Normative References

1102	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1103	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1105	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1106	              Description Protocol", RFC 4566, July 2006.

1108	   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
1109	              Registration Procedures", RFC 4288, December 2005.

1111	   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
1112	              Formats", RFC 4855, February 2007.

1114	   [RFC4856]  Casner, S., "Media Type Registration of Payload Formats in
1115	              the RTP Profile for Audio and Video Conferences", RFC
1116	              4856, February 2007.

1118	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1119	              Jacobson, "RTP: A Transport Protocol for Real-Time
1120	              Applications", STD 64, RFC 3550, July 2003.

1122	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
1123	              Video Conferences with Minimal Control", STD 65, RFC 3551,
1124	              July 2003.

1126	   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
1127	              "Extended RTP Profile for Real-time Transport Control
1128	              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
1129	              2006.

1131	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1132	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
1133	              RFC 3711, March 2004.

1135	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
1136	              with Session Description Protocol (SDP)", RFC 3264, June
1137	              2002.

1139	   [G.711.0]  ITU-T, "Recommendation ITU-T G.711.0 - Lossless
1140	              Compression of G.711 Pulse Code Modulation", September
1141	              2009.

1143	   [G.711]    ITU-T, "Recommendation ITU-T G.711: Pulse Code
1144	              Modulation (PCM) of Voice Frequencies", November 1988.

1146	   [G.711-AP1]
1147	              ITU-T, "Recommendation G.711
1148	              Appendix 1: A high quality low-complexity algorithm for
1149	              packet loss concealment with G.711", September 1999.

1151	   [G.711-A1]
1152	              ITU-T, "Recommendation ITU-T G.711
1153	              Amendment 1 - Amendment 1: New Annex A on Lossless
1154	              Encoding of PCM Frames", September 2009.

1156	11.2.  Informative References

1158	   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
1159	              June 1999.

1161	   [G.729]    ITU-T, "Recommendation ITU-T G.729 - Coding of
1162	              speech at 8 kbit/s using conjugate-structure algebraic-
1163	              code-excited linear prediction (CS-ACELP)", January 2007.

1165	   [G.722]    ITU-T, "Recommendation ITU-T G.722 - 7 kHz audio-
1166	              coding within 64 kbit/s", November 1988.

1168	   [ICASSP]   N. Harada, Y. Yamamoto, T. Moriya, Y. Hiwasaki, M. A.
1169	              Ramalho, L. Netsch, Y. Stachurski, Miao Lei, H. Taddei,
1170	              and Q. Fengyan, "Emerging ITU-T Standard G.711.0 -
1171	              Lossless Compression of G.711 Pulse Code Modulation",
1172	              International Conference on Acoustics Speech and Signal
1173	              Processing (ICASSP), 2010, ISBN 978-1-4244-4295-9,
1174	              March 2010.

1176	   [IEEE]     C.V. Wright, L. Ballard, S.E. Coull, F. Monrose, and
1177	              G.M. Masson, "Spot Me if You Can: Uncovering Spoken
1178	              Phrases in Encrypted VoIP Conversations", IEEE Symposium
1179	              on Security and Privacy, 2008, ISBN 978-0-7695-3168-7,
1180	              May 2008.

1182	Authors' Addresses

1184	   Michael A. Ramalho (editor)
1185	   Cisco Systems, Inc.
1186	   8000 Hawkins Road
1187	   Sarasota, FL  34241
1188	   USA

1190	   Phone: +1 919 476 2038
1191	   Email: mramalho@cisco.com

1193	   Paul E. Jones
1194	   Cisco Systems, Inc.
1195	   7025 Kit Creek Rd.
1196	   Research Triangle Park, NC  27709
1197	   USA

1199	   Phone: +1 919 476 2048
1200	   Email: paulej@packetizer.com

1202	   Noboru Harada
1203	   NTT Communications Science Labs.
1204	   3-1 Morinosato-Wakamiya
1205	   Atsugi, Kanagawa  243-0198
1206	   JAPAN

1208	   Phone: +81 46 240 3676
1209	   Email: harada.noboru@lab.ntt.co.jp

1210	   Muthu Arul Mozhi Perumal
1211	   Cisco Systems, Inc.
1212	   Cessna Business Park
1213	   Sarjapur-Marathahalli Outer Ring Road
1214	   Bangalore, Karnataka  560103
1215	   India

1217	   Phone: +91 9449288768
1218	   Email: mperumal@cisco.com

1220	   Lei Miao
1221	   Huawei Technologies Co. Ltd
1222	   Q22-2-A15R, Environment Protection Park
1223	   No. 156 Beiqing Road
1224	   HaiDian District
1225	   Beijing  100095
1226	   China

1228	   Phone: +86 1059728300
1229	   Email: lei.miao@huawei.com