idnits 2.17.1 

draft-ietf-avt-rtp-g719-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 15.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 1239.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1250.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1257.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1263.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == Line 664 has weird spacing: '... Packet   n: 1...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (Nov 17, 2008) is 5631 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Obsolete informational reference (is this intentional?): RFC 2326
     (Obsoleted by RFC 7826)

  -- Obsolete informational reference (is this intentional?): RFC 4288
     (Obsoleted by RFC 6838)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ITU-T-G719'

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                      M. Westerlund
3	Internet-Draft                                              I. Johansson
4	Intended status: Standards Track                             Ericsson AB
5	Expires: May 21, 2009                                       Nov 17, 2008

7	                      RTP Payload format for G.719
8	                       draft-ietf-avt-rtp-g719-04

10	Status of this Memo

12	   By submitting this Internet-Draft, each author represents that any
13	   applicable patent or other IPR claims of which he or she is aware
14	   have been or will be disclosed, and any of which he or she becomes
15	   aware will be disclosed, in accordance with Section 6 of BCP 79.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt.

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This Internet-Draft will expire on May 21, 2009.

35	Abstract

37	   This document specifies the payload format for packetization of the
38	   G.719 full-band codec encoded audio signals into the Real-time
39	   Transport Protocol (RTP).  The payload format supports transmission
40	   of multiple channels, multiple frames per payload, and interleaving.

42	Table of Contents

44	   1.  Introduction
45	   2.  Definitions and Conventions
46	   3.  G.719 Description
47	   4.  Payload format Capabilities
48	     4.1.  Multi-rate Encoding and Rate Adaptation
49	     4.2.  Support for Multi-Channel Sessions
50	     4.3.  Robustness against Packet Loss
51	       4.3.1.  Use of Forward Error Correction (FEC)
52	       4.3.2.  Use of Frame Interleaving
53	   5.  Payload format
54	     5.1.  RTP Header Usage
55	     5.2.  Payload Structure
56	       5.2.1.  Basic ToC element
57	     5.3.  Basic mode
58	     5.4.  Interleaved mode
59	     5.5.  Audio Data
60	     5.6.  Implementation Considerations
61	       5.6.1.  Receiving Redundant Frames
62	       5.6.2.  Interleaving
63	       5.6.3.  Decoding Validation
64	   6.  Payload Examples
65	     6.1.  3 mono frames with 2 different bitrates
66	     6.2.  2 stereo frame-blocks of the same bitrate
67	     6.3.  4 mono frames interleaved
68	   7.  Payload Format Parameters
69	     7.1.  Media Type Definition
70	     7.2.  Mapping to SDP
71	       7.2.1.  Offer/Answer Considerations
72	       7.2.2.  Declarative SDP Considerations
73	   8.  IANA Considerations
74	   9.  Congestion Control
75	   10. Security Considerations
76	     10.1. Confidentiality
77	     10.2. Authentication and Integrity
78	   11. Acknowledgements
79	   12. References
80	     12.1. Informative References
81	     12.2. Normative References
82	   Authors' Addresses
83	   Intellectual Property and Copyright Statements

85	1.  Introduction

87	   This document specifies the payload format for packetization of the
88	   G.719 full-band (FB) codec encoded audio signals into the Real-time
89	   Transport Protocol (RTP) [RFC3550].  The payload format supports
90	   transmission of multiple channels, multiple frames per payload, and
91	   packet loss robustness methods using redundancy or interleaving.

93	   This document starts with conventions, a brief description of the
94	   codec, and the payload format's capabilities.  The payload format is
95	   specified in Section 5.  Examples can be found in Section 6.  The
96	   media type, its mapping to SDP, and its usage in SDP offer/answer are
97	   then specified.  The document ends with considerations around
98	   congestion control and security.

100	2.  Definitions and Conventions

102	   The term "frame-block" is used in this document to describe the time-
103	   synchronized set of audio frames in a multi-channel audio session.
104	   In particular, in an N-channel session, a frame-block will contain N
105	   audio frames, one from each of the channels, and all N audio frames
106	   represent exactly the same time period.

108	   This document contains depictions of bit fields.  The most
109	   significant bit is always leftmost in the figure on each row and has
110	   the lowest enumeration.  For fields that are depicted over multiple
111	   rows, the upper row is more significant than the next.

113	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
114	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
115	   document are to be interpreted as described in RFC 2119 [RFC2119].

117	3.  G.719 Description

119	   The ITU-T G.719 full-band codec is a transform coder based on
120	   Modulated Lapped Transform (MLT).  G.719 is a low complexity full
121	   bandwidth codec for conversational speech and audio coding.  The
122	   encoder input and decoder output are sampled at 48 kHz.  The codec
123	   enables full bandwidth, from 20 Hz to 20 kHz, encoding of speech,
124	   music and general audio content at rates from 32 kbit/s up to 128
125	   kbit/s.  The codec operates on 20 ms frames and has an algorithmic
126	   delay of 40 ms.

128	   The codec provides excellent quality for speech, music and other
129	   types of audio.  Some of the applications for which this coder is
130	   suitable are:

132	   o  Real-time communications such as video conferencing and telephony.

134	   o  Streaming audio

136	   o  Archival and messaging

138	   The encoding and decoding algorithm can change the bit rate at any
139	   20 ms frame boundary.  The encoder receives the audio sampled at
140	   48 kHz.  Other sampling rates can be supported by resampling the
141	   input signal to the codec's sampling rate, i.e., 48 kHz; however,
142	   this functionality is not part of the standard.

144	   The encoding is performed on equally sized frames.  For each frame,
145	   the encoder decides between two encoding modes, a transient mode and
146	   a stationary mode.  The decision is based on statistics derived from
147	   the input signal.  The stationary mode uses a long MLT that leads to
148	   a spectrum of 960 coefficients while the transient encoding mode uses
149	   a short MLT (higher time resolution transform) which results in 4
150	   spectra (4 x 240 = 960 coefficients).  The encoding of the spectrum
151	   is done in two steps.  First, the spectral envelope is computed,
152	   quantized and Huffman encoded.  The envelope is computed on a non-
153	   uniform frequency subdivision.  From the coded spectral envelope, a
154	   weighted spectral envelope is derived and used for bit allocation.
155	   The decoder repeats this process, so only the spectral envelope
156	   needs to be transmitted.  The output of the bit allocation is used
157	   to quantize the spectra.  In addition, for stationary frames the
158	   encoder estimates the noise level.  The decoder applies the reverse
159	   operation upon reception of the bit stream.  Non-coded coefficients
160	   (i.e., those with no bits allocated) are replaced by entries of a
161	   noise codebook that is built from the decoded coefficients.

163	4.  Payload format Capabilities

165	   This payload format has a number of capabilities, and this section
166	   discusses them in some detail.

168	4.1.  Multi-rate Encoding and Rate Adaptation

170	   G.719 supports multi-rate encoding, which allows the encoding rate
171	   to vary on a per-frame basis.  This enables support for
172	   bit-rate adaptation and congestion control.  The possibility to
173	   aggregate multiple audio frames into a single RTP payload is another
174	   dimension of adaptation.  The RTP and payload format overhead can
175	   thus be reduced by the aggregation at the cost of increased delay and
176	   reduced packet-loss robustness.
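
The overhead trade-off described above can be illustrated with a small
calculation.  This is only a sketch: it assumes a mono stream in basic
mode with one ToC entry, IPv4 without options (20 bytes), UDP (8 bytes),
and an RTP header without CSRCs or extensions (12 bytes).  The function
name is illustrative and not part of the payload format.

```python
IP_UDP_RTP_HEADER = 20 + 8 + 12   # assumption: IPv4, no options, plain RTP header
TOC_ENTRY_BASIC = 2               # basic ToC octet + #frames octet (Section 5.3)

def overhead_ratio(frame_bytes, frames_per_packet):
    """Fraction of the packet that is not codec data (mono, one ToC entry)."""
    payload = frame_bytes * frames_per_packet
    overhead = IP_UDP_RTP_HEADER + TOC_ENTRY_BASIC
    return overhead / (overhead + payload)

# 32 kbit/s gives 80-byte frames; compare 1 and 4 frames per packet.
for n in (1, 4):
    print(n, "frame(s):", round(overhead_ratio(80, n), 3), "overhead,",
          20 * n, "ms of audio per packet")
```

Aggregating four frames lowers the relative overhead, but the packet then
covers 80 ms of audio, so a single loss removes four consecutive frames.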

178	4.2.  Support for Multi-Channel Sessions

180	   The RTP payload format defined in this document supports multi-
181	   channel audio content (e.g. stereophonic or surround audio sessions).
182	   Although the G.719 codec itself does not support encoding of multi-
183	   channel audio content into a single bit stream, it can be used to
184	   separately encode and decode each of the individual channels.  To
185	   transport (or store) the separately encoded multi-channel content,
186	   the audio frames for all channels that are framed and encoded for the
187	   same 20 ms period are logically collected in a "frame-block".

189	   At the session setup, out-of-band signaling must be used to indicate
190	   the number of channels in the payload type.  The order of the audio
191	   frames within the frame-block depends on the number of the channels
192	   and follows the definition in Section 4.1 of the RTP/AVP Profile
193	   [RFC3551].  When using SDP for signaling, the number of channels is
194	   specified in the rtpmap attribute.

196	4.3.  Robustness against Packet Loss

198	   The payload format supports several means, including forward error
199	   correction (FEC) and frame interleaving, to increase robustness
200	   against packet loss.

202	4.3.1.  Use of Forward Error Correction (FEC)

204	   Generic forward error correction within RTP is defined, for example,
205	   in RFC 5109 [RFC5109].  Audio redundancy coding is defined in RFC
206	   2198 [RFC2198].  Either scheme can be used to add redundant
207	   information to the RTP packet stream and make it more resilient to
208	   packet losses, at the expense of a higher bit rate.  See either RFC
209	   for a discussion of the implications of the higher bit rate for
210	   network congestion.

212	   In addition to these media-unaware mechanisms, this memo specifies an
213	   optional G.719 specific form of audio redundancy coding, which may be
214	   beneficial in terms of packetization overhead.  Conceptually,
215	   previously transmitted transport frames are aggregated together with
216	   new ones.  A sliding window can be used to group the frames to be
217	   sent in each payload.  However, irregular or non-consecutive patterns
218	   are also possible by inserting NO_DATA frames between primary and
219	   redundant transmissions.  Figure 1 below shows an example.

221	   --+--------+--------+--------+--------+--------+--------+--------+--
222	     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
223	   --+--------+--------+--------+--------+--------+--------+--------+--

225	      <---- p(n-1) ---->
226	               <----- p(n) ----->
227	                        <---- p(n+1) ---->
228	                                 <---- p(n+2) ---->
229	                                          <---- p(n+3) ---->
230	                                                   <---- p(n+4) ---->

232	              Figure 1: An example of redundant transmission

234	   Here, each frame is retransmitted once in the following RTP payload
235	   packet.  f(n-2)...f(n+4) denote a sequence of audio frames, and
236	   p(n-1)...p(n+4) a sequence of payload packets.
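
The sliding window of Figure 1 can be sketched as follows.  The function
name and the use of plain integers as frame indices are illustrative
assumptions; the sketch only models which frames each packet carries,
not the payload encoding.

```python
def redundant_payloads(first, last):
    """Map each packet index n to the frame indices it carries.

    Sliding window of two: packet p(n) carries the previous frame f(n-1)
    as the redundant copy and the new frame f(n) as the primary copy.
    """
    return {n: [n - 1, n] for n in range(first, last + 1)}

packets = redundant_payloads(-1, 4)   # p(n-1)...p(n+4) with n = 0

# If a single packet is lost, every frame is still present in a neighbor.
lost = 2
received = {f for n, frames in packets.items() if n != lost for f in frames}
print(sorted(received))
```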

238	   The mechanism described does not strictly require signaling at
239	   session setup.  However, signaling has been defined to allow the
240	   sender to voluntarily bound the buffering and delay requirements.
241	   If nothing is signaled, the use of this mechanism is allowed and
242	   unbounded.  For a certain timestamp, the receiver may receive
243	   multiple copies of a frame containing encoded audio data, even at
244	   different encoding rates.  The cost of this scheme is bandwidth and
245	   the receiver delay necessary to allow the redundant copy to arrive.

247	   This redundancy scheme provides a functionality similar to the one
248	   described in RFC 2198, but it works only if both original frames and
249	   redundant representations are G.719 frames.  When the use of other
250	   media coding schemes is desirable, one has to resort to RFC 2198.

252	   The sender is responsible for selecting an appropriate amount of
253	   redundancy based on feedback about the channel conditions, e.g., in
254	   the RTP Control Protocol (RTCP) [RFC3550] receiver reports.  The
255	   sender is also responsible for avoiding congestion, which may be
256	   exacerbated by redundancy (see Section 9 for more details).

258	4.3.2.  Use of Frame Interleaving

260	   To decrease protocol overhead, the payload design allows several
261	   audio transport frames to be encapsulated into a single RTP packet.
262	   One of the drawbacks of such an approach is that in case of packet
263	   loss several consecutive frames are lost.  Consecutive frame loss
264	   normally renders error concealment less efficient and usually causes
265	   clearly audible and annoying distortions in the reconstructed audio.
266	   Interleaving of transport frames can improve the audio quality in
267	   such cases by distributing the consecutive losses into a number of
268	   isolated frame losses, which are easier to conceal.  However,
269	   interleaving and bundling several frames per payload also increases
270	   end-to-end delay and sets higher buffering requirements.  Therefore,
271	   interleaving is not appropriate for all use cases or devices.
272	   Streaming applications should most likely be able to exploit
273	   interleaving to improve audio quality in lossy transmission
274	   conditions.

276	   Note that this payload design supports the use of frame interleaving
277	   as an option.  The usage of this feature needs to be negotiated in
278	   the session setup.

280	   The interleaving supported by this format is rather flexible.  For
281	   example, a continuous pattern can be defined, as depicted in
282	   Figure 2.

284	   --+--------+--------+--------+--------+--------+--------+--------+--
285	     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
286	   --+--------+--------+--------+--------+--------+--------+--------+--

288	              [ p(n)   ]
289	     [ p(n+1) ]                 [ p(n+1) ]
290	                       [ p(n+2) ]                 [ p(n+2) ]
291	                                         [ p(n+3) ]
292	                                                           [ p(n+4) ]

294	   Figure 2: An example of interleaving pattern that has constant delay

296	   In Figure 2 the consecutive frames, denoted f(n-2) to f(n+4), are
297	   aggregated into packets p(n) to p(n+4), each packet carrying two
298	   frames.  This approach provides an interleaving pattern that allows
299	   for constant delay in both the interleaving and de-interleaving
300	   processes.  The de-interleaving buffer needs to have room for at
301	   least three frames, including the one that is ready to be consumed.
302	   The storage space for three frames is needed, for example, when f(n)
303	   is the next frame to be decoded: since frame f(n) was received in
304	   packet p(n+2), which also carried frame f(n+3), both these frames are
305	   stored in the buffer.  Furthermore, frame f(n+1) received in the
306	   previous packet, p(n+1), is also in the de-interleaving buffer.  Note
307	   also that in this example the buffer occupancy varies: when frame
308	   f(n+1) is the next one to be decoded, there are only two frames,
309	   f(n+1) and f(n+3), in the buffer.
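
The buffer occupancy described for Figure 2 can be checked with a small
simulation.  This is a sketch under stated assumptions: frames are
numbered relative to an arbitrary origin, packet k is taken to carry
frames 2k-4 and 2k-1 (which reproduces the pattern in the figure), and
the function names are illustrative.

```python
def packet_frames(k):
    """Frames carried by packet k in the Figure 2 pattern (relative index)."""
    return [2 * k - 4, 2 * k - 1]

def simulate(num_packets):
    """Return (decoding order, peak number of frames held in the buffer)."""
    buffer = set()
    decoded = []
    max_occupancy = 0
    for k in range(num_packets):
        for f in packet_frames(k):
            if f >= 0:                  # frames before the stream start do not exist
                buffer.add(f)
        # After packet k arrives, frames 2k-4 and 2k-3 are both available.
        for f in (2 * k - 4, 2 * k - 3):
            if f in buffer:
                max_occupancy = max(max_occupancy, len(buffer))
                buffer.remove(f)
                decoded.append(f)
    return decoded, max_occupancy

order, peak = simulate(6)
print(order, peak)
```

The peak of three frames occurs just before an even-numbered frame is
consumed, which matches the case of f(n) in the text.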

311	5.  Payload format

313	   The main purpose of the payload design for G.719 is to exploit the
314	   full potential of the codec with as little overhead as possible.
315	   The design includes both basic and interleaved modes, as the codec
316	   is suitable both for conversational and other low-delay
317	   applications and for streaming applications, where more delay is
318	   acceptable.

320	   The main structural difference between the basic and interleaved
321	   modes is the extension of the table of content entries with frame
322	   displacement fields in the interleaved mode.  The basic mode supports
323	   aggregation of multiple consecutive frames in a payload.  The
324	   interleaved mode supports aggregation of multiple frames that are
325	   non-consecutive in time.  In both modes it is possible to have frames
326	   encoded with different frame types in the same payload.

328	   The payload format also supports the usage of G.719 for carrying
329	   multi-channel content using one discrete encoder per channel, all
330	   using the same bit-rate.  In this case, a complete frame-block with
331	   data from all channels is included in the RTP payload.  The data is
332	   the concatenation of all the encoded audio frames in the order
333	   specified for that number of included channels.  Also, interleaving
334	   is done on complete frame-blocks rather than individual audio frames.

336	5.1.  RTP Header Usage

338	   The RTP timestamp corresponds to the sampling instant of the first
339	   sample encoded for the first frame-block in the packet.  The
340	   timestamp clock frequency SHALL be 48000 Hz.  The timestamp is also
341	   used to recover the correct decoding order of the frame-blocks.
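
As a worked example of the timestamp arithmetic: with a 48000 Hz clock
and 20 ms frames, consecutive frame-blocks are 960 timestamp units
apart.  The sketch below assumes only that and the 32-bit width of the
RTP timestamp field; the names are illustrative.

```python
CLOCK_RATE = 48000                                # Hz, from Section 5.1
FRAME_MS = 20                                     # G.719 frame duration
TICKS_PER_FRAME = CLOCK_RATE * FRAME_MS // 1000   # 960 timestamp units

def frame_block_timestamp(base_ts, index):
    """RTP timestamp of the index-th frame-block after the one at base_ts."""
    return (base_ts + index * TICKS_PER_FRAME) % 2**32   # 32-bit wraparound

print(frame_block_timestamp(1000, 3))
```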

343	   The RTP header marker bit (M) SHALL be set to 1 whenever the first
344	   frame-block carried in the packet is the first frame-block in a
345	   talkspurt (see definition of the talkspurt in section 4.1 of
346	   [RFC3551]).  For all other packets the marker bit SHALL be set to
347	   zero (M=0).

349	   The assignment of an RTP payload type for the format defined in this
350	   memo is outside the scope of this document.  The RTP profiles
351	   currently in use mandate binding the payload type dynamically for
352	   this payload format.  Dynamic binding is necessary because the
353	   payload type expresses the configuration of the payload itself, i.e.,
354	   basic or interleaved mode and the number of channels carried.

356	   The remaining RTP header fields are used as specified in RFC 3550
357	   [RFC3550].

359	5.2.  Payload Structure

361	   The payload consists of one or more table of contents (ToC) entries
362	   followed by the audio data corresponding to the ToC entries.  The
363	   following sections describe both the basic mode and the interleaved
364	   mode.  Each ToC entry MUST be padded to a byte boundary to ensure
365	   octet alignment.  The rules regarding maximum payload size given in
366	   Section 3.2 of [I-D.ietf-tsvwg-udp-guidelines] SHOULD be followed.

368	5.2.1.  Basic ToC element

370	   All the different formats and modes in this draft use a common basic
371	   ToC which may be extended in the different options described below.

373	    0 1 2 3 4 5 6 7
374	   +-+-+-+-+-+-+-+-+
375	   |F|    L    |R|R|
376	   +-+-+-+-+-+-+-+-+

378	                        Figure 3: Basic ToC element

380	   F (1 bit):  If set to 1, indicates that this ToC entry is followed by
381	      another ToC entry; if set to 0, indicates that this ToC entry is
382	      the last one in the ToC.

384	   L (5 bits):  A field that gives the frame length of each individual
385	      frame within the frame-block.

387	        L          length(bytes)
388	       ============================
389	        0           0 NO_DATA
390	        1-7         N/A (reserved)
391	        8-22        80+10*(L-8)
392	       23-27        240+20*(L-23)
393	       28-31        N/A (reserved)

395	                Figure 4: How to map L values to frame lengths

397	      L=0 (NO_DATA) is used to indicate an empty frame.  This is useful
398	      if frames are missing, e.g., at re-packetization, or to insert
399	      gaps when sending redundant frames together with primary frames in
400	      the same payload.
401	      The value ranges [1..7] and [28..31] inclusive are reserved for
402	      future use in this draft version.  If these values occur in a ToC,
403	      the entire packet SHOULD be treated as invalid and discarded.
404	      A few examples are given below, where the frame size and the
405	      corresponding codec bitrate are computed based on the value L.

407	         L    Bytes    Codec Bitrate(kbps)
408	       ===================================
409	         8      80        32
410	         9      90        36
411	        10     100        40
412	        12     120        48
413	        16     160        64
414	        22     220        88
415	        23     240        96
416	        25     280       112
417	        27     320       128

419	        Figure 5: Examples of L values and corresponding frame lengths

421	      This encoding yields a granularity of 4 kbps between 32 and 88
422	      kbps and a granularity of 8 kbps between 88 and 128 kbps, with a
423	      defined range of 32-128 kbps for the codec data.

425	   R (2 bits):  Reserved bits.  SHALL be set to 0 on sending and SHALL be
426	      ignored on reception.
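
The field layout of Figure 3 and the mapping of Figure 4 can be sketched
as follows.  The function names are illustrative; returning None for
reserved L values is a choice made for this example, and a receiver
would normally discard the packet as described above.

```python
def frame_length(l_value):
    """Frame length in bytes for a 5-bit L value (Figure 4), None if reserved."""
    if l_value == 0:
        return 0                          # NO_DATA
    if 8 <= l_value <= 22:
        return 80 + 10 * (l_value - 8)
    if 23 <= l_value <= 27:
        return 240 + 20 * (l_value - 23)
    return None                           # 1-7 and 28-31 are reserved

def parse_basic_toc(octet):
    """Split one basic ToC octet into (F, L); the two R bits are ignored."""
    f_bit = (octet >> 7) & 0x1            # bit 0 in the figure is the MSB
    l_value = (octet >> 2) & 0x1F
    return f_bit, l_value

def bitrate_kbps(length_bytes):
    """Codec bit rate for one 20 ms frame of the given size."""
    return length_bytes * 8 / 20          # bits per millisecond == kbit/s

f_bit, l_value = parse_basic_toc(0xA0)    # 1 01000 00
print(f_bit, l_value, frame_length(l_value), bitrate_kbps(frame_length(l_value)))
```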

428	5.3.  Basic mode

430	   The basic ToC element (Figure 3) is followed by a one-octet field for
431	   the number of frame-blocks (#frames) to form the ToC entry.  The
432	   frame-blocks field tells how many frame-blocks of the same length the
433	   ToC entry relates to.

435	    0 1 2 3 4 5 6 7
436	   +-+-+-+-+-+-+-+-+
437	   |    #frames    |
438	   +-+-+-+-+-+-+-+-+

440	                  Figure 6: Number of frame-blocks field
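
A basic-mode payload can be split into frame-blocks along the following
lines.  This is a minimal sketch, not a complete receiver: the function
names are illustrative, the channel count is assumed to be known from
out-of-band signaling, and raising an exception on reserved L values or
on a size mismatch is a choice made for this example.

```python
def frame_length(l_value):
    """Figure 4 mapping; raises ValueError for reserved values."""
    if l_value == 0:
        return 0                          # NO_DATA
    if 8 <= l_value <= 22:
        return 80 + 10 * (l_value - 8)
    if 23 <= l_value <= 27:
        return 240 + 20 * (l_value - 23)
    raise ValueError("reserved L value %d" % l_value)

def parse_basic_payload(payload, channels=1):
    """Return a list of frame-blocks; each is a list of per-channel frames."""
    toc, pos = [], 0
    while True:
        if pos + 2 > len(payload):
            raise ValueError("truncated ToC")
        octet, count = payload[pos], payload[pos + 1]
        pos += 2
        toc.append((frame_length((octet >> 2) & 0x1F), count))
        if not octet & 0x80:              # F=0: last ToC entry
            break
    blocks = []
    for length, count in toc:
        for _ in range(count):
            frames = [payload[pos + c * length: pos + (c + 1) * length]
                      for c in range(channels)]
            pos += channels * length
            blocks.append(frames)
    if pos != len(payload):
        raise ValueError("payload size does not match ToC")
    return blocks

# One 80-byte frame followed by two 90-byte frames, mono.
example = bytes([0xA0, 1, 0x24, 2]) + bytes(80 + 2 * 90)
print([len(block[0]) for block in parse_basic_payload(example)])
```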

442	5.4.  Interleaved mode

444	   The basic ToC is followed by a one octet field for the number of
445	   frame-blocks (#frames) and then the DIS fields to form a ToC entry in
446	   interleaved mode.  The frame-blocks field tells how many frame-blocks
447	   of the same length the ToC relates to.  The DIS fields, one for each
448	   frame-block indicated by the #frames field, express the interleaving
449	   distance between audio frames carried in the payload.  If necessary
450	   to achieve octet alignment, a 4-bit padding is added.

452	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
453	   |    #frames    | DIS1  |  ...  | DISi  |  ...  | DISn  | Padd  |
454	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

456	            Figure 7: Number of frame-block + interleave fields

458	   DIS1...DISn (4 bits):  A list of n (n=#frames) displacement fields
459	      indicating the displacement of the i-th (i=1..n) audio frame-block
460	      relative to the preceding frame-block in the payload (in units of
461	      20 ms long audio frame-blocks).  The four-bit unsigned integer
462	      displacement values may be between 0 and 15, indicating the number
463	      of audio frame-blocks in decoding order between the (i-1)-th and
464	      the i-th frame in the payload.  Note that for the first ToC entry
465	      of the payload the value of DIS1 is meaningless.  It SHALL be set
466	      to zero by a sender, and SHALL be ignored by a receiver.  This
467	      frame-block's location in the decoding order is uniquely defined
468	      by the RTP timestamp.  Note that for subsequent ToC entries DIS1
469	      indicates the number of frames between the last frame of the
470	      previous group and the first frame of this group.

472	   Padd (4 bits):  To ensure octet alignment, four padding bits SHALL be
473	      included at the end of the ToC entry in case there is an odd
474	      number of frame-blocks in the group referenced by this ToC entry.
475	      These bits SHALL be set to zero and SHALL be ignored by the
476	      receiver.  If a group containing an even number of frames is
477	      referenced by this ToC entry, these padding bits SHALL NOT be
478	      included in the payload.
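
The packing of the DIS fields and the conditional padding can be
sketched as below.  The function name and argument names are
illustrative assumptions.  The sketch places DIS1 in the most
significant four bits of the first DIS octet, following the bit-order
convention of Section 2.

```python
def build_interleaved_toc_entry(l_value, displacements, more_follows=False):
    """Encode one interleaved-mode ToC entry as bytes.

    displacements holds one DIS value (0..15) per frame-block in the group.
    """
    if not 0 <= l_value <= 31:
        raise ValueError("L is a 5-bit field")
    if len(displacements) > 255:
        raise ValueError("#frames is a one-octet field")
    if any(not 0 <= d <= 15 for d in displacements):
        raise ValueError("DIS is a 4-bit field")
    first = (0x80 if more_follows else 0) | (l_value << 2)   # R bits stay 0
    out = bytearray([first, len(displacements)])
    nibbles = list(displacements)
    if len(nibbles) % 2:
        nibbles.append(0)                 # Padd: four zero bits for odd counts
    for hi, lo in zip(nibbles[0::2], nibbles[1::2]):
        out.append((hi << 4) | lo)
    return bytes(out)

# Three frame-blocks of 80 bytes (L=8): DIS1=0, DIS2=1, DIS3=1, then padding.
print(build_interleaved_toc_entry(8, [0, 1, 1]).hex())
```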

480	5.5.  Audio Data

482	   The audio data part follows the table of contents.  All the octets
483	   comprising an audio frame SHALL be appended to the payload as a unit.
484	   For each frame-block, the audio frames are concatenated in the order
485	   indicated by the table in Section 4.1 of [RFC3551] for the number of
486	   channels configured for the payload type in use.  So the first
487	   channel (leftmost) indicated comes first, followed by the next
488	   channel.  The audio frame-blocks are packetized in increasing
489	   timestamp order within each group of frame-blocks (per ToC entry),
490	   i.e. oldest frame-block first.  The groups of frame-blocks are
491	   packetized in the same order as their corresponding ToC entries.

493	   The audio frames are specified in ITU recommendation [ITU-T-G719].

495	   The G.719 bit stream is split into a sequence of octets and
496	   transmitted in order from the leftmost (most significant, MSB) bit
497	   to the rightmost (least significant, LSB) bit.

499	5.6.  Implementation Considerations

501	   An application implementing this payload format MUST understand all
502	   the payload parameters specified in this specification.  Any mapping
503	   of the parameters to a signaling protocol MUST support all
504	   parameters.  So an implementation of this payload format in an
505	   application using SDP is required to understand all the payload
506	   parameters in their SDP-mapped form.  This requirement ensures that
507	   an implementation always can decide whether it is capable of
508	   communicating when the communicating entities support this version of
509	   the specification.

511	   Basic mode SHALL be implemented and the interleaved mode SHOULD be
512	   implemented.  The implementation burden of both is rather small, and
513	   supporting both ensures interoperability.  However, interleaving is
514	   not mandated, as it has limited applicability for conversational
515	   applications that require tight delay bounds.

517	5.6.1.  Receiving Redundant Frames

519	   The reception of redundant audio frames, i.e. more than one audio
520	   frame from the same source for the same time slot, MUST be supported
521	   by the implementation.  In the case that the receiver gets multiple
522	   audio frames in different bit-rates for the same time slot it is
523	   RECOMMENDED that the receiver keeps the one with the highest bit-
524	   rate.
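
One way to follow this recommendation is sketched below.  It relies on
the fact that every frame covers 20 ms, so the longer frame is the one
with the higher bit-rate.  The function name and the (timestamp, frame)
tuple representation are assumptions made for illustration.

```python
def select_frames(received):
    """received: iterable of (timestamp, frame_bytes) from one source.

    Returns {timestamp: frame_bytes}, keeping the longest frame per slot.
    A NO_DATA frame has length 0, so any real frame replaces it.
    """
    best = {}
    for ts, frame in received:
        if ts not in best or len(frame) > len(best[ts]):
            best[ts] = frame
    return best

copies = [(0, bytes(80)), (0, bytes(160)), (960, bytes(80)), (960, b"")]
print({ts: len(f) for ts, f in select_frames(copies).items()})
```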

526	5.6.2.  Interleaving

528	   The use of interleaving requires further considerations.  As
529	   presented in the example in Section 4.3.2, a given interleaving
530	   pattern requires a certain amount of the de-interleaving buffer.
531	   This buffer space, expressed in a number of transport frame slots, is
532	   indicated by the "interleaving" media type parameter.  The number of
533	   frame slots needed can be converted into actual memory requirements
534	   by considering the 320 bytes per frame used by the highest bit-rate
535	   of G.719.

537	   The information about the frame buffer size is not always sufficient
538	   to determine when it is appropriate to start consuming frames from
539	   the interleaving buffer.  Additional information is needed when the
540	   interleaving pattern changes.  The "int-delay" media type parameter
541	   is defined to convey this information.  It allows a sender to
542	   indicate the minimal media time that needs to be present in the
543	   buffer before the decoder can start consuming frames from the buffer.
544	   Because the sender has full control over the interleaving pattern, it
545	   can calculate this value.  In certain cases (for example, if joining
546	   a multicast session with interleaving mid-session), a receiver may
547	   initially receive only part of the packets in the interleaving
548	   pattern.  This initial partial reception (in frame sequence order) of
549	   frames can yield too few frames for acceptable quality from the audio
550	   decoding.  This problem also arises when using encryption for access
551	   control, and the receiver does not have the previous key.  Although
552	   G.719 is robust and thus tolerant to a high random frame erasure
553	   rate, it would have difficulties handling consecutive frame losses at
554	   startup.  Thus, some special implementation considerations are
555	   described.

557	   In order to handle this type of startup efficiently, decoding can
558	   start provided that:

560	   1.  There are at least two consecutive frames available.

562	   2.  At least half of the frames are available in the time period
563	       between the point where decoding was planned to start and the
564	       most forward received frame.
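
   One possible reading of the two conditions above is sketched below
   in Python.  It is non-normative.  Frames are identified by
   hypothetical consecutive frame indices in timestamp order, and the
   time period is taken to run from the planned start frame to the most
   forward frame received so far; both are assumptions of this example.

```python
def can_start_decoding(received, planned_start):
    """Return True if both startup conditions hold.

    `received` is a set of frame indices already in the buffer and
    `planned_start` is the index where decoding was planned to start.
    """
    candidates = {f for f in received if f >= planned_start}
    if not candidates:
        return False
    # Condition 1: at least two consecutive frames are available.
    has_pair = any(f + 1 in candidates for f in candidates)
    # Condition 2: at least half of the frames in the period are available.
    span = max(candidates) - planned_start + 1
    has_half = 2 * len(candidates) >= span
    return has_pair and has_half
```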

566	   After receiving a number of packets, in the worst case as many
567	   packets as the interleaving pattern covers, the previously described
568	   effects disappear and normal decoding is resumed.  Similar issues
569	   arise when a receiver leaves a session or has lost access to the
570	   stream.  If the receiver leaves the session, this would be a minor
571	   issue since playout is normally stopped.  The sender can avoid this
572	   type of problem in many sessions by starting and ending interleaving
573	   patterns correctly when risks of losses occur.  One such example is a
574	   key-change done for access control to encrypted streams.  If only
575	   some keys are provided to clients and there is a risk of them
576	   receiving content for which they do not have the key, it is
577	   recommended that interleaving patterns do not overlap key changes.

579	5.6.3.  Decoding Validation

581	   If the receiver finds a mismatch between the size of a received
582	   payload and the size indicated by the ToC of the payload, the
583	   receiver SHOULD discard the packet.  This is recommended because
584	   decoding a frame parsed from a payload based on erroneous ToC data
585	   could severely degrade the audio quality.
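
   A minimal, non-normative sketch of such a check follows.  It assumes
   basic mode (no interleaving fields), that the caller has already
   parsed the ToC into (bytes per frame, number of frame-blocks) pairs,
   and that each frame-block holds one frame per channel.  All names
   are hypothetical.

```python
def payload_size_matches(payload_len, toc_len, groups, channels=1):
    """Compare the received payload size with the size implied by the ToC.

    `groups` is a list of (bytes_per_frame, frame_blocks) pairs taken
    from the ToC entries; `toc_len` is the ToC length in bytes.
    """
    expected = toc_len + sum(
        size * blocks * channels for size, blocks in groups
    )
    return payload_len == expected
```

   A receiver following the recommendation above would discard the
   packet when this function returns False.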

587	6.  Payload Examples

589	   The following examples illustrate the payload format.

591	6.1.  3 mono frames with 2 different bitrates

593	   The first example is a payload consisting of 3 mono frames where the
594	   first 2 frames correspond to a bitrate of 32 kbps (80 bytes/frame)
595	   and the last to 48 kbps (120 bytes/frame).

597	      The first 32 bits are ToC fields.
598	      Bit 0 is '1' as another ToC field follows.
599	      Bits 1..5 are 01000 = 80 bytes/frame
600	      Bits 8..15 are 00000010 = 2 frame-blocks with 80 bytes/frame
601	      Bit 16 is '0', no more ToC fields follow
602	      Bits 17..21 are 01100 = 120 bytes/frame
603	      Bits 24..31 are 00000001 = 1 frame-block with 120 bytes/frame

605	       0                   1                   2                   3
606	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
607	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
608	      |1|0 1 0 0 0|0 0|0 0 0 0 0 0 1 0|0|0 1 1 0 0|0 0|0 0 0 0 0 0 0 1|
609	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
610	      |d(0)   frame 1                                                 |
611	      .                                                               .
612	      |                                                         d(639)|
613	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
614	      |d(0)   frame 2                                                 |
615	      .                                                               .
616	      |                                                         d(639)|
617	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
618	      |d(0)   frame 3                                                 |
619	      .                                                               .
620	      |                                                         d(959)|
621	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
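
   The ToC of this example can be read with the non-normative Python
   sketch below.  It relies only on the bit positions listed above, does
   not interpret bits 6..7, and assumes basic mode.  The size table
   covers just the two length codes used in this example; the names are
   hypothetical.

```python
# Length code -> bytes per frame, limited to the values in this example.
FRAME_BYTES = {8: 80, 12: 120}


def parse_toc(payload):
    """Return ([(length_code, frame_blocks), ...], toc_length_in_bytes)."""
    entries = []
    offset = 0
    while True:
        if offset + 2 > len(payload):
            raise ValueError("truncated ToC")
        first, count = payload[offset], payload[offset + 1]
        follow = first >> 7                # bit 0: another ToC field follows
        length_code = (first >> 2) & 0x1F  # bits 1..5
        entries.append((length_code, count))
        offset += 2
        if not follow:
            return entries, offset
```

   The four ToC octets of the example are A0 02 30 01 in hexadecimal,
   which parse to the entries (8, 2) and (12, 1).  With the table above
   this gives 4 + 2 * 80 + 1 * 120 = 284 bytes for the whole payload.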

623	6.2.  2 stereo frame-blocks of the same bitrate

625	   A payload consisting of 2 stereo frames corresponding to a bitrate of
626	   32kbps (80byte/frame) per channel.  The receiver calculates the
627	   number of frames in the audio block by multiplying the value of the
628	   channels parameter (2) by the #frames field value (2) to derive
629	   that there are 4 audio frames in the payload.

631	      The first 16 bits are the ToC field.
632	      Bit 0 is '0' as no ToC field follows.
633	      Bits 1..5 are 01000 = 80 bytes/frame
634	      Bits 8..15 are 00000010 = 2 frame-blocks with 80 bytes/frame
635	       0                   1                   2                   3
636	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
637	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
638	      |0|0 1 0 0 0|0 0|0 0 0 0 0 0 1 0| d(0) frame 1 left ch.         |
639	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
640	      .                                                               .
641	      |                         d(639)| d(0) frame 1 right ch.        |
642	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
643	      .                                                               .
644	      |                         d(639)| d(0) frame 2 left ch.         |
645	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
646	      .                                                               .
647	      |                         d(639)| d(0) frame 2 right ch.        |
648	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
649	      |                         d(639)|
650	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

652	6.3.  4 mono frames interleaved

654	   A payload consisting of 4 mono frames corresponding to a bitrate of
655	   32 kbps (80 bytes/frame), interleaved.  An interleaving pattern that
656	   gives constant delay when aggregating 4 frames is used in the
657	   example below.  The packet illustrated is packet n; the frame-block
658	   content of the preceding and following packets is shown to
659	   illustrate the pattern.

661	      Packet n-3:  1,  6, 11, 16
662	      Packet n-2:  5, 10, 15, 20
663	      Packet n-1:  9, 14, 19, 24
664	      Packet   n: 13, 18, 23, 28
665	      Packet n+1: 17, 22, 27, 32
666	      Packet n+2: 21, 26, 31, 36

668	      The first 16 bits are the ToC field.
669	      Bit 0 is '0' as no ToC field follows.
670	      Bits 1..5 are 01000 = 80 bytes/frame
671	      Bits 8..15 are 00000100 = 4 frame-blocks with 80 bytes/frame
672	      Bits 16..19 are 0000 = DIS1 (0)
673	      Bits 20..23 are 0100 = DIS2 (4)
674	      Bits 24..27 are 0100 = DIS3 (4)
675	      Bits 28..31 are 0100 = DIS4 (4)
676	       0                   1                   2                   3
677	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
678	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
679	      |0|0 1 0 0 0|0 0|0 0 0 0 0 1 0 0|0 0 0 0|0 1 0 0|0 1 0 0|0 1 0 0|
680	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
681	      | d(0) frame 13                                                 |
682	      .                                                               .
683	      |                                                         d(639)|
684	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
685	      | d(0) frame 18                                                 |
686	      .                                                               .
687	      |                                                         d(639)|
688	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
689	      | d(0) frame 23                                                 |
690	      .                                                               .
691	      |                                                         d(639)|
692	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
693	      | d(0) frame 28                                                 |
694	      .                                                               .
695	      |                                                         d(639)|
696	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
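
   The frame numbers in the packet table of this example follow a
   regular rule: within a packet, consecutive frames are 5 apart, and
   the first frame advances by 4 from one packet to the next.  The
   non-normative sketch below reproduces the table from that
   observation.  The packet index p, with p = 0 standing for packet
   n-3, is a convention of this example only.

```python
FRAMES_PER_PACKET = 4
STRIDE = FRAMES_PER_PACKET + 1  # distance between frames inside a packet


def packet_frames(p):
    """Frame numbers carried by packet p (p = 0 is packet n-3 above)."""
    first = 1 + FRAMES_PER_PACKET * p
    return [first + STRIDE * i for i in range(FRAMES_PER_PACKET)]
```

   Here packet_frames(3) yields the frames of packet n, that is 13, 18,
   23, and 28.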

698	7.  Payload Format Parameters

700	   This RTP payload format is identified using the media type audio/g719
701	   which is registered in accordance with [RFC4855] and using the
702	   template of [RFC4288].

704	7.1.  Media Type Definition

706	   The media type for the G.719 codec is allocated from the IETF tree
707	   since G.719 has the potential to become a widely used audio
708	   codec in general VoIP, teleconferencing and streaming applications.
709	   This media type registration covers real-time transfer via RTP.

711	   Note, any unspecified parameter MUST be ignored by the receiver to
712	   ensure that additional parameters can be added in any future revision
713	   of this specification.

715	   Type name: audio

717	   Subtype name: g719

719	   Required parameters: none

721	   Optional parameters:

723	   interleaving:  Indicates that interleaved mode SHALL be used for the
724	      payload.  The parameter specifies the number of frame-block slots
725	      available in a de-interleaving buffer (including the frame that is
726	      ready to be consumed).  Its value is equal to one plus the maximum
727	      number of frames that can precede any frame in transmission order
728	      and follow the frame in RTP timestamp order.  The value MUST be
729	      greater than zero.  If this parameter is not present, interleaved
730	      mode SHALL NOT be used.

732	   int-delay:  The minimal media time delay in milliseconds that is
733	      needed to avoid underrun in the de-interleaving buffer before
734	      starting decoding, i.e., the difference in RTP timestamp ticks
735	      between the earliest and latest audio frame present in the de-
736	      interleaving buffer expressed in milliseconds.  The value is a
737	      stream property and provided per source.  The allowed values are 0
738	      to the largest value expressible by an unsigned 16-bit integer
739	      (65535).  Note that in practice the largest value that can be
740	      used is equal to the declared size of the interleaving buffer
741	      of the receiver.  If the value for some reason is larger than the
742	      receiver buffer declared by or for the receiver, this value
743	      defaults to the size of the receiver buffer.  For sources for
744	      which this value hasn't been provided, the value defaults to the
745	      size of the receiver buffer.  The format is a comma-separated
746	      list of SSRC ":" delay in ms pairs, which in ABNF [RFC5234] is
747	      expressed as:

749	         int-delay = "int-delay:" source-delay *("," source-delay)

751	         source-delay = SSRC ":" delay-value

753	         SSRC = 1*8HEXDIG ; The 32-bit SSRC encoded in hex format

755	         delay-value = 1*5DIGIT ; The delay value in milliseconds

757	         Example: int-delay=ABCD1234:1000,4321DCB:640

759	         NOTE: No white space is allowed in the parameter before the
760	         end of all the value pairs.

762	   max-red:  The maximum duration in milliseconds that elapses between
763	      the primary (first) transmission of a frame and any redundant
764	      transmission that the sender will use.  This parameter allows a
765	      receiver to have a bounded delay when redundancy is used.  Allowed
766	      values are between 0 (no redundancy will be used) and 65535.  If
767	      the parameter is omitted, no limitation on the use of redundancy
768	      is present.

770	   channels:  The number of audio channels.  The possible values (1-6)
771	      and their respective channel order are specified in Section 4.1 in
772	      [RFC3551].  If omitted, it has the default value of 1.

774	   CBR:  Constant Bit Rate (CBR), indicates the exact codec-bitrate in
775	      bits per second (not including the overhead from packetization,
776	      RTP header or lower layers) that the codec MUST use.  CBR is to be
777	      used when dynamic rate cannot be supported (one such case is a
778	      gateway to H.320).  CBR is mostly used for gateways to
779	      circuit-switched networks.  Therefore, the CBR rate is the rate
780	      not including any FEC as specified in Section 4.3.1.  If FEC is to
781	      be used, the b= parameter MUST be used to allow the extra bit rate
782	      needed to send the redundant information.  It is RECOMMENDED that
783	      this parameter is only used when necessary to establish a working
784	      communication.  The usage of this parameter has implications on
785	      congestion control that need to be considered; see Section 9.

787	   ptime:  see [RFC4566].

789	   maxptime:  see [RFC4566].
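
   The value format of the "int-delay" parameter defined above can be
   parsed as in the following non-normative Python sketch.  It takes
   only the value part of the parameter (the list of SSRC ":" delay
   pairs) and returns a mapping from SSRC to delay in milliseconds.
   The function name is hypothetical.

```python
import re

# 1*8HEXDIG ":" 1*5DIGIT, as in the ABNF for source-delay
_SOURCE_DELAY = re.compile(r"([0-9A-Fa-f]{1,8}):([0-9]{1,5})")


def parse_int_delay(value):
    """Parse e.g. 'ABCD1234:1000,4321DCB:640' into {ssrc: delay_ms}."""
    delays = {}
    for item in value.split(","):
        match = _SOURCE_DELAY.fullmatch(item)  # no white space allowed
        if match is None:
            raise ValueError("malformed source-delay: %r" % item)
        delay_ms = int(match.group(2))
        if delay_ms > 65535:
            raise ValueError("delay exceeds 65535 ms: %r" % item)
        delays[int(match.group(1), 16)] = delay_ms
    return delays
```

   Applied to the example value above, this returns a delay of 1000 ms
   for SSRC ABCD1234 and 640 ms for SSRC 4321DCB.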

791	   Encoding considerations:

793	      This media type is framed and binary, see section 4.8 in RFC4288
794	      [RFC4288].

796	   Security considerations:

798	      See Section 10 of RFC XXXX.

800	   Interoperability considerations:

802	      The support of the Interleaving mode is not mandatory and needs to
803	      be negotiated.  See Section 7.2 for how to do that for SDP-based
804	      protocols.

806	   Published specification:

808	      RFC XXXX

810	   Applications that use this media type:

812	      Real-time audio applications like voice over IP and
813	      teleconference, and multi-media streaming.

815	   Additional information: none

817	   Person & email address to contact for further information:

819	      Payload format: Ingemar Johansson
820	      <ingemar.s.johansson@ericsson.com>

822	   Intended usage: COMMON

824	   Restrictions on usage:

826	      This media type depends on RTP framing, and hence is only defined
827	      for transfer via RTP [RFC3550].  Transport within other framing
828	      protocols is not defined at this time.

830	   Author:

832	      Ingemar Johansson <ingemar.s.johansson@ericsson.com>

834	      Magnus Westerlund <magnus.westerlund@ericsson.com>

836	   Change controller:

838	      IETF Audio/Video Transport working group delegated from the IESG.

840	   Additional Information:

842	      File storage of G.719 encoded audio in ISO base media file format
843	      is specified in Annex A of [ITU-T-G719].  Thus media file formats
844	      such as MP4 (audio/mp4 or video/mp4) [RFC4337] and 3GP (audio/3GPP
845	      and video/3GPP) [RFC3839] can contain G.719 encoded audio.

847	7.2.  Mapping to SDP

849	   The information carried in the media type specification has a
850	   specific mapping to fields in the Session Description Protocol (SDP)
851	   [RFC4566], which is commonly used to describe RTP sessions.  When SDP
852	   is used to specify sessions employing the G.719 codec, the mapping is
853	   as follows:

855	   o  The media type ("audio") goes in SDP "m=" as the media name.

857	   o  The media subtype (payload format name) goes in SDP "a=rtpmap" as
858	      the encoding name.  The RTP clock rate in "a=rtpmap" MUST be
859	      48000, and the encoding parameter "channels" (Section 7.1) MUST
860	      either be explicitly set to N or omitted, implying a default value
861	      of 1.  The values of N that are allowed are specified in Section
862	      4.1 in [RFC3551].

864	   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
865	      "a=maxptime" attributes, respectively.

867	   o  Any remaining parameters go in the SDP "a=fmtp" attribute by
868	      copying them directly from the media type parameter string as a
869	      semicolon-separated list of parameter=value pairs.
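
   The mapping rules above can be illustrated by the following
   non-normative Python sketch, which assembles the SDP lines for one
   payload type.  The port, the payload type number, the RTP/AVP
   profile, and the "; " separator spacing are assumptions of this
   example, and the function name is hypothetical.

```python
def g719_sdp_lines(port, payload_type, channels=1, ptime=None,
                   maxptime=None, fmtp=None):
    """Build the m=, a=rtpmap, a=ptime, a=maxptime and a=fmtp lines."""
    lines = ["m=audio %d RTP/AVP %d" % (port, payload_type)]
    rtpmap = "a=rtpmap:%d g719/48000" % payload_type  # clock rate is 48000
    if channels != 1:
        rtpmap += "/%d" % channels  # omitted channels implies 1
    lines.append(rtpmap)
    if ptime is not None:
        lines.append("a=ptime:%d" % ptime)
    if maxptime is not None:
        lines.append("a=maxptime:%d" % maxptime)
    if fmtp:
        # Remaining parameters: semicolon-separated parameter=value pairs.
        pairs = "; ".join("%s=%s" % (k, v) for k, v in fmtp.items())
        lines.append("a=fmtp:%d %s" % (payload_type, pairs))
    return lines
```

   For instance, a stereo configuration with interleaving could be
   produced with g719_sdp_lines(49170, 97, channels=2, ptime=20,
   fmtp={"interleaving": 10, "max-red": 0}).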

871	7.2.1.  Offer/Answer Considerations

873	   The following considerations apply when using SDP Offer-Answer
874	   procedures to negotiate the use of G.719 payload in RTP:

876	   o  Each combination of the RTP payload transport format configuration
877	      parameters (interleaving, and channels) is unique in its bit-
878	      pattern and not compatible with any other combination.  When
879	      creating an offer in an application desiring to use the more
880	      advanced features (interleaving, or more than one channel), the
881	      offerer is RECOMMENDED to also offer a payload type containing
882	      only the configuration with a single channel.  If multiple
883	      configurations are of interest to the application, they may all be
884	      offered; however, care should be taken not to offer too many
885	      payload types.  An SDP answerer MUST include, in the SDP answer
886	      for a payload type, the following parameters unmodified from the
887	      SDP offer (unless it removes the payload type): "interleaving";
888	      and "channels".  However, the value of the Interleaving parameter
889	      MAY be changed.  The SDP offerer and answerer MUST generate G.719
890	      packets as described by these parameters.

892	   o  The values of the "interleaving" and "int-delay" parameters have a
893	      specific relationship that needs to be considered.  It also
894	      depends on the directionality of the streams and their delivery
895	      method.  The high-level explanation that can be understood from
896	      the definition is that the value of "interleaving" declares the
897	      size of the receiver buffer, while int-delay is a stream property
898	      provided by the sender to inform how much buffer space it is
899	      using in practice for the stream it sends.

901	      *  For media streams that are sent over multicast, the value of
902	         "interleaving" SHALL NOT be changed by the answerer.  It shall
903	         either be accepted or the payload type deleted.  The value of
904	         the "int-delay" parameter is a stream property and provided by
905	         the offer/answer agent that intends to send media with this
906	         payload type, and for each stream coming from that agent (one
907	         or more).  The value MUST be between 0 and what corresponds to
908	         the buffer size declared by the value of the "interleaving"
909	         parameter.

911	      *  For unicast streams which the offerer declares as sendonly, the
912	         value of the "interleaving" parameter is the size that the
913	         answerer is RECOMMENDED to use by the offerer.  The answerer
914	         MAY change it to any allowed value.  The int-delay parameter
915	         value will be the one the offerer intends to use unless the
916	         answerer reduces the value of the interleaving parameter below
917	         what is needed for that int-delay value.  If the interleaving
918	         value in the answer is smaller than the offer's int-delay, the
919	         int-delay value is by default reduced to correspond to the
920	         interleaving value.  If the offerer is not satisfied with
921	         this, it will need to perform another round of offer/answer.
922	         As the answerer will not send any media, it doesn't include any
923	         int-delay in the answer.

925	      *  For unicast streams which the offerer declares as recvonly the
926	         value of interleaving in the offer will be the offerer's size
927	         of the interleaving buffer.  The answerer indicates its
928	         preferred size of the interleaving buffer for any future round
929	         of offer/answer.  The offerer will not provide any int-delay
930	         parameter as it is not sending any media.  The answerer is
931	         recommended to include in its answer an int-delay parameter to
932	         declare what the property is for the stream it is going to
933	         send.  As it already knows the receiver's interleaving buffer
934	         size, there should be no issue with providing a value that is
935	         between 0 and corresponding to a full de-interleaving buffer.

937	      *  For unicast streams which the offerer declares as sendrecv
938	         streams, the value of the interleaving parameter in the offer
939	         will be the offerer's size of the interleaving buffer.  The
940	         answerer will in the answer indicate the size of its actual
941	         interleaving buffer.  It is recommended that this value is at
942	         least as big as the offer's.  The offerer is recommended to
943	         include an int-delay parameter selected on the assumption that
944	         the answerer has at least as much interleaving space as the
945	         offerer, unless something else is known.  As the answerer's
946	         interleaving buffer size is not yet known, this may fail, in
947	         which case the default rule is to downgrade the value of the
948	         int-delay to correspond to the full size of the answerer's
949	         interleaving buffer.  If the offerer isn't satisfied with this,
950	         it will need to initiate another round of offer/answer.  The
951	         answerer is recommended to include in its answer an int-delay
952	         parameter to declare what the property is for the stream(s) it
953	         is going to send.  As it already knows the receiver's
954	         interleaving buffer size, there should be no issue with
955	         providing a value that is between 0 and corresponding to a full
956	         de-interleaving buffer.

958	   o  In most cases, the parameters "maxptime" and "ptime" will not
959	      affect interoperability; however, the setting of the parameters
960	      can affect the performance of the application.  The SDP offer-
961	      answer handling of the "ptime" parameter is described in
962	      [RFC3264].  The "maxptime" parameter MUST be handled in the same
963	      way.

965	   o  The parameter "max-red" is a stream property parameter.  For
966	      sendonly or sendrecv unicast media streams, the parameter declares
967	      the limitation on redundancy that the stream sender will use.  For
968	      recvonly streams, it indicates the desired value for the stream
969	      sent to the receiver.  The answerer MAY change the value, but is
970	      RECOMMENDED to use the same limitation as the offer declares.  In
971	      the case of multicast, the offerer MAY declare a limitation; this
972	      SHALL be answered using the same value.  A media sender using this
973	      payload format is RECOMMENDED to always include the "max-red"
974	      parameter.  This information is likely to simplify the media
975	      stream handling in the receiver.  This is especially true if no
976	      redundancy will be used, in which case "max-red" is set to 0.

978	   o  Any unknown parameter in an offer SHALL be removed in the answer.

980	   o  The b= SDP parameter SHOULD be used to negotiate the maximum
981	      bandwidth to be used for the audio stream.  The offerer may offer
982	      a maximum rate and the answer may contain a lower rate.  If no b=
983	      parameter is present in the offer or answer it implies a rate up
984	      to 128 kbps.

986	   o  The parameter "CBR" is a receiver capability, i.e., only receivers
987	      that really require a constant bit rate should use it.  Usage of
988	      this parameter has a negative impact on the possibility to perform
989	      congestion control; see Section 9.  For recvonly and sendrecv
990	      streams, it indicates the desired constant bit rate that the
991	      receiver wants to accept.  A sender MUST be able to send a constant
992	      bit rate stream since it is a subset of the variable bit rate
993	      capability.  If the offer includes this parameter, the answerer
994	      MUST send G.719 audio at the constant bit rate if it is within the
995	      allowed session bit rate (b= parameter).  If the answerer cannot
996	      support the stated CBR, this payload type must be refused in the
997	      answer.  The answerer SHOULD include this parameter only if it
998	      itself needs to receive at a constant bit rate; it may do so even
999	      if the offer did not include the CBR parameter.  In this case, the
1000	      offerer SHALL send at the constant bit rate but SHALL be able to
1001	      accept media at variable bit rate.  An answerer is RECOMMENDED to
1002	      use the same CBR rate as in the offer, as symmetric usage is more
1003	      likely to work.  If both sides require a particular CBR rate, there
1004	      is the possibility of communication failure when one or both sides
1005	      can't transmit the requested rate.  In this case the agent
1006	      detecting this issue will have to perform a second round of offer/
1007	      answer to try to find another working configuration or end the
1008	      established session.  In case the offer contained a CBR parameter
1009	      but the answer does not, then the offerer is free to transmit at
1010	      any rate to the answerer, but the answerer is restricted to the
1011	      declared rate.

1013	7.2.2.  Declarative SDP Considerations

1015	   In declarative usage, like SDP in RTSP [RFC2326] or SAP [RFC2974],
1016	   the parameters SHALL be interpreted as follows:

1018	   o  The payload format configuration parameters (interleaving, and
1019	      channels) are all declarative, and a participant MUST use the
1020	      configuration(s) that is provided for the session.  More than one
1021	      configuration may be provided if necessary by declaring multiple
1022	      RTP payload types; however, the number of types should be kept
1023	      small.

1025	   o  It might not be possible to know the SSRC values that are going to
1026	      be used by the sources at the time of sending the SDP.  This is
1027	      not a major issue as the size of the interleaving buffer can be
1028	      tailored towards the values actually going to be used, thus
1029	      ensuring that the default values for int-delay do not result in
1030	      too much extra buffering.

1032	   o  Any "maxptime" and "ptime" values should be selected with care to
1033	      ensure that the session's participants can achieve reasonable
1034	      performance.

1036	   o  The parameter "CBR" if included applies to all RTP streams using
1037	      that payload type for which a particular CBR rate is declared.
1038	      Usage of this parameter has a negative impact on the possibility to
1039	      perform congestion control, see Section 9.

1041	8.  IANA Considerations

1043	   One media type (audio/g719) has been defined and needs registration
1044	   in the media types registry; see Section 7.1.

1046	9.  Congestion Control

1048	   The general congestion control considerations for transporting RTP
1049	   data apply; see RTP [RFC3550] and any applicable RTP profile like AVP
1050	   [RFC3551].  However, the multi-rate capability of G.719 audio coding
1051	   provides a mechanism that may help to control congestion, since the
1052	   bandwidth demand can be adjusted (within the limits of the codec) by
1053	   selecting a different encoding bit-rate.

1055	   The number of frames encapsulated in each RTP payload highly
1056	   influences the overall bandwidth of the RTP stream due to header
1057	   overhead constraints.  Packetizing more frames in each RTP payload
1058	   can reduce the number of packets sent and hence the header overhead,
1059	   at the expense of increased delay and reduced error robustness.  If
1060	   forward error correction (FEC) is used, the amount of FEC-induced
1061	   redundancy needs to be regulated such that the use of FEC itself does
1062	   not cause a congestion problem.  In other words a sender SHALL NOT
1063	   increase the total bit-rate when adding redundancy in response to
1064	   packet loss, and needs instead to adjust it down in accordance to the
1065	   congestion control algorithm being run.  Thus when adding redundancy
1066	   the media bit-rate will generally need to be reduced to free up the
1067	   bit-rate that is used for redundancy.
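
   The effect of packetization on header overhead can be estimated with
   the non-normative Python sketch below.  It assumes IPv4, UDP, and
   RTP headers without options or CSRC entries (20 + 8 + 12 bytes), a
   single 2-byte ToC field per payload, and a 20 ms frame duration,
   which follows from 80 bytes per frame at 32 kbps.  The names are
   hypothetical.

```python
FRAME_MS = 20                # 80 bytes * 8 bits / 32000 bit/s = 20 ms
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP, no options or CSRCs
TOC_BYTES = 2                # one ToC field per payload


def stream_bitrate(frame_bytes, frames_per_packet):
    """Total bits per second on the wire, including header overhead."""
    packet_bytes = HEADER_BYTES + TOC_BYTES + frame_bytes * frames_per_packet
    packets_per_second = 1000 / (FRAME_MS * frames_per_packet)
    return packet_bytes * 8 * packets_per_second
```

   Under these assumptions, a 32 kbps mono stream uses 48800 bit/s with
   one frame per packet and 36200 bit/s with four frames per packet, at
   the cost of 60 ms of additional packetization delay.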

1069	   The CBR signalling parameter allows a receiver to lock down an RTP
1070	   payload type to use a single encoding rate.  As this prevents the
1071	   codec rate from being lowered when congestion is experienced, the
1072	   sender is constrained to either change the packetization or abort the
1073	   transmission.  Since these responses to congestion are severely
1074	   limited, implementations SHOULD NOT use the CBR parameter unless they
1075	   are interacting with a device that cannot support variable bit rate
1076	   (e.g. a gateway to H.320 systems).  When using CBR mode, a receiver
1077	   MUST monitor the packet loss rate to ensure congestion is not caused,
1078	   following the guidelines in Section 2 of RFC 3551.

1080	10.  Security Considerations

1082	   RTP packets using the payload format defined in this specification
1083	   are subject to the general security considerations discussed in RTP
1084	   [RFC3550] and any applicable profile such as AVP [RFC3551] or SAVP
1085	   [RFC3711].  As this format transports encoded audio, the main
1086	   security issues include confidentiality, integrity protection, and
1087	   data origin authentication of the audio itself.  The payload format
1088	   itself does not have any built-in security mechanisms.  Any suitable
1089	   external mechanisms, such as SRTP [RFC3711], MAY be used.

1091	   This payload format and the G.719 decoder do not exhibit any
1092	   significant non-uniformity in the receiver-side computational
1093	   complexity for packet processing, and thus are unlikely to pose a
1094	   denial-of-service threat due to the receipt of pathological data.
1095	   The payload format or the codec data does not contain any type of
1096	   active content such as scripts.

1098	10.1.  Confidentiality

1100	   In order to ensure confidentiality of the encoded audio, all audio
1101	   data bits MUST be encrypted.  There is less need to encrypt the
1102	   payload header or the table of contents since they only carry
1103	   information about the frame type.  This information could also be
1104	   useful to a third party, for example, for quality monitoring.
1105	   However, as there currently does not exist any mechanism supporting
1106	   differential protection, this behavior isn't expected to be
1107	   supported, and the requirements of the audio data will govern the
1108	   protection of the RTP payload.

1110	   The use of interleaving in conjunction with encryption can have a
1111	   negative impact on confidentiality, for a short period of time.
1112	   Consider the following packets (in brackets) containing frame numbers
1113	   as indicated: {10, 14, 18}, {13, 17, 21}, {16, 20, 24} (a popular
1114	   continuous diagonal interleaving pattern).  The originator wishes to
1115	   deny some participants the ability to hear material starting at time
1116	   16.  Simply changing the key on the packet with the timestamp at or
1117	   after 16, and denying that new key to those participants, does not
1118	   achieve this; frames 17, 18, and 21 have been supplied in prior
1119	   packets under the prior key, and error concealment may make the audio
1120	   intelligible at least as far as frame 18 or 19, and possibly further.
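
   The exposure in this example can be computed mechanically, as in the
   non-normative Python sketch below.  It assumes that a packet is sent
   under the new key when its earliest frame is at or after the cutoff
   time; the function name is hypothetical.

```python
def frames_exposed_under_old_key(packets, cutoff):
    """Frames at or after `cutoff` that were sent before the key change.

    `packets` is a list of lists of frame numbers, in sending order.
    """
    exposed = set()
    for frames in packets:
        if min(frames) >= cutoff:
            continue  # this packet is protected by the new key
        exposed.update(f for f in frames if f >= cutoff)
    return sorted(exposed)
```

   For the packets {10, 14, 18}, {13, 17, 21}, {16, 20, 24} and a cutoff
   of 16, the result is frames 17, 18, and 21, matching the text above.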

1122	10.2.  Authentication and Integrity

1124	   To authenticate the sender of the audio-stream, an external mechanism
1125	   MUST be used.  It is RECOMMENDED that such a mechanism protects both
1126	   the complete RTP header and the payload (audio and data bits).  Data
1127	   tampering by a man-in-the-middle attacker could replace audio content
1128	   and also result in erroneous depacketization/decoding that could
1129	   lower the audio quality.

1131	11.  Acknowledgements

1133	   The authors would like to thank Roni Even and Anisse Taleb for their
1134	   help with this draft.  We would also like to thank the people who
1135	   have provided feedback: Colin Perkins, Mark Baker and Stephen Botzko.

1137	12.  References

1139	12.1.  Informative References

1141	   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
1142	              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
1143	              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
1144	              September 1997.

1146	   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
1147	              Streaming Protocol (RTSP)", RFC 2326, April 1998.

1149	   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
1150	              Announcement Protocol", RFC 2974, October 2000.

1152	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1153	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
1154	              RFC 3711, March 2004.

1156	   [RFC3839]  Castagno, R. and D. Singer, "MIME Type Registrations for
1157	              3rd Generation Partnership Project (3GPP) Multimedia
1158	              files", RFC 3839, July 2004.

1160	   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
1161	              Registration Procedures", BCP 13, RFC 4288, December 2005.

1163	   [RFC4337]  Lim, Y. and D. Singer, "MIME Type Registration for MPEG-4",
1164	              RFC 4337, March 2006.

1166	   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
1167	              Formats", RFC 4855, February 2007.

1169	   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
1170	              Correction", RFC 5109, December 2007.

1172	12.2.  Normative References

1174	   [I-D.ietf-tsvwg-udp-guidelines]
1175	              Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines
1176	              for Application Designers",
1177	              draft-ietf-tsvwg-udp-guidelines-11 (work in progress),
1178	              October 2008.

1180	   [ITU-T-G719]
1181	              ITU-T, "Specification: ITU-T G.719 extension for 20 kHz
1182	              fullband audio", April 2008.

1184	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1185	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1187	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
1188	              with Session Description Protocol (SDP)", RFC 3264,
1189	              June 2002.

1191	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1192	              Jacobson, "RTP: A Transport Protocol for Real-Time
1193	              Applications", STD 64, RFC 3550, July 2003.

1195	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
1196	              Video Conferences with Minimal Control", STD 65, RFC 3551,
1197	              July 2003.

1199	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1200	              Description Protocol", RFC 4566, July 2006.

1202	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
1203	              Specifications: ABNF", STD 68, RFC 5234, January 2008.

1205	Authors' Addresses

1207	   Magnus Westerlund
1208	   Ericsson AB
1209	   Torshamnsgatan 21-23
1210	   SE-164 83 Stockholm
1211	   SWEDEN

1213	   Phone: +46 8 7190000
1214	   Email: magnus.westerlund@ericsson.com

1216	   Ingemar Johansson
1217	   Ericsson AB
1218	   Laboratoriegrand 11
1219	   SE-971 28 Lulea
1220	   SWEDEN

1222	   Phone: +46 73 0783289
1223	   Email: ingemar.s.johansson@ericsson.com

1225	Full Copyright Statement

1227	   Copyright (C) The IETF Trust (2008).

1229	   This document is subject to the rights, licenses and restrictions
1230	   contained in BCP 78, and except as set forth therein, the authors
1231	   retain all their rights.

1233	   This document and the information contained herein are provided on an
1234	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1235	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
1236	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
1237	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
1238	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1239	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1241	Intellectual Property

1243	   The IETF takes no position regarding the validity or scope of any
1244	   Intellectual Property Rights or other rights that might be claimed to
1245	   pertain to the implementation or use of the technology described in
1246	   this document or the extent to which any license under such rights
1247	   might or might not be available; nor does it represent that it has
1248	   made any independent effort to identify any such rights.  Information
1249	   on the procedures with respect to rights in RFC documents can be
1250	   found in BCP 78 and BCP 79.

1252	   Copies of IPR disclosures made to the IETF Secretariat and any
1253	   assurances of licenses to be made available, or the result of an
1254	   attempt made to obtain a general license or permission for the use of
1255	   such proprietary rights by implementers or users of this
1256	   specification can be obtained from the IETF on-line IPR repository at
1257	   http://www.ietf.org/ipr.

1259	   The IETF invites any interested party to bring to its attention any
1260	   copyrights, patents or patent applications, or other proprietary
1261	   rights that may cover technology that may be required to implement
1262	   this standard.  Please address the information to the IETF at
1263	   ietf-ipr@ietf.org.