idnits 2.17.1 

draft-hiwasaki-avt-rtp-uemclip-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 15.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 801.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 812.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 819.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 825.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     o  When an offered condition does not fit an answerer's
     capabilities, it naturally MUST not answer the conditions, and session
     MAY proceed to re-INVITE, if possible.  If a condition (mode) is decided
     upon, an offerer and an answerer MUST transmit on this condition.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (April 9, 2007) is 6226 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'Hz' on line 669

  ** Obsolete normative reference: RFC 3555 (ref. '5') (Obsoleted by RFC
     4855, RFC 4856)

  ** Obsolete normative reference: RFC 4288 (ref. '6') (Obsoleted by RFC 6838)

  ** Obsolete normative reference: RFC 4566 (ref. '7') (Obsoleted by RFC 8866)


     Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Audio/Video Transport                                        Y. Hiwasaki
3	Internet-Draft                                                 H. Ohmuro
4	Intended status: Standards Track                         NTT Corporation
5	Expires: October 11, 2007                                  April 9, 2007

7	              RTP payload format for UEMCLIP speech codec
8	                   draft-hiwasaki-avt-rtp-uemclip-02

10	Status of this Memo

12	   By submitting this Internet-Draft, each author represents that any
13	   applicable patent or other IPR claims of which he or she is aware
14	   have been or will be disclosed, and any of which he or she becomes
15	   aware will be disclosed, in accordance with Section 6 of BCP 79.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt.

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This Internet-Draft will expire on October 11, 2007.

35	Copyright Notice

37	   Copyright (C) The IETF Trust (2007).

39	Abstract

41	   This document describes the RTP payload format of an ITU-T G.711
42	   enhanced speech codec, UEMCLIP.  The bitstream has a scalable
43	   structure with an embedded u-law bitstream, also known as PCMU, thus
44	   allowing simple transcoding between narrowband and
45	   wideband speech.

47	Table of Contents

49	   1.  Introduction
50	     1.1.  Terminology
51	   2.  Media Format Background
52	   3.  Payload Format
53	     3.1.  RTP Header Usage
54	     3.2.  Multiple frames in an RTP packet
55	     3.3.  Payload Data
56	       3.3.1.  Main Header
57	       3.3.2.  Sub-layer data
58	   4.  G.711 interoperability
59	   5.  Congestion Control Considerations
60	   6.  Payload Format Parameters
61	     6.1.  Media type registration
62	     6.2.  Mapping to SDP Parameters
63	       6.2.1.  Dynamic transmission definition
64	     6.3.  Offer-answer Model Considerations
65	   7.  Security Considerations
66	   8.  IANA Considerations
67	   9.  Normative References
68	   Authors' Addresses
69	   Intellectual Property and Copyright Statements

71	1.  Introduction

73	   This document specifies the payload format for sending UEMCLIP
74	   encoded speech using the Real-time Transport Protocol (RTP) [3].
75	   UEMCLIP is an enhanced version of u-law ITU-T G.711, designed to
76	   help the market make a smooth transition towards the forthcoming
77	   wideband communication environment while maintaining
78	   interoperability and a low transcoding load with existing
79	   terminals, in which the implementation of G.711 is mandatory.

81	   The payload format is described in Section 3.  Interoperability
82	   issues with G.711 are discussed in Section 4.  In Section 6.1, a
83	   media type registration for the UEMCLIP RTP payload format is
	   provided.

85	1.1.  Terminology

87	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
88	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
89	   document are to be interpreted as described in [1].

91	2.  Media Format Background

93	   UEMCLIP stands for "U-law EMbedded Coder for Low-delay IP
94	   communication", and is basically an enhanced version of u-law ITU-T
95	   G.711, otherwise known as PCMU [5].  It is developed for VoIP (Voice
96	   over Internet Protocol) applications, and is especially suitable for
97	   wideband multi-point conferencing.  The main goal of this codec is to
98	   provide a wideband communication platform that is highly
99	   interoperable with existing terminals equipped with G.711, and to
100	   stimulate the market to gradually shift to wideband
101	   communication.  Because the G.711 bitstream is embedded in the
102	   UEMCLIP bitstream, costly transcoding is avoided, especially when
103	   interoperating with narrowband terminals.

105	   This document does not discuss the implementation detail of the
106	   encoder and decoder, but only describes the bitstream format.  The
107	   implementation detail will be available by other means.

109	   Because of its scalable nature, there are a number of sub-bitstreams
110	   (layer data) within a UEMCLIP bitstream.  By choosing appropriate
111	   sub-layers, the codec can adapt to the following requirements:

113	   o  Sampling frequency,

115	   o  Number of channels,

117	   o  Speech quality, and

119	   o  Bit-rate.

121	   The current implementation of the UEMCLIP codec includes three sub-
122	   coders, as shown in Table 1.  The core layer is the G.711 core, and
123	   the other two are quality and bandwidth enhancement layers with a
124	   bit-rate of 16 kbit/s each.

126	   +-------+---------------------+----------+--------------------------+
127	   | Layer | Description         | Bit-rate | Coding algorithm         |
128	   +-------+---------------------+----------+--------------------------+
129	   |   a   | G.711 core          |       64 | u-law PCM                |
130	   |       |                     |          |                          |
131	   |   b   | Lower-band          |       16 | Time domain block        |
132	   |       | enhancement         |          | quantization             |
133	   |       |                     |          |                          |
134	   |   c   | Higher-band         |       16 | MDCT block quantization  |
135	   +-------+---------------------+----------+--------------------------+

137	                      Table 1: Sub-layer description

139	   Based on these sub-layers, UEMCLIP codec operates in four modes as
140	   shown in Table 2.  Here, "Fs" is the sampling frequency in kHz.  The
141	   absent Modes 2 and 5 are reserved for future extension to 32 kHz
142	   sampling modes.  As the mode definition is expected to grow, any
143	   other modes not defined in this table MUST NOT be used for
144	   compatibility and interoperability reasons.

146	   +------+----+----+-------+-------+-------+-------------+------------+
147	   | Mode | Ch | Fs | Layer | Layer | Layer |    Bit-rate |      Total |
148	   |      |    |    |   a   |   b   |   c   | w/o headers |   bit-rate |
149	   |      |    |    |       |       |       |      [kbps] |     [kbps] |
150	   +------+----+----+-------+-------+-------+-------------+------------+
151	   |   0  |  1 |  8 |   x   |   -   |   -   |          64 |       68.8 |
152	   |      |    |    |       |       |       |             |            |
153	   |   1  |  1 | 16 |   x   |   -   |   x   |          80 |       85.6 |
154	   |      |    |    |       |       |       |             |            |
155	   |   2  |  - |  - |   -   |   -   |   -   |           - |          - |
156	   |      |    |    |       |       |       |             |            |
157	   |   3  |  1 |  8 |   x   |   x   |   -   |          80 |       85.6 |
158	   |      |    |    |       |       |       |             |            |
159	   |   4  |  1 | 16 |   x   |   x   |   x   |          96 |      102.4 |
160	   |      |    |    |       |       |       |             |            |
161	   |   5  |  - |  - |   -   |   -   |   -   |           - |          - |
162	   +------+----+----+-------+-------+-------+-------------+------------+

164	                         Table 2: Mode description

166	   A UEMCLIP bitstream contains internal headers and other side-
167	   information apart from the layer data.  This results in a total bit-
168	   rate larger than the sum of the layers shown in the above table.  The
169	   details of the internal headers and auxiliary information are
170	   described in Section 3.3.1.
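
   The relation between the two bit-rate columns of Table 2 can be
   checked with a short calculation.  The sketch below is illustrative
   only and assumes one 20 ms frame (the frame length given in
   Section 6.2), an enhanced header of length 0, the 10-byte main
   header of Section 3.3.1, and one 2-byte sub-header per layer as
   drawn in Section 3.3.2.  The names used are hypothetical.

```python
# Sketch: reproduce the "Total bit-rate" column of Table 2.
# Assumptions: one 20 ms frame, ES = 0, a 10-byte main header and a
# 2-byte sub-header (index octet + SB octet) in front of each layer.
FRAME_MS = 20
MAIN_HEADER_BYTES = 10
SUB_HEADER_BYTES = 2
LAYER_KBPS = {"a": 64, "b": 16, "c": 16}             # Table 1
MODE_LAYERS = {0: "a", 1: "ac", 3: "ab", 4: "abc"}   # Table 2


def frame_bytes(mode):
    """Size in bytes of one UEMCLIP frame for the given mode."""
    size = MAIN_HEADER_BYTES
    for layer in MODE_LAYERS[mode]:
        # kbit/s times ms gives bits; divide by 8 to get bytes.
        size += SUB_HEADER_BYTES + LAYER_KBPS[layer] * FRAME_MS // 8
    return size


def total_kbps(mode):
    """Bit-rate in kbit/s including the internal headers."""
    return frame_bytes(mode) * 8 / FRAME_MS


if __name__ == "__main__":
    for mode in sorted(MODE_LAYERS):
        print(mode, frame_bytes(mode), total_kbps(mode))
```

   Under these assumptions the frame sizes are 172, 214, 214, and 256
   bytes for Modes 0, 1, 3, and 4, which matches the 68.8, 85.6, 85.6,
   and 102.4 kbps totals listed in Table 2.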

172	   Defining the sampling frequency and the number of channels does not
173	   determine a unique mode, i.e., there can be multiple modes for the
174	   same sampling frequency or number of channels.  The supported modes
175	   may differ between implementations, thus the sender and the
176	   receiver must agree on which mode to use for transmission.

178	3.  Payload Format

180	   As an RTP payload, UEMCLIP bitstream can contain one or more frames
181	   as shown in Figure 1.

183	     0                   1                   2                   3
184	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
185	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
186	    |                      RTP Header                               |
187	    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
188	    |                                                               |
189	    |                 one or more frames of UEMCLIP                 |
190	    |                                                               |
191	    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

193	                       Figure 1: RTP payload format

195	   UEMCLIP bitstream has a scalable structure, thus it is possible to
196	   reconstruct the signal by decoding a part of it.  A UEMCLIP frame is
197	   composed of a main header followed by one or more sub-layers.  The
198	   core layer, i.e., "Layer a", MUST always be included as a sub-layer.
199	   It should be noted that the core layer is not necessarily the
200	   first sub-layer.  The decoder MUST always refer to the layer index
201	   for proper decoding.  The bitstream, for the case of an enhanced
202	   header with length 0, is shown in Figure 2, where sub-layer #1 can be
203	   any arbitrary sub-layer data.

205	                 +--+-------+-------+-------+-------+//-+
206	                 |MH| SD #1 | SD #2 | SD #3 | SD #4 |...|
207	                 |  |       |       |       |       |   |
208	                 +--+-------+-------+-------+-------+//-+

210	               Figure 2: A UEMCLIP frame (bitstream format)

212	   The UEMCLIP bitstream does not include the following information: a)
213	   the codec type, b) Mode, c) I/O sampling frequency, and d) encoder
214	   version.  As described before, this information SHOULD be exchanged
215	   while establishing a connection, for example, by means of SDP.

217	3.1.  RTP Header Usage

219	   Each RTP packet starts with a fixed RTP header, as explained in [3].
220	   The following fields of the RTP fixed header used specifically for
221	   UEMCLIP streams are emphasized:

223	   Payload type:  The assignment of an RTP payload type for this packet
224	      format is outside the scope of this document, however, it is
225	      expected that a payload type in the dynamic range shall be
226	      assigned.

228	   Timestamp:  This encodes the sampling instant of the first speech
229	      signal sample in the RTP data packet.  For UEMCLIP streams, the
230	      RTP timestamp clock rate MUST be a multiple of 8 kHz, and in case
231	      the sampling rate can change during a session, this rate should be
232	      equal to the maximum rate (in Hz) given in Table 2.

234	   Marker bit:  If the codec is used for applications with discontinuous
235	      transmission (DTX, or silence compression), the first packet after
236	      a silence period during which packets have not been transmitted
237	      contiguously SHOULD have the marker bit in the RTP data header set
238	      to one.  The marker bit in all other packets MUST be zero.
239	      Applications without DTX MUST set the marker bit to zero.

241	3.2.  Multiple frames in an RTP packet

243	   More than one UEMCLIP frame may be included in a single RTP packet by
244	   a sender.  However, senders have the following additional
245	   restrictions:

247	   o  SHOULD NOT include more UEMCLIP frames in a single RTP packet than
248	      will fit in the MTU of the RTP transport protocol.

250	   o  All frames contained in a single RTP packet MUST be of the same
251	      length, i.e., they MUST have the same bit rate (octets per frame).

253	   o  Frames MUST NOT be split between RTP packets.

255	   It is RECOMMENDED that the number of frames contained within an RTP
256	   packet be consistent with the application.  Since UEMCLIP is designed
257	   for telephony applications where delay is important, fewer frames
258	   per packet, and hence a lower delay, are preferable.

260	3.3.  Payload Data

262	3.3.1.  Main Header

264	   The main header (MH) is placed at the top of a payload and has a size
265	   of 10 bytes plus the size of the optional enhanced header.  The
266	   content of the main header is defined in Figure 3.

268	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
269	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
270	   |     ID        |             BS                |      MX       |
271	   |               |                               |               |
272	   |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5|0 1 2 3 4 5 6 7|
273	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
274	   |                              PC                               |
275	   |                                                               |
276	   |0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1|
277	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
278	   |   PC(cont'd)  |      ES       |             EH                |
279	   |               |               |         (if exists)           |
280	   |2 3 4 5 6 7 8 9|0 1 2 3 4 5 6 7|                             ...
281	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-//+-+-+-+

283	                 Figure 3: UEMCLIP main header format (MH)

285	   Identification (ID):  8 bits

287	      The value should be "0x95".

289	   Byte size (BS):  16 bits

291	      Indicates the byte size of the following UEMCLIP payload.  This
292	      means that the RTP header size, ID and BS are not included.  It is
293	      encoded in network byte-order.

295	   Mixing information (MX):  8 bits

297	      Mixing information field.

299	   Packet-loss Concealment information (PC):  40 bits

301	      Packet-loss concealment (PLC) information field.

303	   Enhanced-header Size (ES):  8 bits

305	      Size of EH (enhanced header) in bytes.

307	   Enhanced header (EH):  8*ES bits

309	      Content of the enhanced header.  When ES is 0, the enhanced header
310	      is non-existent.
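
   As an illustration of the layout in Figure 3, the following sketch
   splits a frame into the main header fields.  It is not a normative
   parser: the type and function names are hypothetical, and
   frame_length() reflects one reading of the BS definition above (BS
   counts everything after the ID and BS fields, so a frame occupies
   3 + BS bytes).

```python
import struct
from typing import NamedTuple


class MainHeader(NamedTuple):
    ident: int       # ID, expected to be 0x95
    byte_size: int   # BS, bytes that follow the BS field
    mx: int          # raw mixing information octet
    pc: bytes        # raw 5-octet PLC information field
    es: int          # enhanced-header size in bytes
    eh: bytes        # enhanced header, empty when ES is 0


def parse_main_header(frame: bytes) -> MainHeader:
    """Split the leading octets of a UEMCLIP frame per Figure 3."""
    if len(frame) < 10:
        raise ValueError("frame shorter than the 10-byte main header")
    # "!" selects network byte order without padding: B(1) H(2) B(1).
    ident, byte_size, mx = struct.unpack_from("!BHB", frame, 0)
    pc = bytes(frame[4:9])
    es = frame[9]
    eh = bytes(frame[10:10 + es])
    if len(eh) != es:
        raise ValueError("truncated enhanced header")
    return MainHeader(ident, byte_size, mx, pc, es, eh)


def frame_length(header: MainHeader) -> int:
    """Total frame size, assuming BS excludes only ID and BS (3 bytes)."""
    return 3 + header.byte_size
```

   The sketch does not reject frames whose ID differs from 0x95; a
   receiver may want to add that check.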

312	3.3.1.1.  Mixing information field

314	                            0 1 2 3 4 5 6 7
315	                           +-+-+-+-+-+-+-+-+
316	                           |C|R|V|   PW1   |
317	                           |1|1|1|         |
318	                           | | | |0 1 2 3 4|
319	                           +-+-+-+-+-+-+-+-+

321	                  Figure 4: Mixing information field (MX)

323	   Check bit #1 (C1):  1 bit

325	      Validity flag of V1 and PW1.  This bit being "1" indicates that
326	      both parameters are valid, and "0" indicates that the parameters
327	      should be ignored.

329	   Reserved bit #1 (R1):  1 bit

331	      This bit should be ignored.

333	   VAD flag #1 (V1):  1 bit

335	      Voice activity detection flag of the current frame.  This flag
336	      being "1" indicates that the frame is an active (voice) segment,
337	      and "0" indicates that it is an inactive (non-voice) or a silent
338	      segment.

340	   Power #1 (PW1):  5 bits

342	      Signal power code of the current frame.
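
   The MX octet can be unpacked with a few shifts and masks.  This
   sketch assumes that bit 0 in Figure 4 is the most significant bit
   of the octet, following the usual convention for such diagrams; the
   function name is hypothetical.

```python
def parse_mx(mx: int) -> dict:
    """Decode the mixing information octet (Figure 4).

    Assumes bit 0 of the figure is the most significant bit.
    """
    if not 0 <= mx <= 0xFF:
        raise ValueError("MX must be a single octet")
    return {
        "C1": (mx >> 7) & 0x01,   # validity flag for V1 and PW1
        "R1": (mx >> 6) & 0x01,   # reserved, to be ignored
        "V1": (mx >> 5) & 0x01,   # VAD flag of the current frame
        "PW1": mx & 0x1F,         # 5-bit signal power code
    }
```

   A receiver would use V1 and PW1 only when C1 is "1".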

344	3.3.1.2.  PLC information field

346	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
347	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
348	   |C|C|R|V|   K   |R|     P1      |R|     P2      |      PW2      |
349	   |2|3|2|2|       |3|             |4|             |               |
350	   | | | | |0 1 2 3| |0 1 2 3 4 5 6| |0 1 2 3 4 5 6|0 1 2 3 4 5 6 7|
351	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
352	   |      R5       |
353	   |               |
354	   |0 1 2 3 4 5 6 7|
355	   +-+-+-+-+-+-+-+-+

357	                   Figure 5: PLC information field (PC)

359	   Check bit #2 (C2):  1 bit

361	      Validity flag of V2, K, P1, P2, and PW2.  If the flag is "1", it
362	      means that all these parameters are valid, and "0" means that the
363	      parameters should be ignored.  If any of these parameters is
364	      invalid, C1 should be set to "0".

366	   Check bit #3 (C3):  1 bit

368	      Payload validity indicator.  This flag is normally set to "0".  If
369	      a received packet has this flag set to "1", the payload data
370	      should be ignored and packet-loss concealment should be performed
371	      by the receiver.  This flag is used in case of a multi-point
372	      conferencing, where the upstream packet was lost and the mixing
373	      server did not execute packet-loss concealment.

375	   Reserved bit #2 (R2):  1 bit

377	      This bit should be ignored.

379	   VAD flag #2 (V2):  1 bit

381	      Voice activity detection flag of the current frame.  This may be
382	      the same as V1 in the mixing information.

384	   Frame indicator (K):  4 bits

386	      This value indicates the frame offset of P2 and PW2.  Since it is
387	      preferable to carry the pitch and power parameters used as PLC
388	      information in a different frame, this frame offset indicates
389	      which frame the parameters are associated with.  Since 4 bits
390	      are allocated, it ranges between "0" and "15".

392	   Reserved bit #3 (R3):  1 bit

394	      This bit should be ignored.

396	   Pitch lag #1 (P1):  7 bits

398	      Pitch code of the current frame.  The actual pitch lag is
399	      calculated as P1+20 samples in 8-kHz sampling rate.  Pitch lag
400	      must be 20 <= pitch length <= 120.  Codes ranging between "0x65"
401	      and "0x7F" are not used.

403	   Reserved bit #4 (R4):  1 bit
404	      This bit should be ignored.

406	   Pitch lag #2 (P2):  7 bits

408	      Pitch code of the offset frame.  The actual pitch lag is
409	      calculated as P2+20 samples in 8-kHz sampling rate.  Pitch lag
410	      must be 20 <= pitch length <= 120.  Codes ranging between "0x65"
411	      and "0x7F" are not used.  The offset value is defined as K.

413	   Power #2 (PW2):  8 bits

415	      Signal power code of the offset frame.  The offset value is
416	      defined as K.

418	   Reserved bits #5 (R5):  8 bits

420	      These bits should be ignored.
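
   The 40-bit PC field of Figure 5 can be decoded in the same way.
   The sketch below assumes most-significant-bit-first numbering and
   uses hypothetical names; pitch_lag() applies the "code + 20"
   rule stated for P1 and P2.

```python
def parse_pc(pc: bytes) -> dict:
    """Decode the 5-octet PLC information field (Figure 5)."""
    if len(pc) != 5:
        raise ValueError("PC must be exactly 5 octets")
    b0, b1, b2, b3, _r5 = pc      # R5 is reserved and ignored
    return {
        "C2": (b0 >> 7) & 0x01,   # validity of V2, K, P1, P2, PW2
        "C3": (b0 >> 6) & 0x01,   # 1 means: ignore payload, run PLC
        "V2": (b0 >> 4) & 0x01,   # VAD flag (bit 2 is reserved R2)
        "K": b0 & 0x0F,           # frame offset for P2 and PW2
        "P1": b1 & 0x7F,          # top bit is reserved R3
        "P2": b2 & 0x7F,          # top bit is reserved R4
        "PW2": b3,                # power code of the offset frame
    }


def pitch_lag(code: int) -> int:
    """Pitch lag in samples at 8 kHz for a 7-bit pitch code."""
    if not 0 <= code <= 0x64:
        raise ValueError("codes 0x65 to 0x7F are not used")
    return code + 20
```

   The valid codes 0x00 to 0x64 map onto the stated lag range of 20 to
   120 samples.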

422	3.3.2.  Sub-layer data

424	   Sub-layer data (SD) is a sub-header followed by layer bitstreams, as
425	   shown in Figure 6.  The sub-header indicates the layer location and
426	   the number of bytes.

428	     0                   1                   2
429	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7   . . .
430	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+
431	    | CI| FI| QI| R6|      SB       |               LD         ...  |
432	    |   |   |   |   |               |                               |
433	    |0 1|0 1|0 1|0 1|0 1 2 3 4 5 6 7|                               |
434	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+

436	                      Figure 6: Sub-layer format (SD)

438	   Channel index (CI):  2 bits

440	      Indicates the channel number.  For all modes given in Table 2,
441	      this should be "0".  The detail is given in Table 3.

443	   Frequency index (FI):  2 bits

445	      Indicates the frequency number.  "0" means that the layer is in the
446	      base frequency band, and a higher number means that the layer is
447	      in the respective frequency band.  The detail is given in Table 3.

449	   Quality index (QI):  2 bits

451	      Indicates the quality layer number.  "0" means that the layer is
452	      the base layer, and a higher number means that the layer is the
453	      respective quality layer.  The detail is given in Table 3.

455	   Reserved #6 (R6):  2 bits

457	      Not used (reserved).  The value must be "0".

459	   Sub-layer Size (SB):  8 bits

461	      Indicates the byte size of the following sub-layer data.

463	   Layer Data (LD):  SB*8 bits

465	      The actual sub-layer data.

467	3.3.2.1.  Layer index encoding

469	   The layer index is encoded using the values of channel number,
470	   frequency-band number, and quality number, encoded with 2 bits each
471	   in that order.  The last 2 bits are reserved for future use, and
472	   all implementations should ignore this field.  For all the layers
473	   shown in Table 1, the layer indices are shown in Table 3.

475	                         +-------+----+----+----+
476	                         | Layer | CI | FI | QI |
477	                         +-------+----+----+----+
478	                         |   a   |  0 |  0 |  0 |
479	                         |       |    |    |    |
480	                         |   b   |  0 |  0 |  1 |
481	                         |       |    |    |    |
482	                         |   c   |  0 |  1 |  0 |
483	                         +-------+----+----+----+

485	                          Table 3: Layer indices
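
   A possible encoding and decoding of the layer index octet is
   sketched below.  It follows the field order drawn in Figure 6 (CI,
   FI, QI, R6 from the most significant bits down) and the values of
   Table 3; the function and table names are hypothetical.

```python
# (CI, FI, QI) -> layer name, taken from Table 3.
LAYER_BY_INDEX = {(0, 0, 0): "a", (0, 0, 1): "b", (0, 1, 0): "c"}


def decode_layer_index(octet: int) -> tuple:
    """Return (CI, FI, QI); the reserved R6 bits are ignored."""
    ci = (octet >> 6) & 0x03
    fi = (octet >> 4) & 0x03
    qi = (octet >> 2) & 0x03
    return ci, fi, qi


def encode_layer_index(ci: int, fi: int, qi: int) -> int:
    """Build the index octet with R6 set to 0."""
    for value in (ci, fi, qi):
        if not 0 <= value <= 3:
            raise ValueError("each index is a 2-bit value")
    return (ci << 6) | (fi << 4) | (qi << 2)


def layer_name(octet: int):
    """Map an index octet to 'a', 'b', or 'c'; None if undefined."""
    return LAYER_BY_INDEX.get(decode_layer_index(octet))
```

   With this layout the index octets of Layers a, b, and c are 0x00,
   0x04, and 0x10, respectively.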

487	4.  G.711 interoperability

489	   As given in Section 2, u-law encoded G.711 bitstream (Layer a) is the
490	   core layer of a UEMCLIP bitstream, and is always embedded.  This
491	   means that transcoding from UEMCLIP bitstream to G.711 does not have
492	   to undergo decoding and re-encoding procedures; simple extraction
493	   suffices.  However, this does not apply to the reverse
494	   procedure, i.e., transcoding from G.711 to UEMCLIP, because the side
495	   information in the main header must be assigned separately.

497	   The transcoding from UEMCLIP to u-law G.711 can be done easily by
498	   finding an appropriate sub-layer.  The transcoder should look for a
499	   sub-layer with the layer index of 0x00; the subsequent LD, which has
500	   a size of SB*8 bits (usually SB=160 for a 20 ms frame), is the actual
501	   G.711 bitstream data.  It should be noted that the transcoder should
502	   not always expect the core layer to be located right after the main
503	   header.
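
   The extraction described above can be sketched as a walk over the
   sub-layers of a single frame.  This is a minimal illustration, not
   a reference implementation: the function name is hypothetical, the
   frame end is taken as 3 + BS (one reading of the BS definition in
   Section 3.3.1), and the reserved R6 bits are masked out before the
   index is compared with 0x00.

```python
def extract_g711(frame: bytes) -> bytes:
    """Return the u-law payload (Layer a) of one UEMCLIP frame."""
    if len(frame) < 10:
        raise ValueError("frame shorter than the main header")
    end = 3 + int.from_bytes(frame[1:3], "big")   # ID + BS + BS bytes
    if end > len(frame):
        raise ValueError("BS points past the end of the buffer")
    pos = 10 + frame[9]            # skip main header and EH (ES bytes)
    while pos + 2 <= end:
        index, sb = frame[pos], frame[pos + 1]
        start, stop = pos + 2, pos + 2 + sb
        if stop > end:
            raise ValueError("truncated sub-layer")
        if index & 0xFC == 0x00:   # CI = FI = QI = 0, R6 ignored
            return bytes(frame[start:stop])
        pos = stop                 # not the core layer, keep looking
    raise ValueError("no core layer found")
```

   Because every sub-layer carries its own size, the walk works
   regardless of where the core layer is placed in the frame.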

505	   On the other hand, the transcoding from G.711 to UEMCLIP is not
506	   entirely straight-forward.  Since there are no means to generate
507	   enhancement sub-layers, a G.711 bitstream can only be converted to
508	   UEMCLIP Mode 0 bitstream.  If the original G.711 bitstream is encoded
509	   in A-law, it should first be converted to u-law to become the core
510	   layer.  Because the default packetization size is 20 ms, the u-law
511	   encoded G.711 bitstream MUST be a 160-sample chunk.  When a UEMCLIP
512	   encoder is not available, the main header contents should follow
513	   the guidelines below.

515	   o  ID must be set to "0x95".

517	   o  Byte size (BS) should be set to 7 (the rest of the main header)
518	      plus the sub-header size (2) plus the number of G.711 samples (SB).

520	   o  The enhanced-header size (ES) should be set to "0x00".

522	   o  The check bits for mixing and PLC (C1 and C2) should be set to 0.

524	   o  The payload validity indicator (C3) should be set to 0.

526	   For the core layer (i.e., u-law G.711 bitstream), it should have the
527	   following sub-layer header:

529	   o  All CI, FI, QI, R6 MUST be 0.

531	   o  Sub-layer size (SB) MUST be 160 for 20 ms frame.
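
   The guidelines above can be turned into a small wrapper that
   builds a Mode 0 frame around a 20 ms block of u-law samples.  The
   sketch is illustrative: the function name is hypothetical, and all
   header bits that the guidelines do not mention (R1, V1, PW1, and
   the remaining PC bits) are simply set to 0, which is an assumption
   rather than a requirement of this document.

```python
def g711_to_uemclip_mode0(ulaw: bytes) -> bytes:
    """Wrap 160 u-law octets (20 ms at 8 kHz) in a Mode 0 frame."""
    if len(ulaw) != 160:
        raise ValueError("expected a 160-sample (20 ms) u-law block")
    sb = len(ulaw)
    bs = 7 + 2 + sb                  # rest of MH + sub-header + LD
    main_header = (
        bytes([0x95])                # ID
        + bs.to_bytes(2, "big")      # BS in network byte order
        + bytes([0x00])              # MX with C1 = 0
        + bytes(5)                   # PC with C2 = 0 and C3 = 0
        + bytes([0x00])              # ES = 0, so no enhanced header
    )
    sub_header = bytes([0x00, sb])   # CI = FI = QI = R6 = 0, then SB
    return main_header + sub_header + bytes(ulaw)
```

   The resulting frame is 172 bytes long, which corresponds to the
   68.8 kbps total bit-rate listed for Mode 0 in Table 2.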

533	5.  Congestion Control Considerations

535	   The general congestion control considerations for transporting RTP
536	   data apply to UEMCLIP over RTP [3] as well as any applicable RTP
537	   profile like AVP [4].  UEMCLIP does not have any built-in mechanism
538	   for reducing the bandwidth.  Packing more frames in each RTP payload
539	   can reduce the number of packets sent, and hence the overhead from
540	   IP/UDP/RTP headers, at the expense of increased delay and reduced
541	   error robustness against packet losses.

543	6.  Payload Format Parameters

545	6.1.  Media type registration

547	   This registration is done using the template defined in [6] and
548	   following [5].

550	   MIME media type name:  audio

552	   MIME media subtype name:  UEMCLIP

554	   Required parameters:  Mode information: this defines bit-rate,
555	      sampling frequency and layer structure of the bitstream.  This
556	      parameter is necessary because this is not signaled within the
557	      bitstream.

559	   Optional parameters:  none.

561	   Encoding considerations:  This type is defined for transferring
562	      UEMCLIP-encoded data via RTP using the payload format specified in
563	      Section 3 "Payload Format".  Audio data is binary data and must be
564	      encoded for non-binary transport; the Base64 encoding is suitable
565	      for e-mail.

567	   Security considerations:  See Section 7 "Security Considerations" of
568	      this document.

570	   Interoperability considerations:  This media is interoperable with
571	      u-law encoded ITU-T G.711.  See Section 4 "G.711 interoperability"
572	      of this document.

574	   Published specification:  (T.B. assigned)

576	   Applications that use this media type:  Audio and video streaming and
577	      conferencing tools.

579	   Additional information:  None

581	   Intended usage:  COMMON

583	   Person & email address to contact for further information:  Yusuke
584	      Hiwasaki <hiwasaki.yusuke@lab.ntt.co.jp>

586	   Author/Intended change controller:

588	      Author:  Yusuke Hiwasaki

590	      Intended Change Controller:  IETF Audio/Video Transport Working
591	         Group delegated from the IESG

593	6.2.  Mapping to SDP Parameters

595	   Payload type:  Since it is not registered in [4], any RTP packets
596	      that carry UEMCLIP as payload type MUST be treated as a dynamic
597	      payload type.

599	   Codec name:  MIME registered codec name should be used.

601	   Sampling Frequency:  Depending on the mode to communicate, sampling
602	      frequency MUST be selected from the ones defined in Table 2.

604	   Channel numbers:  It SHOULD default to "1", as selected from the ones
605	      defined in Table 2.

607	   Packet intervals:  Since frame length of any UEMCLIP is 20 ms, when
608	      specifying a=ptime line, the argument MUST be a multiple of "20".
609	      When not listed in SDP, it should also default to the minimum
610	      size: "20".

612	   Bandwidth:  As described in [7], the bandwidth line is OPTIONAL.
613	      When there are no bandwidth restrictions, the value MUST be based
614	      on the largest bit-rate in Table 2, expressed in "kbit/s" with any
615	      fraction rounded up, and including header overheads
616	      down to Layer 3.  If any restrictions apply, then the value MUST
617	      be based on the largest bit-rate in Table 2 that satisfies the
618	      restriction, by the same calculation procedure.  The sender MUST
619	      NOT encode with a bit-rate larger than the answered bandwidth.

621	   UEMCLIP specific:  Any description specific to UEMCLIP is defined in
622	      the Format Specification Parameters (fmtp).  Parameters MUST be
623	      separated with ";", and if a parameter takes an attribute
624	      (value), the value MUST be given with "=".  For compatibility
625	      reasons, an application/terminal MUST ignore any parameter that
626	      does not appear below.  This ensures upward compatibility with
627	      parameters added by future enhancements.
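
   The packet-interval, bandwidth, and fmtp rules above can be sketched
   in Python as follows.  This is a minimal illustration, not part of
   the payload format.  The function names are hypothetical.  The
   parser assumes the usual SDP "name=value" form, and the bandwidth
   helper assumes one RTP packet per ptime with a 12-byte RTP header,
   an 8-byte UDP header, and a 20-byte IPv4 header.  The payload size
   per packet comes from Table 2 and is supplied by the caller.

```python
import math

FRAME_MS = 20                          # UEMCLIP frame length in ms
KNOWN_PARAMS = {"fixmode", "dynmode"}  # parameters of Section 6.2.1


def parse_fmtp(value):
    """Parse 'name=v1,v2;name=v' and silently drop unknown parameters."""
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, arg = item.partition("=")
        name = name.strip()
        if name not in KNOWN_PARAMS:
            continue  # unknown parameters are ignored, not rejected
        params[name] = [int(m) for m in arg.split(",") if m.strip()]
    return params


def effective_ptime(ptime=None):
    """Return the ptime to use; an absent a=ptime line means 20 ms."""
    if ptime is None:
        return FRAME_MS
    if ptime <= 0 or ptime % FRAME_MS != 0:
        raise ValueError("ptime must be a positive multiple of 20")
    return ptime


def bandwidth_kbps(payload_bytes, ptime_ms=FRAME_MS, ip_header=20):
    """Whole kbit/s, rounded up, including RTP/UDP/IP headers.

    Assumption: no RTP header extension or CSRC list, IPv4 without
    options.  Bits per millisecond equal kbit/s.
    """
    packet_bits = (ip_header + 8 + 12 + payload_bytes) * 8
    return math.ceil(packet_bits / ptime_ms)
```

   For instance, parse_fmtp("dynmode=0,3; vendorx=1") keeps only the
   "dynmode" entry, because "vendorx" is not a known parameter.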

629	6.2.1.  Dynamic transmission definition

631	   Since the UEMCLIP codec can operate in a number of modes, it is
632	   desirable to specify the range of modes in which an encoder or a
633	   decoder can operate.

635	   UEMCLIP decoders are designed to accept bitstreams in any mode.

637	   However, implementation limitations may prevent a terminal from
638	   adapting to dynamic bit-rate changes.  Two concepts are therefore
639	   introduced here: "dynamic mode" (denoted as "dynmode"), where the
640	   mode (bit-rate) may change during the session, and "fixed mode"
641	   (denoted as "fixmode"), where such a change is not permitted.  The
642	   two MUST be used exclusively of each other.

644	   "fixmode" specifies that the operating mode (bit-rate) is not
645	   modified during the session.  It MUST NOT be combined with
646	   "dynmode".  It should specify the possible mode numbers, delimited
647	   by commas ",".  When offering a "fixmode", the offerer SHOULD list
648	   the mode numbers in descending priority order.  The answerer MUST
649	   select a single suitable mode number and reply with "fixmode" and
650	   one argument.

652	   On the other hand, "dynmode" allows the operating mode to be
653	   modified during the session.  It MUST NOT be combined with
654	   "fixmode".  The offerer should specify the possible mode numbers,
655	   delimited by commas ",".  The answerer can either select a number
656	   of suitable modes and reply with "dynmode" in the same manner, or
657	   select a single suitable mode number and reply with "fixmode" and
658	   one argument.

660	   The mode numbers that can be specified as arguments to "fixmode" or
661	   "dynmode" are restricted by a combination of a sampling frequency and
662	   a number of audio channels, as shown in Table 2.  This is because SDP
663	   binds a payload type to a combination of a sampling frequency and a
664	   number of audio channels.  When neither "fixmode" nor "dynmode" is
665	   given, the session MUST be interpreted as using the fixed mode
666	   ("fixmode") and MUST use the default value specified in Table 4.

668	         +---------+----------+------------------+--------------+
669	         | Fs [Hz] | Channels | Selectable modes | Default mode |
670	         +---------+----------+------------------+--------------+
671	         |    8000 |     1    |        0,3       |       0      |
672	         |         |          |                  |              |
673	         |   16000 |     1    |        1,4       |       1      |
674	         +---------+----------+------------------+--------------+

676	                          Table 4: Default modes
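
   The defaulting rule can be sketched as a lookup over the rows of
   Table 4.  This is an illustrative sketch only.  The names TABLE_4
   and resolve_modes are hypothetical, and the input is assumed to be a
   dictionary that maps "fixmode" or "dynmode" to a list of mode
   numbers already parsed from the fmtp line.

```python
# Rows of Table 4, keyed by (sampling frequency in Hz, channels).
TABLE_4 = {
    (8000, 1): {"selectable": (0, 3), "default": 0},
    (16000, 1): {"selectable": (1, 4), "default": 1},
}


def resolve_modes(fs_hz, channels, params):
    """Return (kind, modes) for one payload type.

    Falls back to "fixmode" with the Table 4 default when neither
    parameter is present.
    """
    entry = TABLE_4.get((fs_hz, channels))
    if entry is None:
        raise ValueError("combination not listed in Table 4")

    fix = params.get("fixmode")
    dyn = params.get("dynmode")
    if fix is not None and dyn is not None:
        raise ValueError("fixmode and dynmode are mutually exclusive")
    if fix is None and dyn is None:
        return ("fixmode", [entry["default"]])

    kind, modes = ("fixmode", fix) if fix is not None else ("dynmode", dyn)
    if not modes:
        raise ValueError(kind + " needs at least one mode number")
    bad = [m for m in modes if m not in entry["selectable"]]
    if bad:
        raise ValueError("modes not selectable here: %r" % bad)
    return (kind, list(modes))
```

   Rejecting an out-of-range mode with an exception is a design choice
   of this sketch; an implementation could instead treat the payload
   type as unusable and omit it from the answer.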

678	6.3.  Offer-answer Model Considerations

680	   The procedures related to exchanging SDP messages MUST follow [2].

682	   o  When multiple UEMCLIP dynamic payload type numbers are offered, an
683	      answerer SHOULD select a single payload type number, i.e., one
684	      combination of sampling frequency and number of channels.

686	   o  The ptime SHOULD be 20.

688	   o  An offerer SHOULD offer every possible combination of sampling
689	      frequency, channel number, and fmtp parameters including dynamic/
690	      fixed mode.  When the transmission bandwidth is restricted, the
691	      offer MUST be made in accordance with the restriction.

693	   o  When offering/answering SDP, any fmtp parameters that are
694	      undefined MUST be ignored.  If unknown/undefined parameters are
695	      offered, an answerer MUST delete them from the answer message.
696	      In this case, the offerer MUST use the default value for any
697	      deleted parameters.

699	   o  If a dynamic mode ("dynmode") is offered, an answerer MUST select
700	      either "dynmode" or "fixmode", according to its capabilities.
701	      When a fixed mode ("fixmode") is offered, an answerer MUST answer
702	      only "fixmode".  When answering fixed mode ("fixmode"), the
703	      answerer MUST select a single mode from the offered modes,
704	      regardless of whether dynamic or fixed mode was offered.  If no
705	      mode is offered at all, the session MUST default to fixed mode,
706	      and the default mode value shown in Table 4 MUST be used, based
707	      on the sampling frequency and number of channels specified
708	      elsewhere.

710	   o  When an offered condition does not fit an answerer's capabilities,
711	      the answerer MUST NOT answer with that condition, and the
712	      session MAY proceed to re-INVITE, if possible.  Once a condition
713	      (mode) is decided upon, the offerer and the answerer MUST
714	      transmit under this condition.
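
   The answerer-side mode selection described in the list above can be
   sketched as below.  This is a hedged illustration under stated
   assumptions: the function name answer_modes and its arguments are
   hypothetical, "supported" is the set of mode numbers the answerer
   can handle for the chosen payload type, and "can_switch" indicates
   whether the answerer tolerates mode changes during the session.

```python
def answer_modes(offer_kind, offered, supported, can_switch):
    """Return (kind, modes) for the answer, or None if nothing fits.

    offered is kept in the offerer's order, so the first usable entry
    is the offerer's highest-priority mode that the answerer supports.
    """
    if offer_kind not in ("fixmode", "dynmode"):
        raise ValueError("offer_kind must be 'fixmode' or 'dynmode'")

    usable = [m for m in offered if m in supported]
    if not usable:
        return None  # the offered condition does not fit

    # "dynmode" may be answered only when "dynmode" was offered.
    if offer_kind == "dynmode" and can_switch and len(usable) > 1:
        return ("dynmode", usable)

    # In every other case the answer is "fixmode" with one mode.
    return ("fixmode", [usable[0]])
```

   Answering "fixmode" when only one offered mode is usable is a choice
   made in this sketch.  Returning None stands for declining the
   offered condition, after which the session may proceed to re-INVITE.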

716	7.  Security Considerations

718	   RTP packets using the payload format defined in this specification
719	   are subject to the security considerations discussed in the RTP
720	   specification [3] and any appropriate profiles.  This implies that
721	   confidentiality of the media streams is achieved by encryption.

723	   A potential denial-of-service threat exists for data encodings using
724	   compression techniques that have non-uniform receiver-end
725	   computational load.  An attacker can inject pathological datagrams
726	   into the stream that are complex to decode and cause the receiver
727	   to become overloaded.  However, the UEMCLIP codec covered in this
728	   document does not exhibit any significant non-uniformity.

730	   Other potential threats are memory attacks by illegal layer indices
731	   or byte numbers.  The implementor of the decoder should always be
732	   aware that the indicated numbers may be corrupted and may not point
733	   to the right sub-layer, or may cause reading beyond the bitstream
734	   boundaries.
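
   One defensive pattern against such corrupted indices is to validate
   every offset and length against the received payload before use.
   The sketch below is illustrative only.  The function name read_span
   and its arguments are hypothetical and do not reflect the actual
   UEMCLIP bitstream layout, which is defined elsewhere in this
   document.

```python
def read_span(payload, offset, length):
    """Return payload[offset:offset+length] or raise if out of bounds.

    Python slicing silently truncates an out-of-range slice, so the
    explicit check is what detects a corrupted offset or length.
    """
    if offset < 0 or length < 0 or offset + length > len(payload):
        raise ValueError("indicated span lies outside the payload")
    return payload[offset:offset + length]
```

   A decoder built this way would discard a packet on ValueError rather
   than decode a partial sub-layer.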

736	8.  IANA Considerations

738	   It is requested that one new media subtype (audio/UEMCLIP) be
739	   registered by IANA.  For details, see Section 6.1.

741	9.  Normative References

743	   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
744	        Levels", BCP 14, RFC 2119, March 1997.

746	   [2]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
747	        Session Description Protocol (SDP)", RFC 3264, June 2002.

749	   [3]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
750	        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
751	        RFC 3550, July 2003.

753	   [4]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
754	        Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

756	   [5]  Casner, S. and P. Hoschka, "MIME Type Registration of RTP
757	        Payload Formats", RFC 3555, July 2003.

759	   [6]  Freed, N. and J. Klensin, "Media Type Specifications and
760	        Registration Procedures", BCP 13, RFC 4288, December 2005.

762	   [7]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
763	        Description Protocol", RFC 4566, July 2006.

765	Authors' Addresses

767	   Yusuke Hiwasaki
768	   NTT Corporation
769	   3-9-11 Midori-cho,
770	   Musashino-shi
771	   Tokyo  180-8585
772	   Japan

774	   Phone: +81(422)59-4815
775	   Email: hiwasaki.yusuke@lab.ntt.co.jp

777	   Hitoshi Ohmuro
778	   NTT Corporation
779	   3-9-11 Midori-cho,
780	   Musashino-shi
781	   Tokyo  180-8585
782	   Japan

784	   Phone: +81(422)59-2151
785	   Email: ohmuro.hitoshi@lab.ntt.co.jp

787	Full Copyright Statement

789	   Copyright (C) The IETF Trust (2007).

791	   This document is subject to the rights, licenses and restrictions
792	   contained in BCP 78, and except as set forth therein, the authors
793	   retain all their rights.

795	   This document and the information contained herein are provided on an
796	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
797	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
798	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
799	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
800	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
801	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

803	Intellectual Property

805	   The IETF takes no position regarding the validity or scope of any
806	   Intellectual Property Rights or other rights that might be claimed to
807	   pertain to the implementation or use of the technology described in
808	   this document or the extent to which any license under such rights
809	   might or might not be available; nor does it represent that it has
810	   made any independent effort to identify any such rights.  Information
811	   on the procedures with respect to rights in RFC documents can be
812	   found in BCP 78 and BCP 79.

814	   Copies of IPR disclosures made to the IETF Secretariat and any
815	   assurances of licenses to be made available, or the result of an
816	   attempt made to obtain a general license or permission for the use of
817	   such proprietary rights by implementers or users of this
818	   specification can be obtained from the IETF on-line IPR repository at
819	   http://www.ietf.org/ipr.

821	   The IETF invites any interested party to bring to its attention any
822	   copyrights, patents or patent applications, or other proprietary
823	   rights that may cover technology that may be required to implement
824	   this standard.  Please address the information to the IETF at
825	   ietf-ipr@ietf.org.

827	Acknowledgment

829	   Funding for the RFC Editor function is provided by the IETF
830	   Administrative Support Activity (IASA).