Audio/Video Transport                                        Y. Hiwasaki
Internet-Draft                                                 H. Ohmuro
Intended status: Standards Track                         NTT Corporation
Expires: November 15, 2009                                  May 14, 2009

     RTP payload format for mU-law EMbedded Codec for Low-delay IP
                 communication (UEMCLIP) speech codec
                     draft-ietf-avt-rtp-uemclip-06

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 15, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document describes the RTP payload format of a mU-law EMbedded
   Coder for Low-delay IP communication (UEMCLIP), an enhanced speech
   codec of ITU-T G.711. The bitstream has a scalable structure with an
   embedded u-law bitstream, also known as PCMU, thus providing a simple
   transcoding operation between narrowband and wideband speech.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  Media Format Background
   3.  Payload Format
     3.1.  RTP Header Usage
     3.2.  Multiple frames in an RTP packet
     3.3.  Payload Data
       3.3.1.  Main Header
       3.3.2.  Sub-layer
   4.  Transcoding between UEMCLIP and G.711
   5.  Congestion Control Considerations
   6.  Payload Format Parameters
     6.1.  Media type registration
     6.2.  Mapping to SDP Parameters
       6.2.1.  Mode specification
     6.3.  Offer-Answer Model Considerations
       6.3.1.  Offer-Answer Guidelines
       6.3.2.  Examples
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document specifies the payload format for sending "mU-law
   EMbedded Coder for Low-delay IP communication" (UEMCLIP) encoded
   speech using the Real-time Transport Protocol (RTP) [RFC3550].
   UEMCLIP is a proprietary codec that enhances u-law ITU-T G.711
   [ITU-T G.711]. It is designed to help the market make a smooth
   transition towards the forthcoming wideband communication
   environment while imposing very little media transcoding load when
   interworking with existing terminals, in which the implementation of
   G.711 is mandatory.

   It should be noted that, generally speaking, codecs are negotiated
   and changed using an SDP exchange. Also, [RFC3550] defines general
   RTP mixer and translator models, where media transcoding may not take
   place at the node. For those cases, the design concept of the
   embedded structure is not useful. However, there are other cases
   where costly transcoding is unavoidable: in commonly deployed types
   of Multi-point Control Units (MCUs) that terminate media and RTCP
   packets [RFC5117], and when narrowband and wideband terminals
   co-exist. In those cases, the embedded bitstream structure can reduce
   the media transcoding to a simple bitstream truncation.

   The background and the basic idea of the media format are described
   in Section 2. The details of the payload format are given in
   Section 3.
   The transcoding issues with G.711 are discussed in Section 4, and the
   considerations for congestion control are in Section 5. In
   Section 6, the payload format parameters for the media type
   registration of the UEMCLIP RTP payload format and their SDP mappings
   are provided. The security considerations and IANA considerations
   are dealt with in Section 7 and Section 8, respectively.

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Media Format Background

   UEMCLIP is an enhanced version of u-law ITU-T G.711, otherwise known
   as PCMU [RFC4856]. It is targeted at Voice over Internet Protocol
   (VoIP) applications, and its main goal is to provide a wideband
   communication platform that is highly interoperable with existing
   terminals equipped with G.711, and to stimulate the market to
   gradually shift to using wideband communication. In widely deployed
   multi-point conferencing systems, the packets usually go through
   RTCP-terminating MCUs, "Topo-RTCP-terminating-MCU" as defined in
   [RFC5117]. Because the G.711 bitstream is embedded in the UEMCLIP
   bitstream, costly media transcoding can be avoided in this case.

   This document does not discuss the implementation details of the
   encoder and decoder, but only describes the bitstream format.

   Because of its scalable nature, there are a number of sub-bitstreams
   (sub-layers) in a UEMCLIP bitstream. By choosing appropriate
   sub-layers, the codec can adapt to the following requirements:

   o  Sampling frequency,

   o  Number of channels,

   o  Speech quality, and

   o  Bit-rate.

   The UEMCLIP codec operates on 20-ms frames and includes three
   sub-coders, as shown in Table 1.
   The core layer is u-law G.711 at 64 kbit/s, and the other two are
   quality and bandwidth enhancement layers with a bit-rate of
   16 kbit/s each.

   +-------+---------------------+----------+--------------------------+
   | Layer | Description         | Bit-rate | Coding algorithm         |
   |       |                     | [kbit/s] |                          |
   +-------+---------------------+----------+--------------------------+
   | a     | G.711 core          | 64       | u-law PCM                |
   |       |                     |          |                          |
   | b     | Lower-band          | 16       | Time domain block        |
   |       | enhancement         |          | quantization             |
   |       |                     |          |                          |
   | c     | Higher-band         | 16       | MDCT block quantization  |
   +-------+---------------------+----------+--------------------------+

                      Table 1: Sub-layer description

   Based on these sub-layers, the UEMCLIP codec operates in four modes
   as shown in Table 2. Here, "Ch" is the number of channels and "Fs"
   is the sampling frequency in kHz. It should be noted that the
   current version supports only single-channel operation; multi-channel
   capabilities might be added in future extensions. The absent Modes 2
   and 5 are reserved for possible future extension to 32 kHz sampling
   modes. As the mode definition is expected to grow, any other modes
   not defined in this table MUST NOT be used for compatibility and
   interoperability reasons.

   +------+----+----+-------+-------+-------+-------------+------------+
   | Mode | Ch | Fs | Layer | Layer | Layer | Bit-rate    | Total      |
   |      |    |    | a     | b     | c     | w/o headers | bit-rate   |
   |      |    |    |       |       |       | [kbps]      | [kbps]     |
   +------+----+----+-------+-------+-------+-------------+------------+
   | 0    | 1  | 8  | x     | -     | -     | 64          | 67.2       |
   |      |    |    |       |       |       |             |            |
   | 1    | 1  | 16 | x     | -     | x     | 80          | 84.0       |
   |      |    |    |       |       |       |             |            |
   | 2    | -  | -  | -     | -     | -     | -           | -          |
   |      |    |    |       |       |       |             |            |
   | 3    | 1  | 8  | x     | x     | -     | 80          | 84.0       |
   |      |    |    |       |       |       |             |            |
   | 4    | 1  | 16 | x     | x     | x     | 96          | 100.8      |
   |      |    |    |       |       |       |             |            |
   | 5    | -  | -  | -     | -     | -     | -           | -          |
   +------+----+----+-------+-------+-------+-------------+------------+

                        Table 2: Mode description

   A UEMCLIP bitstream contains internal headers and other
   side-information in addition to the layer data. This results in a
   total bit-rate larger than the sum of the layers shown in the above
   table. For example, a Mode 0 frame carries a 6-byte main header, a
   2-byte sub-layer header, and 160 bytes of layer data, i.e., 168 bytes
   every 20 ms, or 67.2 kbit/s. The details of the internal headers and
   auxiliary information are described in Section 3.3.1.

   Defining the sampling frequency and the number of channels does not
   determine a single mode, i.e., there can be multiple modes for the
   same sampling frequency or number of channels. The supported modes
   may differ between implementations; thus, the sender and the
   receiver must negotiate which mode to use for transmission.

3.  Payload Format

   As an RTP payload, a UEMCLIP bitstream can contain one or more frames
   as shown in Figure 1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   |                 one or more frames of UEMCLIP                 |
   |                                                               |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

                      Figure 1: RTP payload format

   The UEMCLIP bitstream has a scalable structure; thus, it is possible
   to reconstruct the signal by decoding only a part of it. A UEMCLIP
   frame is composed of a main header (MH) followed by one or more (up
   to three) sub-layers (SL) as shown in Figure 2.

   +--+-------+//-+
   |MH| SL #1 |...|
   +--+-------+//-+

              Figure 2: A UEMCLIP frame (bitstream format)

   As a sub-layer, the core layer, i.e., "Layer a", MUST always be
   included. It should be noted that the core layer may or may not
   immediately follow the MH field. The decoder MUST always refer to
   the layer indices for proper decoding because the order of the
   sub-layers is arbitrary.

   The UEMCLIP bitstream does not explicitly include the following
   information: mode and sampling frequency (Fs). As described before,
   this information MUST be exchanged while establishing a connection,
   for example, by means of SDP.

3.1.  RTP Header Usage

   Each RTP packet starts with a fixed RTP header, as explained in
   [RFC3550]. The following fields of the fixed RTP header have usage
   specific to UEMCLIP streams:

   Payload type: The assignment of an RTP payload type for this packet
      format is outside the scope of this document; however, it is
      expected that a payload type in the dynamic range will be
      assigned.

   Timestamp: This encodes the sampling instant of the first speech
      signal sample in the RTP data packet. For UEMCLIP streams, the
      RTP timestamp MUST advance based on a clock of either 8000 or
      16000 Hz.
      In cases where the audio sampling rate can change during a
      session, the RTP timestamp rate MUST be equal to the maximum rate
      (in Hz) given in the mode range (see Section 6.2.1). This
      implies that the RTP timestamp rate for a UEMCLIP payload type
      MUST NOT change during a session. For example, for a UEMCLIP
      stream with 8-kHz audio sampling, where a transition to a 16-kHz
      audio sampling mode is allowed, the RTP timestamp must always
      advance at a 16-kHz clock rate. For a fixed audio sampling mode,
      the RTP timestamp rate should be either 8 or 16 kHz, depending on
      the sampling rate.

   Marker bit: If the codec is used for applications with discontinuous
      transmission (DTX, or silence compression), the first packet after
      a silence period during which packets have not been transmitted
      contiguously SHOULD have the marker bit in the RTP data header set
      to one. The marker bit in all other packets MUST be zero.
      Applications without DTX MUST set the marker bit to zero.

3.2.  Multiple frames in an RTP packet

   A sender may include more than one UEMCLIP frame in a single RTP
   packet, subject to the following additional restrictions:

   o  A single RTP packet SHOULD NOT include more UEMCLIP frames than
      will fit in the path MTU.

   o  All frames contained in a single RTP packet MUST be of the same
      mode.

   o  Frames MUST NOT be split between RTP packets.

   It is RECOMMENDED that the number of frames contained within an RTP
   packet be consistent with the application. Since UEMCLIP is designed
   for telephony applications, where delay has a great impact on
   quality, fewer frames per packet are preferable because they give
   lower delay.

3.3.  Payload Data

   In a UEMCLIP bitstream, all numbers are encoded in network byte
   order.

3.3.1.  Main Header

   The main header (MH) is placed at the top of a frame and has a size
   of 6 bytes.
   The content of the main header is shown in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      MX       |                      PC                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          PC(cont'd)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 3: UEMCLIP main header format (MH)

   Mixing information (MX): 8 bits

      Mixing information field. This field is only relevant when
      Topo-RTCP-terminating-MCUs that interpret it are used. See
      Section 3.3.1.1 for details of the fields.

   Packet-loss Concealment information (PC): 40 bits

      Packet-loss concealment (PLC) information field. See
      Section 3.3.1.2.

3.3.1.1.  Mixing information field

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |C|R|V|   PW1   |
   |1|1|1|         |
   +-+-+-+-+-+-+-+-+

                Figure 4: Mixing information field (MX)

   Check bit #1 (C1): 1 bit

      Validity flag of V1 and PW1. This bit being "1" indicates that
      both parameters are valid, and "0" indicates that the parameters
      should be ignored. If any of these parameters is invalid, this
      bit should be set to "0". This flag is mainly intended for a
      UEMCLIP-aware Topo-RTCP-terminating-MCU. This flag should be set
      to "0" in case of upward transcoding from G.711 (see Section 4).

   Reserved bit #1 (R1): 1 bit

      This bit should be ignored. The default of this bit is 0.

   VAD flag #1 (V1): 1 bit

      Voice activity detection flag of the current frame, designed to be
      used for MCU operations. This flag being "1" indicates that the
      frame is an active (voice) segment, and "0" indicates that it is
      an inactive (non-voice) or a silent segment. This flag is
      specifically designed for mixing information. Basing DTX
      decisions on this flag is not recommended.

   Power #1 (PW1): 5 bits

      Signal power code of the current frame.
      The code is obtained by calculating a root mean square (RMS) of
      "Layer a" and encoding this RMS using G.711 u-law [ITU-T G.711].
      Denoting the encoded RMS as R, PW1 is obtained by
      PW1 = ((~R)>>2) & 0x1F, where "~", ">>", and "&" are the one's
      complement (bitwise NOT), right shift, and bitwise AND operators,
      respectively.

3.3.1.2.  PLC information field

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |C|R2 |V|   K   |U|     P1      |U|     P2      |      PW2      |
   |2|   |2|       |1|             |2|             |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      R3       |
   |               |
   +-+-+-+-+-+-+-+-+

                 Figure 5: PLC information field (PC)

   Check bit #2 (C2): 1 bit

      Validity flag of V2, K, U1, P1, U2, P2, and PW2. If the flag is
      "1", it means that all these parameters are valid, and "0" means
      that the parameters should be ignored. If any of these parameters
      is invalid, this bit should be set to "0". Similarly to C1, this
      flag should be set to "0" in case of upward transcoding from G.711
      (see Section 4).

   Reserved bit #2 (R2): 2 bits

      These bits should be ignored. The default of these bits is 0.

   VAD flag #2 (V2): 1 bit

      Voice activity detection flag of the current frame, designed to be
      used for packet-loss concealment. This flag might not be the same
      as V1 in the mixing information, and might not be synchronized
      with the marker bit in the RTP header. Basing DTX decisions on
      this flag is not recommended.

   Frame indicator (K): 4 bits

      This value indicates the frame offset of U2, P2, and PW2. Because
      carrying the speech feature parameters used as PLC information in
      a different frame helps to maintain the speech quality, this
      frame offset indicates which frame the parameters are associated
      with. The value ranges between "0" and "15".
      If the current frame number is N, for example, the value K
      indicates that U2, P2, and PW2 are associated with frame N-K.
      The frame indicator is equal to the difference in the RTP sequence
      number when one UEMCLIP frame is contained in a single RTP packet.

   V/UV flag #1 (U1): 1 bit

      Voiced/Unvoiced signal indicator of the current frame. This flag
      being "0" indicates that the frame is a voiced signal segment, and
      "1" indicates that it is an unvoiced signal segment.

   Pitch lag #1 (P1): 7 bits

      Pitch code of the current frame. The actual pitch lag is
      calculated as P1+20 samples at an 8-kHz sampling rate. The pitch
      lag must satisfy 20 <= pitch lag <= 120. Codes ranging between
      "0x65" and "0x7F" are not used. To obtain the pitch lag, any
      pitch estimation method can be used, such as the one used in G.711
      Appendix I [ITU-T G.711 Appendix 1].

   V/UV flag #2 (U2): 1 bit

      Voiced/Unvoiced signal indicator of the offset frame. This flag
      being "0" indicates that the frame is a voiced signal segment, and
      "1" indicates that it is an unvoiced signal segment. The offset
      value is defined as K.

   Pitch lag #2 (P2): 7 bits

      Pitch code of the offset frame. The offset value is defined as K.
      The calculation method is identical to "P1", except that it is
      based on the signal of the offset frame.

   Power #2 (PW2): 8 bits

      Signal power code of the offset frame. The offset value is
      defined as K.

   Reserved bits #3 (R3): 8 bits

      These bits should be ignored. The default of all bits is "0".

3.3.2.  Sub-layer

   A sub-layer (SL) is a sub-header followed by the layer bitstream, as
   shown in Figure 6. The sub-header indicates the layer location and
   the number of bytes.

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 . . .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+
   |CI |FI |QI |R4 |      SB       |            LD ...             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+

                    Figure 6: Sub-layer format (SL)

   Channel index (CI): 2 bits

      Indicates the channel number. For all modes given in Table 2,
      this should be "0". The detail is given in Table 3.

   Frequency index (FI): 2 bits

      Indicates the frequency number. "0" means that the layer is in
      the base frequency band, and a higher number means that the layer
      is in the respective frequency band. The detail is given in
      Table 3.

   Quality index (QI): 2 bits

      Indicates the quality layer number. "0" means that the layer is
      in the base layer, and a higher number means that the layer is in
      the respective quality layer. The detail is given in Table 3.

   Reserved #4 (R4): 2 bits

      Not used (reserved). The default value is "0".

   Sub-layer Size (SB): 8 bits

      Indicates the byte size of the following sub-layer data.

   Layer Data (LD): SB*8 bits

      The actual sub-layer data.

   For all the layers shown in Table 1, the layer indices are shown in
   Table 3.

   +-------+----+----+----+
   | Layer | CI | FI | QI |
   +-------+----+----+----+
   | a     | 0  | 0  | 0  |
   |       |    |    |    |
   | b     | 0  | 0  | 1  |
   |       |    |    |    |
   | c     | 0  | 1  | 0  |
   +-------+----+----+----+

                        Table 3: Layer indices

4.  Transcoding between UEMCLIP and G.711

   As given in Section 2, the u-law encoded G.711 bitstream (Layer a)
   is the core layer of a UEMCLIP bitstream, and is always embedded.
   This means that media transcoding from a UEMCLIP bitstream to G.711
   does not have to undergo decoding and re-encoding procedures; simple
   extraction suffices. However, this does not apply to the reverse
   procedure, i.e., transcoding from G.711 to UEMCLIP, because the
   auxiliary information in the main header (MH) must be assigned
   separately.
   It should be noted that this media transcoding is useful for a Media
   Translator (Topo-Media-Translator) or a Point to Multipoint Using
   RTCP Terminating MCU (Topo-RTCP-terminating-MCU) in [RFC5117], and
   all the requirements given there apply. This means that a
   transcoding device of this sort MUST rewrite RTCP packets, together
   with the RTP media packets.

   The transcoding from UEMCLIP to u-law G.711 can be done easily by
   finding the appropriate sub-layer. Within a frame, the transcoder
   should look for the sub-layer with layer index "0x00" (CI, FI, and
   QI all zero); the subsequent LD field, which has a size of SB*8 bits
   (SB=160, because UEMCLIP has a 20-ms frame), is the actual G.711
   bitstream data. It should be noted that the transcoder should not
   expect the core layer always to be located right after the main
   header.

   On the other hand, the transcoding from G.711 to UEMCLIP is not
   entirely straightforward. Since there are no means to generate
   enhancement sub-layers, a G.711 bitstream can only be converted to a
   UEMCLIP Mode 0 bitstream. If the original G.711 bitstream is encoded
   in A-law, it should first be converted to u-law to become the core
   layer. Because the UEMCLIP frame size is 20 ms, the u-law encoded
   G.711 bitstream MUST be a 160-sample chunk to become a core layer.
   When a UEMCLIP encoder is not available, the main header contents
   should follow these guidelines:

   o  The check bits for mixing and PLC (C1 and C2) are set to 0.

   o  The reserved bits (R1 to R3) in MH are set to their respective
      default values.

   The core layer (i.e., the u-law G.711 bitstream) should have the
   following sub-layer header:

   o  All of CI, FI, QI, and R4 MUST be 0.

   o  Sub-layer size (SB) MUST be 160 for a 20-ms frame.

5.  Congestion Control Considerations

   The general congestion control considerations for transporting RTP
   data [RFC3550], as well as those of any applicable RTP profile such
   as AVP [RFC3551], also apply to UEMCLIP over RTP.
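
   The sub-layer handling of Section 4, on which the embedded structure
   relies, can be illustrated with a short Python sketch. It is not
   part of the format definition. The function names are hypothetical,
   each call handles exactly one frame, the most-significant-bit-first
   numbering of the figures is assumed, and the main-header fields that
   Section 4 leaves unspecified are simply zero-filled when G.711 is
   wrapped into Mode 0.

```python
MAIN_HEADER_BYTES = 6  # MH size given in Section 3.3.1


def parse_sublayers(frame):
    """Yield (ci, fi, qi, data) for each sub-layer of one UEMCLIP frame."""
    pos = MAIN_HEADER_BYTES
    while pos < len(frame):
        if pos + 2 > len(frame):
            raise ValueError("truncated sub-layer header")
        index, size = frame[pos], frame[pos + 1]
        data = frame[pos + 2:pos + 2 + size]
        if len(data) != size:
            raise ValueError("truncated sub-layer data")
        # Assumption: CI, FI, QI are the top six bits; the low two are R4.
        yield (index >> 6) & 3, (index >> 4) & 3, (index >> 2) & 3, data
        pos += 2 + size


def extract_g711(frame):
    """Return the u-law core layer (Layer a: CI=FI=QI=0) of one frame."""
    for ci, fi, qi, data in parse_sublayers(frame):
        if (ci, fi, qi) == (0, 0, 0):
            return data
    raise ValueError("core layer missing")


def wrap_g711(ulaw):
    """Build a Mode 0 frame from 160 u-law bytes (20 ms of audio)."""
    if len(ulaw) != 160:
        raise ValueError("need exactly 160 u-law samples")
    # C1 = C2 = 0 and reserved bits = 0; zeroing the remaining MH
    # fields is an illustrative choice, since they are to be ignored.
    main_header = bytes(MAIN_HEADER_BYTES)
    return main_header + bytes([0x00, 160]) + bytes(ulaw)
```

   Because the parser walks the sub-headers rather than assuming a
   position, it finds the core layer even when an enhancement layer
   precedes it in the frame.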

   The bandwidth of a UEMCLIP bitstream can be reduced by changing to
   lower-bit-rate modes. The embedded layer structure of UEMCLIP may
   help to control congestion when dynamic mode changing (see
   Section 6.2.1) is available and the range of modes is obtained by
   offer-answer negotiation as given in Section 6.3. It should be noted
   that this involves proper RTCP handling when the bit-rate is modified
   in an RTP translator or a mixer [RFC3550].

   Packing more frames in each RTP payload can reduce the number of
   packets sent, and hence the overhead from IP/UDP/RTP headers, at the
   expense of increased delay and reduced robustness against packet
   losses. Such packing should be treated with care because increased
   delay means reduced quality.

6.  Payload Format Parameters

6.1.  Media type registration

   This registration is done using the template defined in [RFC4288] and
   following [RFC4855].

   Media type name: audio

   Media subtype name: UEMCLIP

   Required parameters:

      Rate: Defines the sampling rate, and MUST be either 8000 or
         16000. See Section 6.2.1 "Mode specification" of RFC xxxx
         (this RFC) for details.

   Optional parameters:

      ptime: See RFC 4566 [RFC4566].

      maxptime: See RFC 4566 [RFC4566].

      mode: Indicates the range of dynamically changeable modes during
         a session. Possible values are a comma-separated list of
         modes from the supported mode set: 0, 1, 3, and 4. If only
         one mode is specified, it means that the mode must not be
         changed during the session. When not specified, the
         transmission defaults to a single mode as specified in
         Table 4. See Section 6.2.1 "Mode specification" of RFC xxxx
         (this RFC) for details.

   Encoding considerations: This media type is framed and contains
      binary data. See Section 4.8 of RFC 4288.

   Security considerations: See Section 7 "Security Considerations" of
      RFC xxxx (this RFC).

   Interoperability considerations: This media may be readily
      transcoded to u-law encoded ITU-T G.711. See Section 4
      "Transcoding between UEMCLIP and G.711" of RFC xxxx (this RFC).

   Published specification: RFC xxxx (This RFC)

   Applications that use this media type: Audio and video streaming and
      conferencing tools.

   Additional information: None

   Intended usage: COMMON

   Restrictions on usage: This media type depends on RTP framing, and
      hence is only defined for transfer via RTP.

   Person & email address to contact for further information: Yusuke
      Hiwasaki

   Author: Yusuke Hiwasaki

   Change Controller: IETF Audio/Video Transport Working Group
      delegated from the IESG

6.2.  Mapping to SDP Parameters

   The media type audio/UEMCLIP is mapped to fields in the Session
   Description Protocol (SDP) [RFC4566] as follows:

   Media name: The media name in the "m=" line of SDP MUST be audio.

   Encoding name: The registered media subtype name should be used for
      the "a=rtpmap" line.

   Sampling Frequency: Depending on the mode, the clock rate (sampling
      frequency) specified in "a=rtpmap" MUST be selected from the ones
      defined in Table 2. See Section 6.2.1 for details.

   Encoding parameters: Since this is an audio stream, the encoding
      parameters indicate the number of audio channels, and this SHOULD
      default to "1", as selected from the ones defined in Table 2.
      This is OPTIONAL.

   Packet time: The frame length of UEMCLIP in any mode is 20 ms; thus,
      the argument of "a=ptime" SHOULD be a multiple of "20". When not
      listed in SDP, it should also default to the minimum size: "20".

   UEMCLIP specific: Any descriptions specific to UEMCLIP are defined
      in the Format Specification Parameters ("a=fmtp"). Parameters
      MUST be separated with ";", and if a parameter has a value, it
      MUST be given with "=".
      For compatibility reasons, any application/terminal MUST ignore
      any parameters that it does not understand. This is to ensure
      upward compatibility with parameters added in future
      enhancements. The mode specification should be made here (see
      Section 6.2.1).

6.2.1.  Mode specification

   Since the UEMCLIP codec can operate in a number of modes
   (bit-rates), it is desirable to specify the range of modes that an
   encoder or a decoder can operate in. When exchanging SDP messages,
   an offerer should specify all possible mode numbers as arguments to
   "mode=" in the "a=fmtp" line, delimited by commas ",". When
   multiple modes are specified, they SHOULD appear in descending
   order of priority.

   Although UEMCLIP decoders SHOULD accept bitstreams in any mode, an
   implementation may fail to adapt to dynamic mode changes during a
   session. For this reason, an application may choose to operate
   either with one fixed mode or with multiple modes that can be
   dynamically changed. If the mode is to be fixed and changes are not
   allowed, this can be indicated by specifying a single mode per
   payload type.

   The mode numbers that can be specified in a payload type as arguments
   to "mode" are restricted by the combination of a clock rate and a
   number of audio channels. This is because SDP binds a payload type
   to a combination of a sampling frequency and a number of audio
   channels. Table 4 gives the selectable mode numbers for each clock
   rate. When mode specifications are not given at all, a payload type
   MUST default to a single mode using the default value specified in
   this table.

   +------------+----------+------------------+--------------+
   | Clock rate | Channels | Selectable modes | Default mode |
   +------------+----------+------------------+--------------+
   | 8000       | 1        | 0,3              | 0            |
   |            |          |                  |              |
   | 16000      | 1        | 0,1,3,4          | 1            |
   +------------+----------+------------------+--------------+

                         Table 4: Default modes

   It should be noted that a mode with a larger sampling frequency (Fs)
   is not used in conjunction with a smaller clock rate specified in
   "a=rtpmap". This means that Modes 0 and 3 can be specified in a
   payload type having a clock rate of either 8000 or 16000 in
   "a=rtpmap", but Modes 1 and 4 cannot be specified in one having a
   clock rate of 8000.

6.3.  Offer-Answer Model Considerations

6.3.1.  Offer-Answer Guidelines

   The procedures related to exchanging SDP messages MUST follow
   [RFC3264]. The following is a detailed list of the semantics of
   using the UEMCLIP payload format in an offer-answer exchange.

   o  An offerer SHOULD offer every possible combination of UEMCLIP
      payload type it can handle, i.e., sampling frequency, channel
      number, and fmtp parameters, in a preferred order. When the
      transmission bandwidth is restricted, the offer MUST be made in
      accordance with the restriction.

   o  When multiple UEMCLIP payload types are offered, it is RECOMMENDED
      that the answerer select a single UEMCLIP payload type and answer
      with it.

   o  In a UEMCLIP payload type, an answerer MUST answer back suitable
      mode number(s) as a subset of what has been offered. This means
      that there is a symmetry assumption on sent and received streams,
      and the offerer MUST NOT send in modes that it does not offer.

   o  In an offering/answering SDP, any fmtp parameters that are not
      known MUST be ignored. If any unknown or undefined parameters
      are offered, an answerer MUST delete those entries from the
      answer message.
   o  A receiver of an SDP message MUST only use the specified payload
      types and modes.  When a mode specification is missing, i.e., a
      mode is not specified at all, the session MUST default to one
      single mode without mode changes during the session.  In this
      case, the default mode values shown in Table 4 MUST be used,
      based on the sampling frequency and number of channels.  This
      table must be consulted only when there are no mode
      specifications; thus the offerer/answerer MUST NOT assume that a
      default mode is available when it is not in the specified list of
      modes.

   o  When an offered condition does not fit an answerer's
      capabilities, the answerer MUST NOT answer any of the conditions,
      and the session MAY proceed to re-INVITE, if possible.  If a
      condition (mode) is decided upon, an offerer and an answerer MUST
      transmit on this condition.

6.3.2.  Examples

   When an offerer indicates that he/she wishes to dynamically switch
   between modes (0, 1, 3, and 4) during a session, an example of an
   offered SDP can be:

      v=0
      o=john 51050101 51050101 IN IP4 offhost.example.com
      s=-
      c=IN IP4 offhost.example.com
      t=0 0
      m=audio 5004 RTP/AVP 96
      a=rtpmap:96 UEMCLIP/16000/1
      a=fmtp:96 mode=4,1,3,0

   Note that the listed modes appear in the offerer's order of
   preference.

   When an answerer can only operate in Modes 1 and 0 but can
   dynamically switch between those modes during a session, the
   answerer MUST delete the entries of Modes 3 and 4, and answer back
   as:

      v=0
      o=lena 549947322 549947322 IN IP4 anshost.example.org
      s=-
      c=IN IP4 anshost.example.org
      t=0 0
      m=audio 5004 RTP/AVP 96
      a=rtpmap:96 UEMCLIP/16000/1
      a=fmtp:96 mode=1,0

   As a result, both would start communicating in either Mode 1 or 0,
   and can dynamically switch between those modes during the session.

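   The mode selection in the exchange above can be derived
   mechanically.  The following is a minimal, non-normative Python
   sketch of that step.  The names TABLE_4, parse_modes, and
   answer_modes are hypothetical and are not defined by this document,
   and the sketch assumes that fmtp parameters are separated by
   semicolons and that only the clock rate and channel combinations
   listed in Table 4 are used.

```python
# Selectable and default modes per (clock rate, channels), copied from Table 4.
TABLE_4 = {
    (8000, 1): {"selectable": (0, 3), "default": 0},
    (16000, 1): {"selectable": (0, 1, 3, 4), "default": 1},
}


def parse_modes(fmtp_params, clock_rate, channels=1):
    """Return the ordered mode list found in an fmtp parameter string.

    Parameters other than "mode" are ignored.  When no mode is given,
    the single default mode from Table 4 is returned.
    Assumption: parameters are separated by semicolons.
    """
    entry = TABLE_4[(clock_rate, channels)]
    modes = None
    for param in fmtp_params.split(";"):
        name, _, value = param.strip().partition("=")
        if name == "mode" and value:
            modes = [int(m) for m in value.split(",")]
    if modes is None:
        return [entry["default"]]
    invalid = [m for m in modes if m not in entry["selectable"]]
    if invalid:
        raise ValueError(
            "modes %s not selectable at clock rate %d" % (invalid, clock_rate)
        )
    return modes


def answer_modes(offered, supported, allow_switching=True):
    """Return the answer's modes: a subset of the offer, in the offered order.

    With allow_switching=False only the most preferred common mode is kept,
    which signals a fixed mode.  An empty list means no offered mode fits.
    """
    common = [m for m in offered if m in supported]
    return common if allow_switching else common[:1]
```

   With the offer "mode=4,1,3,0" at a clock rate of 16000 and an
   answerer that supports Modes 0 and 1, answer_modes returns the list
   [1, 0], which corresponds to "mode=1,0" in the answer shown above.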
   On the other hand, when the answerer is capable of communicating in
   either Mode 1 or 0, but cannot switch between modes during a
   session, an example of such an answer is as follows:

      v=0
      o=lena 549947322 549947322 IN IP4 anshost.example.org
      s=-
      c=IN IP4 anshost.example.org
      t=0 0
      m=audio 5004 RTP/AVP 96
      a=rtpmap:96 UEMCLIP/16000/1
      a=fmtp:96 mode=1

   As a result, both will start communicating in Mode 1.  Note that a
   mode change during this session is not allowed because the answerer
   responded with a single mode, and that the answerer selected Mode 1
   over Mode 0 according to the offered order.

   If an offerer does not want a mode change during a session but is
   capable of receiving either Mode 4 or Mode 1 bitstreams, the SDP
   should look something like:

      v=0
      o=john 51050101 51050101 IN IP4 offhost.example.com
      s=-
      c=IN IP4 offhost.example.com
      t=0 0
      m=audio 5004 RTP/AVP 96 97
      a=rtpmap:96 UEMCLIP/16000/1
      a=fmtp:96 mode=4
      a=rtpmap:97 UEMCLIP/16000/1
      a=fmtp:97 mode=1

   and if the answerer prefers to communicate in Mode 1, an answer
   would be:

      v=0
      o=lena 549947322 549947322 IN IP4 anshost.example.org
      s=-
      c=IN IP4 anshost.example.org
      t=0 0
      m=audio 5004 RTP/AVP 97
      a=rtpmap:97 UEMCLIP/16000/1
      a=fmtp:97 mode=1

   Please note that it is RECOMMENDED to select a single UEMCLIP
   payload type for answers.

   The "ptime" attribute is used to denote the desired packetization
   interval.  When not specified, it SHOULD default to 20.  Since
   UEMCLIP uses 20 msec frames, ptime values that are multiples of 20
   imply multiple frames per packet.  In the example below, the ptime
   is set to 60, which means that the offerer wants to receive 3 frames
   in each packet.

      v=0
      o=kosuke 2890844730 2890844730 IN IP4 anotherhost.example.com
      s=-
      c=IN IP4 anotherhost.example.com
      t=0 0
      m=audio 5004 RTP/AVP 96
      a=ptime:60
      a=rtpmap:96 UEMCLIP/16000/1

   When a mode specification is not present, the session should default
   to a fixed mode, which in this case is Mode 1 (see Section 6.2.1).

7.  Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC3550] and any appropriate profiles.  This implies
   that confidentiality of the media streams is achieved by encryption
   unless the applicable profile specifies other means.

   A potential denial-of-service threat exists for data encoding using
   compression techniques that have non-uniform receiver-end
   computational load.  The attacker can inject pathological datagrams
   into the stream that are complex to decode and cause the receiver to
   become overloaded.  However, the UEMCLIP codec covered in this
   document does not exhibit any significant non-uniformity.

   Other potential threats are memory attacks using illegal layer
   indices or byte numbers.  The implementor of a decoder should always
   be aware that the indicated numbers may be corrupted, may not point
   to the right sub-layer, and may force reading beyond the bitstream
   boundaries.  It is advised that a decoder implementation reject
   layers with such indices.

8.  IANA Considerations

   It is requested that one new media subtype (audio/UEMCLIP) be
   registered by IANA.  For details, see Section 6.1.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65,
              RFC 3551, July 2003.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", BCP 13, RFC 4288,
              December 2005.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
              Formats", RFC 4855, February 2007.

   [RFC4856]  Casner, S., "Media Type Registration of Payload Formats
              in the RTP Profile for Audio and Video Conferences",
              RFC 4856, February 2007.

   [ITU-T G.711]
              International Telecommunication Union, "Pulse code
              modulation (PCM) of Voice Frequencies", ITU-T
              Recommendation G.711, November 1988.

9.2.  Informative References

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

   [ITU-T G.711 Appendix 1]
              International Telecommunication Union, "Pulse code
              modulation (PCM) of Voice Frequencies, Appendix I: A high
              quality low-complexity algorithm for packet loss
              concealment with G.711", ITU-T Recommendation G.711
              Appendix I, September 1999.

Authors' Addresses

   Yusuke Hiwasaki
   NTT Corporation
   3-9-11 Midori-cho,
   Musashino-shi
   Tokyo  180-8585
   Japan

   Phone: +81(422)59-4815
   Email: hiwasaki.yusuke@lab.ntt.co.jp

   Hitoshi Ohmuro
   NTT Corporation
   3-9-11 Midori-cho,
   Musashino-shi
   Tokyo  180-8585
   Japan

   Phone: +81(422)59-2151
   Email: ohmuro.hitoshi@lab.ntt.co.jp