idnits 2.17.1 draft-hiwasaki-avt-rtp-uemclip-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 15. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 801. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 812. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 819. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 825. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: o When an offered condition does not fit an answerer's capabilities, it naturally MUST not answer the conditions, and session MAY proceed to re-INVITE, if possible. If a condition (mode) is decided upon, an offerer and an answerer MUST transmit on this condition. 
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 9, 2007) is 6226 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: 'Hz' on line 669 ** Obsolete normative reference: RFC 3555 (ref. '5') (Obsoleted by RFC 4855, RFC 4856) ** Obsolete normative reference: RFC 4288 (ref. '6') (Obsoleted by RFC 6838) ** Obsolete normative reference: RFC 4566 (ref. '7') (Obsoleted by RFC 8866) Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Audio/Video Transport Y. Hiwasaki 3 Internet-Draft H. Ohmuro 4 Intended status: Standards Track NTT Corporation 5 Expires: October 11, 2007 April 9, 2007 7 RTP payload format for UEMCLIP speech codec 8 draft-hiwasaki-avt-rtp-uemclip-02 10 Status of this Memo 12 By submitting this Internet-Draft, each author represents that any 13 applicable patent or other IPR claims of which he or she is aware 14 have been or will be disclosed, and any of which he or she becomes 15 aware will be disclosed, in accordance with Section 6 of BCP 79. 
17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt. 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 This Internet-Draft will expire on October 11, 2007. 35 Copyright Notice 37 Copyright (C) The IETF Trust (2007). 39 Abstract 41 This document describes the RTP payload format of an ITU-T G.711 42 enhanced speech codec, UEMCLIP. The bitstream has a scalable 43 structure with an embedded u-law bitstream, also known as PCMU, thus 44 providing a handy transcoding operation between narrowband and 45 wideband speech. 47 Table of Contents 49 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 50 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 51 2. Media Format Background . . . . . . . . . . . . . . . . . . . 4 52 3. Payload Format . . . . . . . . . . . . . . . . . . . . . . . . 6 53 3.1. RTP Header Usage . . . . . . . . . . . . . . . . . . . . . 6 54 3.2. Multiple frames in an RTP packet . . . . . . . . . . . . . 7 55 3.3. Payload Data . . . . . . . . . . . . . . . . . . . . . . . 7 56 3.3.1. Main Header . . . . . . . . . . . . . . . . . . . . . 7 57 3.3.2. Sub-layer data . . . . . . . . . . . . . . . . . . . . 11 58 4. G.711 interoperability . . . . . . . . . . . . . . . . . . . . 13 59 5. Congestion Control Considerations . . . . . . . . . . . . . . 14 60 6. Payload Format Parameters . . . . . . . . . . . . . . . . . . 15 61 6.1. 
Media type registration . . . . . . . . . . . . . . . . . 15
62        6.2.  Mapping to SDP Parameters . . . . . . . . . . . . . . . . 16
63          6.2.1.  Dynamic transmission definition . . . . . . . . . . . 16
64        6.3.  Offer-answer Model Considerations . . . . . . . . . . . . 17
65      7.  Security Considerations . . . . . . . . . . . . . . . . . . . 19
66      8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20
67      9.  Normative References . . . . . . . . . . . . . . . . . . . . . 21
68      Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22
69      Intellectual Property and Copyright Statements . . . . . . . . . . 23

71   1.  Introduction

73      This document specifies the payload format for sending UEMCLIP
74      encoded speech using the Real-time Transport Protocol (RTP) [3].
75      UEMCLIP is an enhanced version of u-law ITU-T G.711, designed to
76      help the market make a smooth transition towards the forthcoming
77      wideband communication environment while maintaining
78      interoperability and a low transcoding load with the existing
79      terminals, in which the implementation of G.711 is mandatory.

81      The payload format is described in Section 3. Interoperability
82      issues with G.711 are discussed in Section 4. In Section 6.1, a
83      media type registration for the UEMCLIP RTP payload format is provided.

85   1.1.  Terminology

87      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
88      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
89      document are to be interpreted as described in [1].

91   2.  Media Format Background

93      UEMCLIP stands for "U-law EMbedded Coder for Low-delay IP
94      communication", and is basically an enhanced version of u-law ITU-T
95      G.711, otherwise known as PCMU [5]. It was developed for VoIP (Voice
96      over Internet Protocol) applications, and is especially suitable for
97      wideband multi-point conferencing.
The main goal of this codec is to
98      provide a wideband communication platform that is highly
99      interoperable with existing terminals equipped with G.711, and to
100     stimulate the market to gradually shift to wideband
101     communication. Because the G.711 bitstream is embedded in the
102     UEMCLIP bitstream, costly transcoding is avoided, especially when
103     interoperating with narrowband terminals.

105     This document does not discuss the implementation detail of the
106     encoder and decoder, but only describes the bitstream format. The
107     implementation detail will be available by other means.

109     Because of its scalable nature, there are a number of sub-bitstreams
110     (layer data) within a UEMCLIP bitstream. By choosing appropriate
111     sub-layers, the codec can adapt to the following requirements:

113     o  Sampling frequency,

115     o  Number of channels,

117     o  Speech quality, and

119     o  Bit-rate.

121     The current implementation of the UEMCLIP codec includes three sub-
122     coders, as shown in Table 1. The core layer is G.711 core, and the other
123     two are quality and bandwidth enhancement layers with a bit-rate of 16
124     kbit/s each.

126     +-------+---------------------+----------+--------------------------+
127     | Layer | Description         | Bit-rate | Coding algorithm         |
128     +-------+---------------------+----------+--------------------------+
129     | a     | G.711 core          | 64       | u-law PCM                |
130     |       |                     |          |                          |
131     | b     | Lower-band          | 16       | Time domain block        |
132     |       | enhancement         |          | quantization             |
133     |       |                     |          |                          |
134     | c     | Higher-band         | 16       | MDCT block quantization  |
135     +-------+---------------------+----------+--------------------------+

137                       Table 1: Sub-layer description

139     Based on these sub-layers, the UEMCLIP codec operates in four modes as
140     shown in Table 2. Here, "Fs" is the sampling frequency in kHz. The
141     absent Modes 2 and 5 are reserved for future extension to 32 kHz
142     sampling modes.
As the mode definition is expected to grow, any
143     other modes not defined in this table MUST NOT be used for
144     compatibility and interoperability reasons.

146     +------+----+----+-------+-------+-------+-------------+------------+
147     | Mode | Ch | Fs | Layer | Layer | Layer | Bit-rate    | Total      |
148     |      |    |    | a     | b     | c     | w/o headers | bit-rate   |
149     |      |    |    |       |       |       | [kbit/s]    | [kbit/s]   |
150     +------+----+----+-------+-------+-------+-------------+------------+
151     | 0    | 1  | 8  | x     | -     | -     | 64          | 68.8       |
152     |      |    |    |       |       |       |             |            |
153     | 1    | 1  | 16 | x     | -     | x     | 80          | 85.6       |
154     |      |    |    |       |       |       |             |            |
155     | 2    | -  | -  | -     | -     | -     | -           | -          |
156     |      |    |    |       |       |       |             |            |
157     | 3    | 1  | 8  | x     | x     | -     | 80          | 85.6       |
158     |      |    |    |       |       |       |             |            |
159     | 4    | 1  | 16 | x     | x     | x     | 96          | 102.4      |
160     |      |    |    |       |       |       |             |            |
161     | 5    | -  | -  | -     | -     | -     | -           | -          |
162     +------+----+----+-------+-------+-------+-------------+------------+

164                         Table 2: Mode description

166     A UEMCLIP bitstream contains internal headers and other side-
167     information apart from the layer data. This results in a total bit-
168     rate larger than the sum of the layers shown in the above table. The
169     details of the internal headers and auxiliary information are
170     described in Section 3.3.1.

172     Defining the sampling frequency and the number of channels does not
173     result in a singular mode, i.e., there can be multiple modes for the
174     same sampling frequency or number of channels. The supported modes
175     may differ among implementations; thus, the sender and the
176     receiver must exchange which mode to use for transmission.

178  3.  Payload Format

180     As an RTP payload, a UEMCLIP bitstream can contain one or more frames
181     as shown in Figure 1.
183       0                   1                   2                   3
184       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
185      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
186      |                          RTP Header                           |
187      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
188      |                                                               |
189      |                 one or more frames of UEMCLIP                 |
190      |                                                               |
191      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

193                        Figure 1: RTP payload format

195     The UEMCLIP bitstream has a scalable structure; thus, it is possible to
196     reconstruct the signal by decoding a part of it. A UEMCLIP frame is
197     composed of a main header followed by one or more sub-layers. As a
198     sub-layer, the core layer, i.e., "Layer a", MUST always be included.
199     It should be noted that the core layer is not necessarily
200     located at the top. The decoder MUST always refer to the layer ID
201     for proper decoding. The bitstream, for the case of an enhancement
202     header with length 0, is shown in Figure 2, where sub-layer #1 can be
203     any arbitrary sub-layer data.

205                 +--+-------+-------+-------+-------+//-+
206                 |MH| SD #1 | SD #2 | SD #3 | SD #4 |...|
207                 |  |       |       |       |       |   |
208                 +--+-------+-------+-------+-------+//-+

210                Figure 2: A UEMCLIP frame (bitstream format)

212     The UEMCLIP bitstream does not include the following information: a)
213     the codec type, b) Mode, c) I/O sampling frequency, and d) encoder
214     version. As described before, this information SHOULD be exchanged
215     while establishing a connection, for example, by means of SDP.

217  3.1.  RTP Header Usage

219     Each RTP packet starts with a fixed RTP header, as explained in [3].
220     The following fields of the RTP fixed header are used specifically for
221     UEMCLIP streams:

223     Payload type:  The assignment of an RTP payload type for this packet
224        format is outside the scope of this document; however, it is
225        expected that a payload type in the dynamic range shall be
226        assigned.
228     Timestamp:  This encodes the sampling instant of the first speech
229        signal sample in the RTP data packet. For UEMCLIP streams, the
230        RTP timestamp clock rate MUST be a multiple of 8 kHz, and in case the
231        sampling rate can change during a session, this figure should be
232        equal to the maximum rate (in Hz) given in Table 2.

234     Marker bit:  If the codec is used for applications with discontinuous
235        transmission (DTX, or silence compression), the first packet after
236        a silence period during which packets have not been transmitted
237        contiguously SHOULD have the marker bit in the RTP data header set
238        to one. The marker bit in all other packets MUST be zero.
239        Applications without DTX MUST set the marker bit to zero.

241  3.2.  Multiple frames in an RTP packet

243     More than one UEMCLIP frame may be included in a single RTP packet by
244     a sender. However, senders have the following additional
245     restrictions:

247     o  SHOULD NOT include more UEMCLIP frames in a single RTP packet than
248        will fit in the MTU of the RTP transport protocol.

250     o  All frames contained in a single RTP packet MUST be of the same
251        length, i.e., they MUST have the same bit rate (octets per frame).

253     o  Frames MUST NOT be split between RTP packets.

255     It is RECOMMENDED that the number of frames contained within an RTP
256     packet be consistent with the application. Since UEMCLIP is designed
257     for telephony applications where delay is important, fewer
258     frames per packet, and hence lower delay, are preferable.

260  3.3.  Payload Data

262  3.3.1.  Main Header

264     The main header (MH) is placed at the top of a payload and has a size
265     of 10 bytes plus the size of the optional enhanced header. The
266     content of the main header is defined in Figure 3.
268 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 269 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 270 | ID | BS | MX | 271 | | | | 272 |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5|0 1 2 3 4 5 6 7| 273 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 274 | PC | 275 | | 276 |0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| 277 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 278 | PC(cont'd) | ES | EH | 279 | | | (if exists) | 280 |2 3 4 5 6 7 8 9|0 1 2 3 4 5 6 7| ... 281 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-//+-+-+-+ 283 Figure 3: UEMCLIP main header format (MH) 285 Identification (ID): 8 bits 287 The value should be "0x95". 289 Byte size (BS): 16 bits 291 Indicates the byte size of the following UEMCLIP payload. This 292 means that the RTP header size, ID and BS are not included. It is 293 encoded in network byte-order. 295 Mixing information (MX): 8 bits 297 Mixing information field. 299 Packet-loss Concealment information (PC): 40 bits 301 Packet-loss concealment (PLC) information field. 303 Enhanced-header Size (ES): 8 bits 305 Size of EH (enhanced header) in bytes. 307 Enhanced header (EH): 8*ES bits 309 Content of the enhanced header. When ES is 0, the enhanced header 310 is non-existent. 312 3.3.1.1. Mixing information field 314 0 1 2 3 4 5 6 7 315 +-+-+-+-+-+-+-+-+ 316 |C|R|V| PW1 | 317 |1|1|1| | 318 | | | |0 1 2 3 4| 319 +-+-+-+-+-+-+-+-+ 321 Figure 4: Mixing information field (MX) 323 Check bit #1 (C1): 1 bit 325 Validity flag of V1 and PW1. This bit being "1" indicates that 326 both parameters are valid, and "0" indicates that the parameters 327 should be ignored. 329 Reserved bit #1 (R1): 1 bit 331 This bit should be ignored. 333 VAD flag #1 (V1): 1 bit 335 Voice activity detection flag of the current frame. 
This flag
336        being "1" indicates that the frame is an active (voice) segment,
337        and "0" indicates that it is an inactive (non-voice) or a silent
338        segment.

340     Power #1 (PW1): 5 bits

342        Signal power code of the current frame.

344  3.3.1.2.  PLC information field

346       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
347      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
348      |C|C|R|V|   K   |R|     P1      |R|     P2      |      PW2      |
349      |2|3|2|2|       |3|             |4|             |               |
350      | | | | |0 1 2 3| |0 1 2 3 4 5 6| |0 1 2 3 4 5 6|0 1 2 3 4 5 6 7|
351      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
352      |      R5       |
353      |               |
354      |0 1 2 3 4 5 6 7|
355      +-+-+-+-+-+-+-+-+

357                   Figure 5: PLC information field (PC)

359     Check bit #2 (C2): 1 bit

361        Validity flag of V2, K, P1, P2, and PW2. If the flag is "1", it
362        means that all these parameters are valid, and "0" means that the
363        parameters should be ignored. If any of these parameters is
364        invalid, C1 should be set to "0".

366     Check bit #3 (C3): 1 bit

368        Payload validity indicator. This flag is normally set to "0". If
369        a received packet has this flag set to "1", the payload data
370        should be ignored and packet-loss concealment should be performed
371        by the receiver. This flag is used in the case of multi-point
372        conferencing, where the upstream packet was lost and the mixing
373        server did not execute packet-loss concealment.

375     Reserved bit #2 (R2): 1 bit

377        This bit should be ignored.

379     VAD flag #2 (V2): 1 bit

381        Voice activity detection flag of the current frame. This may be
382        the same as V1 in the mixing information.

384     Frame indicator (K): 4 bits

386        This value indicates the frame offset of P2 and PW2. Because it is
387        preferable to carry the pitch and power parameters as PLC
388        information in a different frame, this frame offset value indicates
389        which frame the parameters are associated with. Since there
390        are 4 bits allocated, it ranges between "0" and "15".
392     Reserved bit #3 (R3): 1 bit

394        This bit should be ignored.

396     Pitch lag #1 (P1): 7 bits

398        Pitch code of the current frame. The actual pitch lag is
399        calculated as P1+20 samples at an 8-kHz sampling rate. Pitch lag
400        must be 20 <= pitch length <= 120. Codes ranging between "0x65"
401        and "0x7F" are not used.

403     Reserved bit #4 (R4): 1 bit
404        This bit should be ignored.

406     Pitch lag #2 (P2): 7 bits

408        Pitch code of the offset frame. The actual pitch lag is
409        calculated as P2+20 samples at an 8-kHz sampling rate. Pitch lag
410        must be 20 <= pitch length <= 120. Codes ranging between "0x65"
411        and "0x7F" are not used. The offset value is defined as K.

413     Power #2 (PW2): 8 bits

415        Signal power code of the offset frame. The offset value is
416        defined as K.

418     Reserved bits #5 (R5): 8 bits

420        These bits should be ignored.

422  3.3.2.  Sub-layer data

424     Sub-layer data (SD) is a sub-header followed by layer bitstreams, as
425     shown in Figure 6. The sub-header indicates the layer location and
426     the number of bytes.

428       0                   1                   2
429       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 . . .
430      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+
431      | CI| FI| QI| R6|      SB       |            LD ...             |
432      |   |   |   |   |               |                               |
433      |0 1|0 1|0 1|0 1|0 1 2 3 4 5 6 7|                               |
434      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+

436                     Figure 6: Sub-layer format (SD)

438     Channel index (CI): 2 bits

440        Indicates the channel number. For all modes given in Table 2,
441        this should be "0". The detail is given in Table 3.

443     Frequency index (FI): 2 bits

445        Indicates the frequency number. "0" means that the layer is in the
446        base frequency band, and a higher number means that the layer is in
447        the respective frequency band. The detail is given in Table 3.

449     Quality index (QI): 2 bits

451        Indicates the quality layer number. "0" means that the layer is in
452        the base layer, and a higher number means that the layer is in
453        the respective quality layer.
The detail is given in Table 3.

455     Reserved #6 (R6): 2 bits

457        Not used (reserved). The value must be "0".

459     Sub-layer Size (SB): 8 bits

461        Indicates the byte size of the following sub-layer data.

463     Layer Data (LD): SB*8 bits

465        The actual sub-layer data.

467  3.3.2.1.  Layer index encoding

469     The layer index is encoded using values of channel number, frequency-
470     band number, and quality number, encoded with 2 bits each, in
471     that order. The last 2 bits are reserved for future use, and
472     all implementations should ignore this field. For all the layers
473     shown in Table 1, the layer indices are shown in Table 3.

475                          +-------+----+----+----+
476                          | Layer | CI | FI | QI |
477                          +-------+----+----+----+
478                          | a     | 0  | 0  | 0  |
479                          |       |    |    |    |
480                          | b     | 0  | 0  | 1  |
481                          |       |    |    |    |
482                          | c     | 0  | 1  | 0  |
483                          +-------+----+----+----+

485                           Table 3: Layer indices

487  4.  G.711 interoperability

489     As given in Section 2, the u-law encoded G.711 bitstream (Layer a) is the
490     core layer of a UEMCLIP bitstream, and is always embedded. This
491     means that transcoding from a UEMCLIP bitstream to G.711 does not have
492     to undergo decoding and re-encoding procedures; simple extraction
493     suffices. However, this does not apply to the reverse
494     procedure, i.e., transcoding from G.711 to UEMCLIP, because the side
495     information in the main header must be assigned separately.

497     The transcoding from UEMCLIP to u-law G.711 can be done easily by
498     finding an appropriate sub-layer. The transcoder should look for a
499     sub-layer with the layer index of 0x00; the subsequent LD, which has
500     a size of SB*8 bits (usually SB=160 for a 20 ms frame), is the actual
501     G.711 bitstream data. It should be noted that the transcoder should not
502     always expect the core layer to be located right after the main
503     header.

505     On the other hand, the transcoding from G.711 to UEMCLIP is not
506     entirely straightforward.
Since there are no means to generate
507     enhancement sub-layers, a G.711 bitstream can only be converted to
508     a UEMCLIP Mode 0 bitstream. If the original G.711 bitstream is encoded
509     in A-law, it should first be converted to u-law to become the core
510     layer. Because the default packetization size is 20 ms, the u-law
511     encoded G.711 bitstream MUST be a 160-sample chunk. When the UEMCLIP
512     encoder is not available, the main header contents should
513     follow these guidelines.

515     o  ID must be set to "0x95".

517     o  Byte size (BS) should be set to the 7 bytes of the main header, plus
518        the sub-header size (2), plus the number of samples in G.711 (SB).

520     o  The enhanced-header size (ES) should be set to "0x00".

522     o  The check bits for mixing and PLC (C1 and C2) should be set to 0.

524     o  The payload validity indicator (C3) should be set to 0.

526     The core layer (i.e., the u-law G.711 bitstream) should have the
527     following sub-layer header:

529     o  CI, FI, QI, and R6 MUST all be 0.

531     o  Sub-layer size (SB) MUST be 160 for a 20 ms frame.

533  5.  Congestion Control Considerations

535     The general congestion control considerations for transporting RTP
536     data apply to UEMCLIP over RTP [3] as well as any applicable RTP
537     profile like AVP [4]. UEMCLIP does not have any built-in mechanism
538     for reducing the bandwidth. Packing more frames in each RTP payload
539     can reduce the number of packets sent, and hence the overhead from
540     IP/UDP/RTP headers, at the expense of increased delay and reduced
541     error robustness against packet losses.

543  6.  Payload Format Parameters

545  6.1.  Media type registration

547     This registration is done using the template defined in [6] and
548     following [5].

550     MIME media type name:  audio

552     MIME media subtype name:  UEMCLIP

554     Required parameters:  Mode information: this defines bit-rate,
555        sampling frequency and layer structure of the bitstream. This
556        parameter is necessary because this information is not signaled
557        within the bitstream.
559 Optional parameters: none. 561 Encoding considerations: This type is defined for transferring 562 UEMCLIP-encoded data via RTP using the payload format specified in 563 Section 3 "Payload Format". Audio data is binary data and must be 564 encoded for non-binary transport; the Base64 encoding is suitable 565 for e-mail. 567 Security considerations: See Section 7 "Security Considerations" of 568 this document. 570 Interoperability considerations: This media is interoperable with 571 u-law encoded ITU-T G.711. see Section 4 "G.711 interoperability" 572 of this document. 574 Published specification: (T.B. assigned) 576 Applications that use this media type: Audio and video streaming and 577 conferencing tools. 579 Additional information: None 581 Intended usage: COMMON 583 Person & email address to contact for further information: Yusuke 584 Hiwasaki 586 Author/Intended change controller: 588 Author: Yusuke Hiwasaki 590 Intended Change Controller: IETF Audio/Video Transport Working 591 Group delegated from the IESG 593 6.2. Mapping to SDP Parameters 595 Payload type: Since it is not registered in [4], any RTP packets 596 that carry UEMCLIP as payload type MUST be treated as a dynamic 597 payload type. 599 Codec name: MIME registered codec name should be used. 601 Sampling Frequency: Depending on the mode to communicate, sampling 602 frequency MUST be selected from the ones defined in Table 2. 604 Channel numbers: It SHOULD default to "1", as selected from the ones 605 defined in Table 2. 607 Packet intervals: Since frame length of any UEMCLIP is 20 ms, when 608 specifying a=ptime line, the argument MUST be a multiple of "20". 609 When not listed in SDP, it should also default to the minimum 610 size: "20". 612 Bandwidth: As described in [7], bandwidth line is OPTIONAL. 
When
613        there are no bandwidth restrictions, the number MUST be the
614        largest value in Table 2, and the unit should be "kbit/s"
615        with the fraction rounded up to the unit, including header overheads
616        down to Layer 3. If any restrictions apply, then the value MUST
617        be the largest in Table 2 that satisfies the restriction, by the
618        same calculation procedure. The encoder MUST NOT encode with a
619        bit-rate larger than the answered bandwidth.

621     UEMCLIP specific:  Any descriptions specific to UEMCLIP are defined in
622        the Format Specification Parameters (fmtp). Parameters MUST
623        be separated with ";", and if a parameter has an attribute (value), it
624        MUST be defined with "+". For compatibility reasons, any
625        application/terminal MUST ignore any parameters that do not
626        appear below. This is to ensure upward compatibility with
627        parameters added later for future enhancements.

629  6.2.1.  Dynamic transmission definition

631     Since the UEMCLIP codec can operate in a number of modes, it is desirable
632     to specify the range of modes that an encoder or a decoder can
633     operate in.

635     UEMCLIP decoders are designed to accept bitstreams in any mode.

637     However, implementation limitations may prevent adapting to a
638     dynamic bit-rate change. Thus, two concepts are introduced here:
639     "dynamic mode" (denoted as "dynmode"), where the dynamic mode (bit-
640     rate) change is allowed, and "fixed mode" (denoted as "fixmode"),
641     where the change is not permitted. The two MUST be used
642     exclusively of each other.

644     "fixmode" is used to specify no modification of the operating mode
645     (bit-rate) during the session. It MUST be used exclusively of
646     "dynmode". It should specify the possible combinations of mode
647     numbers, delimited by commas ",". When offering a "fixmode", the
648     offerer SHOULD list the mode numbers in descending priority order.
649     The answerer MUST select a single suitable mode number and reply as
650     "fixmode" with one argument.
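
The answerer's choice for an offered "fixmode" can be sketched as
follows. This is a minimal illustration in Python, not part of the
payload format: the names parse_mode_list and select_fixmode, the
set of modes the answerer supports, and the decision to skip mode
numbers that are not defined in Table 2 are all assumptions made
for this example. Only the comma-delimited mode list is handled;
the surrounding fmtp syntax is deliberately left out.

```python
# Illustrative sketch only; all names here are hypothetical.
DEFINED_MODES = frozenset({0, 1, 3, 4})  # Table 2; Modes 2 and 5 are reserved


def parse_mode_list(arg):
    """Parse a comma-delimited mode list such as "4,1" into a list of ints.

    Assumption: mode numbers not defined in Table 2 are skipped.
    """
    modes = []
    for token in arg.split(","):
        mode = int(token.strip())
        if mode in DEFINED_MODES:
            modes.append(mode)
    return modes


def select_fixmode(offered_arg, supported_modes):
    """Return the first offered mode the answerer supports, else None.

    The offer lists modes in descending priority order, so the first
    match is the highest-priority mode both sides can use.
    """
    for mode in parse_mode_list(offered_arg):
        if mode in supported_modes:
            return mode
    return None


# An answerer that only supports Modes 0 and 1 receives an offer of "4,1".
print(select_fixmode("4,1", {0, 1}))
```

Returning None models the case where no offered mode fits the
answerer's capabilities; how that case is signalled is outside the
scope of this sketch.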
652     On the other hand, "dynmode" is used to allow modification of the
653     operating mode during the session. It MUST be used exclusively of
654     "fixmode". The offerer should specify the possible combinations of
655     mode numbers, delimited by commas ",". The answerer can either
656     select a number of suitable modes and reply as "dynmode" in the same
657     manner, or select a single suitable mode number and reply as
658     "fixmode" with one argument.

660     The mode numbers that can be specified as arguments to "fixmode" or
661     "dynmode" are restricted by the combination of sampling frequency and
662     number of audio channels, as shown in Table 2. This is because SDP
663     binds a payload type to a combination of a sampling frequency and a
664     number of audio channels. When neither "fixmode" nor "dynmode" is
665     given, it MUST be interpreted as defaulting to the fixed mode
666     ("fixmode") and MUST use the default value specified in Table 4.

668          +---------+----------+------------------+--------------+
669          | Fs [Hz] | Channels | Selectable modes | Default mode |
670          +---------+----------+------------------+--------------+
671          | 8000    | 1        | 0,3              | 0            |
672          |         |          |                  |              |
673          | 16000   | 1        | 1,4              | 1            |
674          +---------+----------+------------------+--------------+

676                           Table 4: Default modes

678  6.3.  Offer-answer Model Considerations

680     The procedures related to exchanging SDP messages MUST follow [2].

682     o  When multiple UEMCLIP dynamic payload type numbers are offered, an
683        answerer SHOULD select a single payload type number, i.e., one
684        sampling frequency and channel condition.

686     o  The ptime SHOULD be 20.

688     o  An offerer SHOULD offer every possible combination of sampling
689        frequency, channel number, and fmtp parameters including dynamic/
690        fixed mode. When the transmission bandwidth is restricted, it
691        MUST be offered in accordance with the restriction.

693     o  When offering/answering SDP, any fmtp parameters that are
694        undefined MUST be ignored.
If any unknown/undefined parameters
695        are offered, an answerer MUST delete the entry from the
696        answer message. In this case, the offerer MUST use the default
697        value for any deleted parameters.

699     o  If a dynamic mode ("dynmode") is offered, an answerer MUST select
700        either "dynmode" or "fixmode", according to one's capabilities.
701        When fixed mode ("fixmode") is offered, an answerer MUST only
702        answer "fixmode". In the case of answering fixed mode
703        ("fixmode"), the answerer MUST select a single mode out of the offered
704        modes, regardless of dynamic/fixed mode specification. If a mode
705        is not offered at all, the session MUST default to fixed mode, and
706        the default mode value, as shown in Table 4, MUST be used, based
707        on the sampling frequency and number of channels specified
708        elsewhere.

710     o  When an offered condition does not fit an answerer's capabilities,
711        it naturally MUST not answer the conditions, and session MAY
712        proceed to re-INVITE, if possible. If a condition (mode) is
713        decided upon, an offerer and an answerer MUST transmit on this
714        condition.

716  7.  Security Considerations

718     RTP packets using the payload format defined in this specification
719     are subject to the security considerations discussed in the RTP
720     specification [3] and any appropriate profiles. This implies that
721     confidentiality of the media streams is achieved by encryption.

723     A potential denial-of-service threat exists for data encoding using
724     compression techniques that have non-uniform receiver-end
725     computational load. The attacker can inject pathological datagrams
726     into the stream that are complex to decode and cause the receiver
727     output to become overloaded. However, the UEMCLIP codec covered in this
728     document does not exhibit any significant non-uniformity.

730     Other potential threats are memory attacks using illegal layer indices
731     or byte numbers.
        The implementor of the decoder should always be
732     aware that the indicated numbers may be corrupted and may not point
733     to the right sub-layer, or may allow reading beyond the bitstream
734     boundaries.

736  8.  IANA Considerations

738     It is requested that one new media subtype (audio/UEMCLIP) be
739     registered by IANA. For details, see Section 6.1.

741  9.  Normative References

743     [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
744          Levels", BCP 14, RFC 2119, March 1997.

746     [2]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
747          Session Description Protocol (SDP)", RFC 3264, June 2002.

749     [3]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
750          "RTP: A Transport Protocol for Real-Time Applications", STD 64,
751          RFC 3550, July 2003.

753     [4]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
754          Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

756     [5]  Casner, S. and P. Hoschka, "MIME Type Registration of RTP
757          Payload Formats", RFC 3555, July 2003.

759     [6]  Freed, N. and J. Klensin, "Media Type Specifications and
760          Registration Procedures", BCP 13, RFC 4288, December 2005.

762     [7]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
763          Description Protocol", RFC 4566, July 2006.

765  Authors' Addresses

767     Yusuke Hiwasaki
768     NTT Corporation
769     3-9-11 Midori-cho,
770     Musashino-shi
771     Tokyo 180-8585
772     Japan

774     Phone: +81(422)59-4815
775     Email: hiwasaki.yusuke@lab.ntt.co.jp

777     Hitoshi Ohmuro
778     NTT Corporation
779     3-9-11 Midori-cho,
780     Musashino-shi
781     Tokyo 180-8585
782     Japan

784     Phone: +81(422)59-2151
785     Email: ohmuro.hitoshi@lab.ntt.co.jp

787  Full Copyright Statement

789     Copyright (C) The IETF Trust (2007).

791     This document is subject to the rights, licenses and restrictions
792     contained in BCP 78, and except as set forth therein, the authors
793     retain all their rights.

795     This document and the information contained herein are provided on an
796     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
797     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
798     THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
799     OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
800     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
801     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

803  Intellectual Property

805     The IETF takes no position regarding the validity or scope of any
806     Intellectual Property Rights or other rights that might be claimed to
807     pertain to the implementation or use of the technology described in
808     this document or the extent to which any license under such rights
809     might or might not be available; nor does it represent that it has
810     made any independent effort to identify any such rights. Information
811     on the procedures with respect to rights in RFC documents can be
812     found in BCP 78 and BCP 79.

814     Copies of IPR disclosures made to the IETF Secretariat and any
815     assurances of licenses to be made available, or the result of an
816     attempt made to obtain a general license or permission for the use of
817     such proprietary rights by implementers or users of this
818     specification can be obtained from the IETF on-line IPR repository at
819     http://www.ietf.org/ipr.

821     The IETF invites any interested party to bring to its attention any
822     copyrights, patents or patent applications, or other proprietary
823     rights that may cover technology that may be required to implement
824     this standard. Please address the information to the IETF at
825     ietf-ipr@ietf.org.

827  Acknowledgment

829     Funding for the RFC Editor function is provided by the IETF
830     Administrative Support Activity (IASA).
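
Non-normative illustration: the mode-selection rules of Section 6.3, together with the defaults in Table 4, can be sketched roughly as follows. This is only one possible reading of those rules, not part of the specification. The function names (offered_modes, answer_mode), the dictionary representation of the fmtp parameters, the "supported" set, and the choice of the first usable mode for a "fixmode" answer are all assumptions made for this example. Only the parameter names "fixmode" and "dynmode" and the Table 4 values come from the draft.

```python
# Table 4 of the draft: (sampling rate [Hz], channels) -> (selectable modes, default mode)
TABLE_4 = {
    (8000, 1): ((0, 3), 0),
    (16000, 1): ((1, 4), 1),
}


def offered_modes(fmtp, rate, channels):
    """Return (parameter name, mode list) for an offered fmtp parameter dict."""
    if "fixmode" in fmtp and "dynmode" in fmtp:
        raise ValueError('"fixmode" and "dynmode" are mutually exclusive')
    selectable, default = TABLE_4[(rate, channels)]
    for name in ("fixmode", "dynmode"):
        if name in fmtp:
            modes = [int(m) for m in fmtp[name].split(",")]
            # Assumption: modes outside Table 4 for this payload type are dropped.
            return name, [m for m in modes if m in selectable]
    # Neither parameter given: fixed mode with the Table 4 default.
    return "fixmode", [default]


def answer_mode(fmtp, rate, channels, supported):
    """Pick the answerer's mode parameter, or None if nothing offered is usable."""
    name, offered = offered_modes(fmtp, rate, channels)
    usable = [m for m in offered if m in supported]
    if not usable:
        return None  # the offered condition does not fit; do not answer with it
    if name == "dynmode" and len(usable) > 1:
        return {"dynmode": ",".join(str(m) for m in usable)}
    # A "fixmode" answer carries exactly one mode (assumption: first usable one).
    return {"fixmode": str(usable[0])}


print(answer_mode({"dynmode": "0,3"}, 8000, 1, supported={0, 3}))
print(answer_mode({"dynmode": "0,3"}, 8000, 1, supported={3}))
print(answer_mode({}, 16000, 1, supported={1, 4}))
```

Returning None stands in for the case where the offered condition does not fit the answerer's capabilities. How the session then proceeds (for example, via a re-INVITE) is outside the scope of this sketch.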