Audio Video Transport WG                                   Sassan Ahmadi
INTERNET-DRAFT                                                Nokia Inc.
Expires: April 18, 2005                                 October 18, 2004

  Storage File Format for the Variable-Rate Multimode Wideband (VMR-WB)
                              Audio Codec

Status of this Memo

   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   and any of which I become aware will be disclosed, in accordance
   with RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is a submission of the IETF AVT WG. Comments should
   be directed to the AVT WG mailing list, avt@ietf.org.

Abstract

   This document specifies a file format for the storage of
   variable-rate multimode wideband (VMR-WB) speech codec data. A MIME
   type registration is included for VMR-WB files.

   VMR-WB is a variable-rate multimode wideband speech codec that has a
   number of operating modes, one of which is interoperable with the
   AMR-WB audio codec (RFC 3267) at certain rates. Therefore,
   provisions have been made in this draft to facilitate retrieval of
   VMR-WB stored data (generated in the interoperable mode) by an
   AMR-WB decoder.

Table of Contents

   1. Introduction
   2. Conventions and Acronyms
   3. Overview of VMR-WB
   4. VMR-WB File Format
      4.1. Single Channel Header
      4.2. Multi-Channel Header
      4.3. Speech Frames
   5. Security Considerations
   6. VMR-WB File Format MIME Registration
   7. IANA Considerations
   8. Acknowledgements
   References
      Normative References
      Informative References
   Author's Address
   IPR Notice
   Copyright Notice

1. Introduction

   This document specifies a file format for storage of VMR-WB encoded
   speech/audio data. The VMR-WB file format supports single and
   multi-channel storage. It further facilitates decoding of VMR-WB
   generated files by an AMR-WB decoder [4].

   The file format is specified in Section 4. A MIME type registration
   for the VMR-WB file format is provided in Section 6.

   The VMR-WB RTP payload formats have been specified in a separate
   document [2].

   To ensure coherence with RFC YYYY [2], common tables and parameters
   are not defined in this document; rather, the corresponding tables
   and parameters of [2] are referenced.

2. Conventions and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
   in this document are to be interpreted as described in RFC 2119 [3].

   The following acronyms are used in this document:

   3GPP2   - The Third Generation Partnership Project 2
   CDMA    - Code Division Multiple Access
   AMR-WB  - Adaptive Multi-Rate Wideband Codec
   VMR-WB  - Variable-Rate Multimode Wideband Codec
   MIME    - Multipurpose Internet Mail Extensions

   The term "interoperable mode" in this document refers to VMR-WB
   mode 3, which is interoperable with AMR-WB codec modes 0, 1, and 2.

   The term "non-interoperable modes" in this document refers to
   VMR-WB modes 0, 1, and 2.

   The term "frame-block" is used in this document to describe the
   time-synchronized set of speech frames in an N-channel storage
   scenario. A frame-block will contain N speech frames, one from each
   of the channels, and all N speech frames represent exactly the same
   time period.

3. Overview of VMR-WB

   VMR-WB is the wideband speech-coding standard developed by the Third
   Generation Partnership Project 2 (3GPP2) for encoding/decoding
   wideband/narrowband speech content in multimedia services in 3G CDMA
   cellular systems [1,2]. It has a number of operating modes, where
   each mode is a tradeoff between voice quality and average data rate.

   While VMR-WB is a native CDMA codec complying with all CDMA system
   requirements, it is further interoperable with AMR-WB [4] at 12.65,
   8.85, and 6.60 kbps.

   VMR-WB by default is a wideband codec operating with 16000 Hz
   sampled media (i.e., speech or audio); however, it is further
   capable of processing 8000 Hz sampled media in all modes of
   operation [1].
   The VMR-WB decoder does not require a priori knowledge about the
   sampling rate of the original media (i.e., speech/audio signals
   sampled at 8 or 16 kHz) at the input of the encoder.

   The VMR-WB decoder, by default, generates 16000 Hz wideband output
   regardless of the encoder input sampling frequency, unless
   instructed otherwise.

4. VMR-WB File Format

   The storage format is used for storing VMR-WB encoded speech
   frames in a file or as an e-mail attachment. Multiple channel
   content is also supported. The storage format described in this
   section is fully consistent with the one described in Section 8.5
   of [1].

   Note: The storage format described in this document uses several
   magic numbers to differentiate between interoperable and
   non-interoperable modes of VMR-WB as well as single and
   multi-channel files. The same distinction could be made in simpler
   and more straightforward ways, which should be considered in the
   design of future storage formats. The use of different magic
   numbers and file extensions for the files generated by the
   interoperable and non-interoperable modes of VMR-WB enables a file
   reader to decide if it is capable of decoding the content without
   opening the file or attempting to decode the content.

   In general, a VMR-WB file has the following structure:

      +------------------+
      | Header           |
      +------------------+
      | Speech frame 1   |
      +------------------+
      :      ...         :
      +------------------+
      | Speech frame n   |
      +------------------+

4.1. Single Channel Header

   A single channel VMR-WB file header contains only a magic number.

   The magic number for single channel VMR-WB files containing speech
   data generated in the non-interoperable modes (i.e., VMR-WB modes
   0, 1, or 2) MUST consist of the ASCII character string

      "#!VMR-WB\n"
      (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x0a in hexadecimal).

   Note that the "\n" is an important part of the magic numbers and
   MUST be included in the comparison; otherwise, the single
   channel magic number above will become indistinguishable from
   that of the multi-channel file defined in the next section.

   The magic number for single channel VMR-WB files containing speech
   data generated in the interoperable mode (i.e., VMR-WB mode 3)
   MUST consist of the ASCII character string

      "#!VMR-WB_I\n"
      (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x49 0x0a in
      hexadecimal).

   In the interoperable mode, a file generated by VMR-WB is decodable
   with AMR-WB (with the exception of different magic numbers).
   However, VMR-WB can only decode AMR-WB codec modes 0, 1, and 2.

   The AMR-WB single channel magic number and AMR-WB file extension
   [4] can also be used to store speech data generated by a VMR-WB
   encoder operating in the interoperable mode to facilitate decoding
   of the file by an AMR-WB decoder. Since the VMR-WB decoder is only
   capable of decoding certain AMR-WB codec modes, it MUST be ensured
   that only supported codec modes of AMR-WB are presented to the
   VMR-WB decoder.

4.2. Multi-Channel Header

   The multi-channel header consists of a magic number followed
   by a 32-bit channel description field, giving the multi-channel
   header the following structure:

      +----------------------------+
      |        Magic Number        |
      +----------------------------+
      | Channel Description Field  |
      +----------------------------+

   The magic number for multi-channel VMR-WB files containing speech
   data generated in the non-interoperable modes (i.e., VMR-WB modes
   0, 1, or 2) MUST consist of the ASCII character string

      "#!VMR-WB_MC1.0\n"
      (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43 0x31
      0x2E 0x30 0x0a in hexadecimal).

   The version number in the magic numbers refers to the version
   of the file format.
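
   As a non-normative illustration of why the trailing "\n" has to be
   part of the comparison, the following Python sketch classifies a
   file by its leading octets. It assumes the four magic numbers of
   Sections 4.1 and 4.2 (the fourth, for interoperable multi-channel
   files, is the one given in the remainder of this section). The
   names MAGIC_NUMBERS and classify_header are hypothetical and are
   not defined by this document.

```python
# Non-normative sketch. MAGIC_NUMBERS and classify_header are
# illustrative names, not part of the file format specification.
MAGIC_NUMBERS = [
    (b"#!VMR-WB\n", "single-channel", "non-interoperable"),
    (b"#!VMR-WB_I\n", "single-channel", "interoperable"),
    (b"#!VMR-WB_MC1.0\n", "multi-channel", "non-interoperable"),
    (b"#!VMR-WB_MCI1.0\n", "multi-channel", "interoperable"),
]


def classify_header(data: bytes):
    """Return (layout, mode, magic_length) or None if no magic number matches.

    Each comparison includes the trailing "\n" octet (0x0a), so the
    single channel string cannot match a multi-channel file.
    """
    for magic, layout, mode in MAGIC_NUMBERS:
        if data.startswith(magic):
            return layout, mode, len(magic)
    return None


# Without the "\n", the single channel prefix also matches a
# multi-channel header, which is the ambiguity the text warns about.
ambiguous = b"#!VMR-WB_MC1.0\n".startswith(b"#!VMR-WB")
```

   In this sketch the returned length tells a reader where the rest of
   the file begins: the speech frames for a single channel file, or
   the channel description field for a multi-channel file.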

   The magic number for multi-channel VMR-WB files containing speech
   data generated in the interoperable mode (i.e., VMR-WB mode 3)
   MUST consist of the ASCII character string

      "#!VMR-WB_MCI1.0\n"
      (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43 0x49
      0x31 0x2E 0x30 0x0a in hexadecimal).

   The 32-bit channel description field is defined as

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Reserved bits                                         | CHAN  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Reserved bits: MUST be set to 0 when written, and a reader
   MUST ignore them.

   CHAN (4-bit unsigned integer): Indicates the number of audio
   channels contained in this storage file. The valid values and
   the order of the channels within a frame-block are specified
   in Section 4.1 of [5].

   The AMR-WB multi-channel magic number and AMR-WB file extension [4]
   can also be used to store speech data generated by a VMR-WB encoder
   operating in the interoperable mode to facilitate decoding of the
   file by an AMR-WB decoder. Since the VMR-WB decoder is only capable
   of decoding certain AMR-WB codec modes, it MUST be ensured that only
   supported codec modes of AMR-WB are presented to the VMR-WB decoder.

4.3. Speech Frames

   After the file header, speech frame-blocks consecutive in time are
   stored in the file. Each frame-block contains a number of
   octet-aligned speech frames equal to the number of channels, stored
   in increasing order of channel number, starting with channel 1.
   Each stored speech frame starts with a one-octet frame header with
   the following format:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |P|  FT   |Q|P|P|
   +-+-+-+-+-+-+-+-+

   The FT field is defined in Table 3 of [2]. Padding bits (P) MUST be
   set to zero and MUST be ignored by a receiver.

   Q (1 bit): Frame quality indicator.
   If set to 0, it indicates that the corresponding frame is
   corrupted. The VMR-WB encoder always sets the Q bit to 1. The
   VMR-WB decoder may ignore the Q bit.

   Following this one-octet header, the speech bits are placed
   as defined in Section 6.3.4 of [2]. The last octet of each frame is
   padded with zeroes, if needed, to achieve octet alignment.

   The following example shows a VMR-WB speech frame encoded at
   Half-Rate (with 124 speech bits) in the storage format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| FT=4  |1|0|0|                                               |
   +-+-+-+-+-+-+-+-+                                               +
   |                                                               |
   +          Speech bits for frame-block n, channel k             +
   |                                                               |
   +                                                               +
   |                                                               |
   +       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |0|0|0|0|
   +-+-+-+-+-+-+-+-+

   Frame-blocks or speech frames lost in transmission MUST be stored as
   Erasure/SPEECH_LOST (FT=14), and non-received frame-blocks between
   SID updates during non-speech periods (when using DTX) MUST be
   stored as Blank/NO_DATA frames (FT=15), in complete frame-blocks, to
   maintain synchronization with the original media.

5. Security Considerations

   This document specifies a file format only, not a streaming protocol
   payload format, nor a transfer method. As such, it introduces no
   security risks in addition to those associated with any audio codec
   or media file format (e.g., denial of service by transmitting a
   file larger than the receiver can handle). Note that those security
   concerns should be understood before using the file format specified
   here. Clearly it is possible to author malicious files in order to
   attack a receiver. However, clients can and usually do protect
   themselves against this kind of attack.

   There is currently no provision in the standards for encryption,
   signing, or authentication of this file format.
   However, depending on the application, external mechanisms can be
   used to provide privacy, authentication, and protection against
   unauthorized use or distribution of the media.

6. VMR-WB File Format MIME Registration

   This section defines the parameters that may be used to select
   optional features in the VMR-WB storage format.

   The parameters are defined here as part of the MIME subtype
   registration for the VMR-WB file format.

   The MIME subtype for the Variable-Rate Multimode Wideband
   (VMR-WB) audio codec is allocated from the IETF tree. This MIME
   registration covers non-real-time transfers via stored files.

   Note that the receiver MUST ignore any unspecified parameter and
   use the default values instead.

   Media Type name: audio

   Media subtype name: VMR-WB-FILE

   Required parameters: none

   Note that if no input parameters are defined, the default values
   will be used.

   OPTIONAL file format parameters:

      mode-set: see RFC YYYY [2]

      channels: see RFC YYYY [2]

   Encoding considerations:

      This type is defined for transfer of VMR-WB data using the
      file format specified in Section 4 of RFC XXXX. The stored
      file format is binary data and must be encoded for non-binary
      transport; the Base64 encoding is suitable in many cases.

   Security considerations:

      See Section 5 of this document.

   Public specification:

      The VMR-WB speech codec is specified in 3GPP2 specification
      C.S0052-0 version 1.0.
      File format is specified in RFC XXXX.
      Transfer methods are specified in RFC YYYY.

   Additional information:

      Magic numbers:

         Single channel (for the non-interoperable modes)
         ASCII character string "#!VMR-WB\n"
         (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x0a in
         hexadecimal)

         Single channel (for the interoperable mode)
         ASCII character string "#!VMR-WB_I\n"
         (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x49 0x0a
         in hexadecimal)

         Multi-channel (for the non-interoperable modes)
         ASCII character string "#!VMR-WB_MC1.0\n"
         (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43
         0x31 0x2E 0x30 0x0a in hexadecimal)

         Multi-channel (for the interoperable mode)
         ASCII character string "#!VMR-WB_MCI1.0\n"
         (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43
         0x49 0x31 0x2E 0x30 0x0a in hexadecimal)

      File extensions for the non-interoperable modes: vmr, VMR
      Macintosh file type code: none
      Object identifier or OID: none

      File extensions for the interoperable mode: vmi, VMI
      Macintosh file type code: none
      Object identifier or OID: none

   Person & email address to contact for further information:

      Sassan Ahmadi, Ph.D. Nokia Inc. USA
      sassan.ahmadi@nokia.com

   Intended usage: COMMON.

      This file format is expected to be widely used in Internet email
      user agents, multimedia authoring and playing software, and
      CDMA2000 mobile terminals.

   Author/Change controller:

      IETF Audio/Video Transport working group delegated from the IESG

7. IANA Considerations

   It is requested that one new MIME subtype (audio/VMR-WB-FILE) be
   registered by IANA; see Section 6.

8. Acknowledgements

   The author would like to thank Redwan Salami of VoiceAge
   Corporation, Ari Lakaniemi of Nokia Inc., and IETF/AVT chairs Colin
   Perkins and Magnus Westerlund for their technical comments
   to improve this document.

   Also, the author would like to acknowledge that some parts of
   RFC 3267 [4] have been used in this document.

References

Normative References

   [1] 3GPP2 C.S0052-0 v1.0, "Source-Controlled Variable-Rate
       Multimode Wideband Speech Codec (VMR-WB) Service Option
       62 for Spread Spectrum Systems", 3GPP2 Technical Specification,
       July 2004.

   [2] S. Ahmadi, "Real-Time Transport Protocol (RTP) Payload Formats
       for the Variable-Rate Multimode Wideband (VMR-WB) Audio Codec",
       RFC YYYY, Internet Engineering Task Force, Dec. 2004.

   [3] S. Bradner, "Key words for use in RFCs to Indicate
       Requirement Levels", BCP 14, RFC 2119, Internet Engineering
       Task Force, March 1997.

   [4] J. Sjoberg, et al., "Real-Time Transport Protocol (RTP)
       Payload Format and File Storage Format for the Adaptive
       Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
       (AMR-WB) Audio Codecs", RFC 3267, Internet
       Engineering Task Force, June 2002.

Informative References

   [5] H. Schulzrinne, "RTP Profile for Audio and Video
       Conferences with Minimal Control", STD 65, RFC 3551, Internet
       Engineering Task Force, July 2003.

   Any 3GPP2 document can be downloaded from the 3GPP2 web
   server, "http://www.3gpp2.org/", see specifications.

Author's Address

   The editor will serve as the point of contact for all
   technical matters related to this document.

   Dr. Sassan Ahmadi              Phone: 1 (858) 831-5916
                                  Fax:   1 (858) 831-4174
   Nokia Inc.                     Email: sassan.ahmadi@nokia.com
   12278 Scripps Summit Dr.
   San Diego, CA 92131 USA

   This Internet-Draft expires in six months from October 18, 2004.

RFC Editor Considerations

   The RFC editor is requested to replace all occurrences of XXXX with
   the RFC number that this document will receive.

   The RFC editor is also requested to replace all occurrences of YYYY
   with the RFC number that [2] will receive.

IPR Notice

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Copyright Notice

   Copyright (C) The Internet Society (2004). This document is
   subject to the rights, licenses and restrictions contained in BCP
   78, and except as set forth therein, the authors retain all their
   rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
   THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
   ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
   PARTICULAR PURPOSE.