Audio Video Transport WG                                 Sassan Ahmadi
INTERNET-DRAFT                                              Nokia Inc.
Expires: April 18, 2005                               October 18, 2004

          Storage File Format for the Variable-Rate Multimode
                     Wideband (VMR-WB) Audio Codec

Status of this Memo

By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This document is a submission of the IETF AVT WG. Comments should be directed to the AVT WG mailing list, avt@ietf.org.

Abstract

This document specifies a file format for the storage of speech data encoded with the Variable-Rate Multimode Wideband (VMR-WB) speech codec. A MIME type registration is included for VMR-WB files.

VMR-WB is a variable-rate multimode wideband speech codec that has a number of operating modes, one of which is interoperable with the AMR-WB audio codec (RFC 3267) at certain rates. Therefore, provisions have been made in this draft to facilitate retrieval, by an AMR-WB decoder, of VMR-WB data stored in the interoperable mode.

Table of Contents

   1. Introduction
   2. Conventions and Acronyms

Sassan Ahmadi                                                 [page 1]

INTERNET-DRAFT              VMR-WB File Format               Oct. 2004

   3. Overview of VMR-WB
   4. VMR-WB File Format
      4.1. Single Channel Header
      4.2. Multi-Channel Header
      4.3. Speech Frames
   5. Security Considerations
   6. VMR-WB File Format MIME Registration
   7. IANA Considerations
   8. Acknowledgements
   References
      Normative References
      Informative References
   Author's Address
   IPR Notice
   Copyright Notice

1. Introduction

This document specifies a file format for the storage of VMR-WB encoded speech/audio data. The VMR-WB file format supports single-channel and multi-channel storage. It also facilitates decoding, by an AMR-WB decoder [4], of files generated by VMR-WB in the interoperable mode.

The file format is specified in Section 4. A MIME type registration for the VMR-WB file format is provided in Section 6.

The VMR-WB RTP payload formats are specified in a separate document [2]. To ensure coherence with RFC YYYY [2], common tables and parameters are not redefined in this document; instead, the corresponding tables and parameters of [2] are referenced.

2. Conventions and Acronyms

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [3].
The following acronyms are used in this document:

   3GPP2  - The Third Generation Partnership Project 2
   CDMA   - Code Division Multiple Access
   AMR-WB - Adaptive Multi-Rate Wideband Codec
   VMR-WB - Variable-Rate Multimode Wideband Codec
   MIME   - Multipurpose Internet Mail Extensions

The term "interoperable mode" in this document refers to VMR-WB mode 3, which is interoperable with AMR-WB codec modes 0, 1, and 2. The term "non-interoperable modes" in this document refers to VMR-WB modes 0, 1, and 2.

The term "frame-block" is used in this document to describe the time-synchronized set of speech frames in an N-channel storage scenario. A frame-block contains N speech frames, one from each of the channels, and all N speech frames represent exactly the same time period.

3. Overview of VMR-WB

VMR-WB is the wideband speech-coding standard developed by the Third Generation Partnership Project 2 (3GPP2) for encoding/decoding wideband/narrowband speech content in multimedia services in 3G CDMA cellular systems [1,2]. It has a number of operating modes, each of which is a tradeoff between voice quality and average data rate. While VMR-WB is a native CDMA codec complying with all CDMA system requirements, it is also interoperable with AMR-WB [4] at 12.65, 8.85, and 6.60 kbps.

By default, VMR-WB is a wideband codec operating on media (i.e., speech or audio) sampled at 16000 Hz; however, it is also capable of processing media sampled at 8000 Hz in all modes of operation [1]. The VMR-WB decoder does not require a priori knowledge of the sampling rate (8 or 16 kHz) of the original media at the input of the encoder. By default, the VMR-WB decoder generates 16000 Hz wideband output regardless of the encoder input sampling frequency, unless instructed otherwise.

4. VMR-WB File Format

The storage format is used for storing VMR-WB encoded speech frames in a file or as an e-mail attachment. Multi-channel content is also supported. The storage format described in this section is fully consistent with the one described in Section 8.5 of [1].

Note: The storage format described in this document uses several magic numbers to differentiate between the interoperable and non-interoperable modes of VMR-WB, as well as between single-channel and multi-channel files. Simpler and more straightforward ways of making these distinctions exist and should be considered in the design of future storage formats.

The use of different magic numbers and file extensions for files generated by the interoperable and non-interoperable modes of VMR-WB enables a file reader to decide whether it is capable of decoding the content without attempting to decode it (and, in the case of file extensions, without opening the file).

In general, a VMR-WB file has the following structure:

   +------------------+
   | Header           |
   +------------------+
   | Speech frame 1   |
   +------------------+
   :       ...        :
   +------------------+
   | Speech frame n   |
   +------------------+

4.1. Single Channel Header

A single-channel VMR-WB file header contains only a magic number.

The magic number for single-channel VMR-WB files containing speech data generated in the non-interoperable modes (i.e., VMR-WB modes 0, 1, or 2) MUST consist of the ASCII character string

   "#!VMR-WB\n"
   (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x0a in hexadecimal).

Note: the "\n" is an important part of the magic numbers and MUST be included in the comparison; otherwise, the single-channel magic number above would be indistinguishable from that of the multi-channel file defined in the next section.
The magic number for single-channel VMR-WB files containing speech data generated in the interoperable mode (i.e., VMR-WB mode 3) MUST consist of the ASCII character string

   "#!VMR-WB_I\n"
   (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x49 0x0a in hexadecimal).

In the interoperable mode, a file generated by VMR-WB is decodable by an AMR-WB decoder, except that the magic numbers differ. However, a VMR-WB decoder can only decode AMR-WB codec modes 0, 1, and 2.

The AMR-WB single-channel magic number and AMR-WB file extension [4] can also be used to store speech data generated by a VMR-WB encoder operating in the interoperable mode, to facilitate decoding of the file by an AMR-WB decoder. Since a VMR-WB decoder is only capable of decoding certain AMR-WB codec modes, it MUST be ensured that only the supported AMR-WB codec modes are presented to the VMR-WB decoder.

4.2. Multi-Channel Header

The multi-channel header consists of a magic number followed by a 32-bit channel description field, giving the multi-channel header the following structure:

   +----------------------------+
   |        Magic Number        |
   +----------------------------+
   | Channel Description Field  |
   +----------------------------+

The magic number for multi-channel VMR-WB files containing speech data generated in the non-interoperable modes (i.e., VMR-WB modes 0, 1, or 2) MUST consist of the ASCII character string

   "#!VMR-WB_MC1.0\n"
   (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43 0x31 0x2E 0x30 0x0a in hexadecimal).

The version number in the magic numbers refers to the version of the file format.

The magic number for multi-channel VMR-WB files containing speech data generated in the interoperable mode (i.e., VMR-WB mode 3) MUST consist of the ASCII character string

   "#!VMR-WB_MCI1.0\n"
   (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43 0x49 0x31 0x2E 0x30 0x0a in hexadecimal).
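Because every magic number ends in "\n", none of them is a prefix of another, so a reader can classify a file with a plain prefix test. The following Python sketch illustrates this for the four VMR-WB magic numbers and, for completeness, the two AMR-WB magic numbers of RFC 3267 [4] that may also carry interoperable-mode data. The names `identify`, `MagicInfo`, and `amr_wb_mode_ok_for_vmr_wb` are hypothetical and not part of this specification; the sketch only recognizes the header and does not decode any frames.

```python
# Sketch only: recognizes the storage-file magic numbers of Sections 4.1
# and 4.2, plus the AMR-WB ones from RFC 3267 [4]. Names are hypothetical.
from typing import NamedTuple, Optional


class MagicInfo(NamedTuple):
    codec: str           # "VMR-WB" or "AMR-WB"
    interoperable: bool  # True if the frames use the AMR-WB compatible format
    multichannel: bool   # True if a channel description field follows
    header_len: int      # octets before the first frame-block


_MAGIC_NUMBERS = [
    # (magic number, codec, interoperable mode, multi-channel)
    (b"#!VMR-WB\n", "VMR-WB", False, False),
    (b"#!VMR-WB_I\n", "VMR-WB", True, False),
    (b"#!VMR-WB_MC1.0\n", "VMR-WB", False, True),
    (b"#!VMR-WB_MCI1.0\n", "VMR-WB", True, True),
    (b"#!AMR-WB\n", "AMR-WB", True, False),
    (b"#!AMR-WB_MC1.0\n", "AMR-WB", True, True),
]

# AMR-WB codec modes a VMR-WB decoder can handle (6.60, 8.85, 12.65 kbps).
VMR_WB_DECODABLE_AMR_WB_MODES = frozenset({0, 1, 2})


def identify(data: bytes) -> Optional[MagicInfo]:
    """Classify a file from its first octets; return None if unrecognized."""
    for magic, codec, interop, multi in _MAGIC_NUMBERS:
        # Every magic number ends in "\n", so none is a prefix of another.
        if data.startswith(magic):
            # A multi-channel magic number is followed by the 32-bit
            # (4-octet) channel description field.
            header_len = len(magic) + (4 if multi else 0)
            return MagicInfo(codec, interop, multi, header_len)
    return None


def amr_wb_mode_ok_for_vmr_wb(mode: int) -> bool:
    """True if an AMR-WB speech frame of this codec mode may be passed
    to a VMR-WB decoder."""
    return mode in VMR_WB_DECODABLE_AMR_WB_MODES
```

Without the trailing "\n", the string "#!VMR-WB" would also match the beginning of "#!VMR-WB_I\n", "#!VMR-WB_MC1.0\n", and "#!VMR-WB_MCI1.0\n", which is exactly the ambiguity the note in Section 4.1 warns about. The mode check applies to AMR-WB speech frames only; the sketch does not decide how other AMR-WB frame types are handled.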
The 32-bit channel description field is defined as:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Reserved bits                     | CHAN  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Reserved bits: MUST be set to 0 when written, and a reader MUST ignore them.

CHAN (4-bit unsigned integer): Indicates the number of audio channels contained in this storage file. The valid values and the order of the channels within a frame-block are specified in Section 4.1 of [5].

The AMR-WB multi-channel magic number and AMR-WB file extension [4] can also be used to store speech data generated by a VMR-WB encoder operating in the interoperable mode, to facilitate decoding of the file by an AMR-WB decoder. Since a VMR-WB decoder is only capable of decoding certain AMR-WB codec modes, it MUST be ensured that only the supported AMR-WB codec modes are presented to the VMR-WB decoder.

4.3. Speech Frames

After the file header, speech frame-blocks that are consecutive in time are stored in the file. Each frame-block contains a number of octet-aligned speech frames equal to the number of channels, stored in increasing channel order, starting with channel 1.

Each stored speech frame starts with a one-octet frame header with the following format:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |P|  FT   |Q|P|P|
   +-+-+-+-+-+-+-+-+

P: Padding bits. Padding bits MUST be set to zero and MUST be ignored by a receiver.

FT (4 bits): Frame type, as defined in Table 3 of [2].

Q (1 bit): Frame quality indicator. If set to 0, it indicates that the corresponding frame is corrupted. The VMR-WB encoder always sets the Q bit to 1. The VMR-WB decoder may ignore the Q bit.

Following this one-octet header, the speech bits are placed as defined in Section 6.3.4 of [2]. The last octet of each frame is padded with zeroes, if needed, to achieve octet alignment.
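A reader can split the data that follows the file header into frame-blocks using only the FT field of each frame header, provided it knows how many speech bits each frame type carries. That mapping is given by Table 3 of [2] and is not reproduced here, so the Python sketch below takes it as a caller-supplied table. The function and type names are hypothetical, and reading the channel description field in network (big-endian) byte order is an assumption of the sketch.

```python
# Sketch only: walks the frame-blocks of a VMR-WB storage file (Section 4.3).
import struct
from typing import Dict, Iterator, List, NamedTuple


class Frame(NamedTuple):
    ft: int         # frame type (see Table 3 of [2])
    q: int          # frame quality indicator
    payload: bytes  # speech bits, zero-padded to whole octets


def parse_channel_description(field: bytes) -> int:
    """Return CHAN from the 4-octet channel description field."""
    # Assumption: the field is stored in network (big-endian) byte order.
    (value,) = struct.unpack(">I", field)
    return value & 0x0F  # the 28 reserved bits are ignored


def iter_frame_blocks(body: bytes, channels: int,
                      speech_bits: Dict[int, int]) -> Iterator[List[Frame]]:
    """Yield one list of `channels` frames per frame-block, channel 1 first.

    `body` is the file content after the header. `speech_bits` maps each FT
    value to its number of speech bits; it is a hypothetical caller-supplied
    table that must be filled in from Table 3 of [2].
    """
    pos = 0
    while pos < len(body):
        block: List[Frame] = []
        for _ in range(channels):
            if pos >= len(body):
                raise ValueError("truncated frame-block")
            octet = body[pos]
            ft = (octet >> 3) & 0x0F  # bits 1-4
            q = (octet >> 2) & 0x01   # bit 5; bits 0, 6 and 7 are padding
            if ft not in speech_bits:
                raise ValueError(f"unsupported frame type {ft}")
            size = (speech_bits[ft] + 7) // 8  # speech octets incl. padding
            payload = body[pos + 1:pos + 1 + size]
            if len(payload) != size:
                raise ValueError("truncated speech frame")
            block.append(Frame(ft, q, payload))
            pos += 1 + size
        yield block
```

For instance, a table such as {4: 124, 14: 0, 15: 0} is enough to walk a file containing only those frame types; the FT=4 entry comes from the Half-Rate example in this section, while the zero-bit entries for FT=14 and FT=15 are assumptions. Rejecting a truncated final frame-block, rather than silently dropping it, is a design choice of the sketch.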
The following example shows a VMR-WB speech frame encoded at Half-Rate (with 124 speech bits) in the storage format. The one-octet header is followed by the 124 speech bits and 4 zero padding bits, for a total of 17 octets.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| FT=4  |1|0|0|                                               |
   +-+-+-+-+-+-+-+-+                                               +
   |                                                               |
   +           Speech bits for frame-block n, channel k            +
   |                                                               |
   +                                                               +
   |                                                               |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |0|0|0|0|
   +-+-+-+-+-+-+-+-+

Frame-blocks or speech frames lost in transmission MUST be stored as Erasure/SPEECH_LOST frames (FT=14), and non-received frame-blocks between SID updates during non-speech periods (when using DTX) MUST be stored as Blank/NO_DATA frames (FT=15). In both cases, the frames are stored in complete frame-blocks to maintain synchronization with the original media.

5. Security Considerations

This document specifies a file format only, not a streaming protocol payload format or a transfer method. As such, it introduces no security risks beyond those associated with any audio codec or media file format (e.g., denial of service by transmitting a file larger than the receiver can handle). These security concerns should be understood before the file format specified here is used.

Clearly, it is possible to author malicious files in order to attack a receiver. However, clients can, and usually do, protect themselves against this kind of attack.

There is currently no provision in the standards for encryption, signing, or authentication of this file format. However, depending on the application, external mechanisms can be used to provide privacy, authentication, and protection against unauthorized use or distribution of the media.

6. VMR-WB File Format MIME Registration

This section defines the parameters that may be used to select optional features in the VMR-WB storage format. The parameters are defined here as part of the MIME subtype registration for the VMR-WB file format.
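The writing side mirrors the layout of the Half-Rate example: the header octet, then the speech bits left-aligned in as many octets as needed, with zero padding in the low-order bits of the last octet. The Python sketch below packs one frame and builds the complete filler frame-blocks required for lost frame-blocks and DTX gaps. It assumes that Erasure/SPEECH_LOST (FT=14) and Blank/NO_DATA (FT=15) frames carry no speech bits, and it writes whatever Q value the caller supplies for them, since this document does not specify one. The helper names and the integer representation of the speech bits are hypothetical.

```python
# Sketch only: packs VMR-WB storage frames as described in Section 4.3.
# Assumption: FT=14 and FT=15 frames carry no speech bits.
SPEECH_LOST = 14  # Erasure/SPEECH_LOST frame type
NO_DATA = 15      # Blank/NO_DATA frame type


def pack_frame(ft: int, speech: int, nbits: int, q: int = 1) -> bytes:
    """Return one stored frame: header octet + speech bits + zero padding.

    `speech` holds the `nbits` speech bits as an integer, first bit in the
    most significant position (a hypothetical calling convention).
    """
    if not 0 <= ft <= 15:
        raise ValueError("FT is a 4-bit field")
    if speech < 0 or speech >> nbits:
        raise ValueError("speech must fit in nbits bits")
    # P bits (0, 6, 7) are zero; FT goes in bits 1-4 and Q in bit 5.
    header = (ft << 3) | ((q & 0x01) << 2)
    nbytes = (nbits + 7) // 8
    pad = nbytes * 8 - nbits
    # Shift left so the padding bits at the end of the last octet are zero.
    body = (speech << pad).to_bytes(nbytes, "big")
    return bytes([header]) + body


def filler_block(ft: int, channels: int, q: int = 1) -> bytes:
    """Return a complete frame-block of header-only FT=14 or FT=15 frames."""
    if ft not in (SPEECH_LOST, NO_DATA):
        raise ValueError("filler frames use FT=14 or FT=15")
    # One frame per channel keeps later frame-blocks aligned in time.
    return pack_frame(ft, 0, 0, q) * channels
```

Packing 124 speech bits with FT=4 yields the 17-octet Half-Rate frame of the example, whose first octet is 0x24 (P=0, FT=4, Q=1, P=0, P=0). A writer that learns a frame-block is missing simply appends filler_block(SPEECH_LOST, channels) or filler_block(NO_DATA, channels) in its place.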
The MIME subtype for the Variable-Rate Multimode Wideband (VMR-WB) audio codec is allocated from the IETF tree. This MIME registration covers non-real-time transfers via stored files.

Note: the receiver MUST ignore any unspecified parameter and use the default values instead.

Media type name: audio

Media subtype name: VMR-WB-FILE

Required parameters: none

   If no optional parameters are specified, the default values are used.

Optional file format parameters:

   mode-set: see RFC YYYY [2]

   channels: see RFC YYYY [2]

Encoding considerations:
   This type is defined for transfer of VMR-WB data using the file format specified in Section 4 of RFC XXXX. The stored file format is binary data and must be encoded for non-binary transport; the Base64 encoding is suitable in many cases.

Security considerations:
   See Section 5 of this document.

Public specification:
   The VMR-WB speech codec is specified in the following 3GPP2 specification: C.S0052-0 version 1.0 [1]. The file format is specified in RFC XXXX. Transfer methods are specified in RFC YYYY.
Additional information:

   Magic numbers:

      Single channel (for the non-interoperable modes):
         ASCII character string "#!VMR-WB\n"
         (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x0a in hexadecimal)

      Single channel (for the interoperable mode):
         ASCII character string "#!VMR-WB_I\n"
         (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x49 0x0a in hexadecimal)

      Multi-channel (for the non-interoperable modes):
         ASCII character string "#!VMR-WB_MC1.0\n"
         (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43 0x31 0x2E 0x30 0x0a in hexadecimal)

      Multi-channel (for the interoperable mode):
         ASCII character string "#!VMR-WB_MCI1.0\n"
         (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43 0x49 0x31 0x2E 0x30 0x0a in hexadecimal)

   File extensions for the non-interoperable modes: vmr, VMR
   Macintosh file type code: none
   Object identifier or OID: none

   File extensions for the interoperable mode: vmi, VMI
   Macintosh file type code: none
   Object identifier or OID: none

Person & email address to contact for further information:
   Sassan Ahmadi, Ph.D.
   Nokia Inc. USA
   sassan.ahmadi@nokia.com

Intended usage: COMMON
   This file format is expected to be widely used in Internet email user agents, multimedia authoring and playing software, and CDMA2000 mobile terminals.

Author/Change controller:
   IETF Audio/Video Transport working group delegated from the IESG

7. IANA Considerations

IANA is requested to register one new MIME subtype (audio/VMR-WB-FILE); see Section 6.

8. Acknowledgements

The author would like to thank Redwan Salami of VoiceAge Corporation, Ari Lakaniemi of Nokia Inc., and IETF/AVT chairs Colin Perkins and Magnus Westerlund for their technical comments to improve this document. The author would also like to acknowledge that some parts of RFC 3267 [4] have been used in this document.
References

Normative References

   [1]  3GPP2 C.S0052-0 v1.0, "Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB) Service Option 62 for Spread Spectrum Systems", 3GPP2 Technical Specification, July 2004.

   [2]  S. Ahmadi, "Real-Time Transport Protocol (RTP) Payload Formats for the Variable-Rate Multimode Wideband (VMR-WB) Audio Codec", RFC YYYY, Internet Engineering Task Force, Dec. 2004.

   [3]  S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, Internet Engineering Task Force, March 1997.

   [4]  J. Sjoberg, et al., "Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 3267, Internet Engineering Task Force, June 2002.

Informative References

   [5]  H. Schulzrinne, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, Internet Engineering Task Force, July 2003.

Any 3GPP2 document can be downloaded from the 3GPP2 web server, "http://www.3gpp2.org/"; see Specifications.

Author's Address

The editor will serve as the point of contact for all technical matters related to this document.

   Dr. Sassan Ahmadi
   Nokia Inc.
   12278 Scripps Summit Dr.
   San Diego, CA 92131
   USA

   Phone: 1 (858) 831-5916
   Fax:   1 (858) 831-4174
   Email: sassan.ahmadi@nokia.com

This Internet-Draft expires in six months from October 18, 2004.

RFC Editor Considerations

The RFC editor is requested to replace all occurrences of XXXX with the RFC number that this document will receive. The RFC editor is also requested to replace all occurrences of YYYY with the RFC number that [2] will receive.
IPR Notice

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Copyright Notice

Copyright (C) The Internet Society (2004). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.