Network Working Group                                          S. Wenger
Internet Draft                                                C. Perkins
Document: draft-ietf-avt-variable-rate-audio-00.txt
Expires: April 2005                                         October 2004


         RTP Timestamp Frequency for Variable Rate Audio Codecs


Status of this Memo

   By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, or will be disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

   This document is a submission of the IETF AVT WG. Comments should be directed to the AVT WG mailing list, avt@ietf.org.

Abstract

   This memo discusses the problems posed by audio codecs with variable external sampling rates. Historically, the RTP timestamp frequency for audio codecs was chosen to match the codec's sampling rate. However, this choice is now more difficult to justify because of the advent of audio codecs (and, even more importantly, of practical use cases) that support multiple sample rates and switching between them during the lifetime of an RTP session. This Internet-Draft addresses the problem by suggesting that RTP payload format RFCs for such codecs use a single, high, unified RTP timestamp frequency.

1. Introduction

   One key property of audio codecs is the external input sample rate.
   For many codecs, this sample rate is fixed. ITU-T G.711 [2] (also known as A-law and mu-law), for example, uses a sample rate of 8 kHz. Other audio codecs, such as MPEG-1 audio, layers 1, 2, or 3 [3], give the user a choice between different sample rates. However, until recently, applications never changed the sample rate during the lifetime of an RTP session, even though doing so is technically feasible and probably advantageous from the point of view of both user perception and the network.

   At the time RTP [1] and the audio-visual profile [4] were developed, it was a reasonable design choice to use an RTP timestamp frequency identical to the codec's input sample rate. Among other advantages, this facilitates sample-exact synchronization and processing of media data in endpoints, mixers, and translators. Neither RTP [1] nor the audio-visual profile [4] requires the RTP timestamp frequency to equal the codec sample rate, but this convention has been followed in practice.

   Recently, codecs have been developed that not only support variable sample rates but also use unannounced changes of the sample rate (signaled in-band only) as one of their key mechanisms. Similarly, applications have emerged that not only support variable sample rates but also, to some extent, rely on this feature. For most (if not all) of these codecs, both the required bit rate and the user experience scale with the selected sample rate. In the future, this could allow the network to dictate the transmission bit rate of an audio codec. That feature was not available before and could turn out to be very useful in Internet environments, for example to support congestion control. For these modern codecs, the paradigm of an RTP timestamp frequency equal to the codec sample rate no longer makes much sense. The RTP timestamp clock rate of a payload type cannot change in-band, whereas the codec's sample rate may change at any time without out-of-band signaling.
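   The following Python sketch illustrates this point by placing consecutive frames on a single unified clock. It is a minimal, non-normative example that rests on several assumptions. The uRTR of 192 kHz is the candidate value raised in Section 5. The round-half-up rule is an assumption, not the algorithm this draft will eventually specify. The function names are hypothetical. Each timestamp is computed from the exact elapsed media time. As a result, a 20 ms frame spans the same number of ticks whatever sample rate the codec is currently using, and rounding errors do not accumulate across frames or rate switches.

```python
from fractions import Fraction

# ASSUMPTION: 192 kHz is only one candidate uRTR (see the open issues);
# this draft has not yet fixed the value.
URTR_HZ = 192_000
RTP_TS_MODULUS = 2**32  # RTP timestamps are 32-bit and wrap around


def to_ticks(seconds: Fraction, urtr_hz: int = URTR_HZ) -> int:
    """Round an exact, non-negative media time to the nearest uRTR tick.

    Hypothetical rounding rule: round half up, computed with integers
    only as floor((2p + q) / (2q)) for the exact tick count p/q.
    """
    scaled = seconds * urtr_hz  # exact rational number of ticks
    p, q = scaled.numerator, scaled.denominator
    return (2 * p + q) // (2 * q)


def frame_timestamps(frames, base_ts: int = 0, urtr_hz: int = URTR_HZ):
    """Return the RTP timestamp of each frame on the unified clock.

    `frames` is a sequence of (num_samples, sample_rate_hz) pairs; the
    sample rate may change from one frame to the next, as with in-band
    rate switching. Elapsed time is kept exactly and rounded once per
    frame, so each timestamp is within half a tick of the exact media
    time and errors do not accumulate.
    """
    elapsed = Fraction(0)
    stamps = []
    for num_samples, sample_rate_hz in frames:
        stamps.append((base_ts + to_ticks(elapsed, urtr_hz)) % RTP_TS_MODULUS)
        elapsed += Fraction(num_samples, sample_rate_hz)
    return stamps


def is_sample_exact(sample_rate_hz: int, urtr_hz: int = URTR_HZ) -> bool:
    """True if every sample at this rate falls on an integer uRTR tick."""
    return urtr_hz % sample_rate_hz == 0


if __name__ == "__main__":
    # 20 ms frames at 16 kHz, then a switch to 8 kHz, then 44.1 kHz:
    # every frame advances the unified clock by 3840 ticks.
    print(frame_timestamps([(320, 16000), (320, 16000), (160, 8000), (882, 44100)]))
    # -> [0, 3840, 7680, 11520]

    # 1024-sample frames at 44.1 kHz are not sample exact: single
    # increments are 4458 or 4459 ticks, but the total does not drift.
    print(frame_timestamps([(1024, 44100)] * 4))
    # -> [0, 4458, 8916, 13375]

    # Seconds until the 32-bit timestamp wraps: about 22369.6 (~6.2 hours).
    print(RTP_TS_MODULUS / URTR_HZ)

    # Common fixed sample rates that 192 kHz cannot represent sample-exactly.
    print([r for r in (8000, 16000, 32000, 44100, 48000) if not is_sample_exact(r)])
    # -> [44100]
```

   Rounding from the exact elapsed time, instead of summing rounded per-frame increments, keeps every timestamp within half a tick of the true media time. This is in the spirit of requirement 15 in Section 4.3. However, the sketch uses integer division and therefore would not satisfy the optional requirement 14. It also does not address the transcoding chain through a fixed 44.1 kHz clock that the note to requirement 15 describes.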
   The purpose of this draft is to guide developers of RTP payload format specifications for codecs with variable sample rates towards a single, relatively high RTP timestamp frequency, which this draft specifies.

2. Audio codecs with variable sample rates: Examples

   The following audio codecs have variable sample rates and could (at least in theory) switch the sample rate on the fly without out-of-band signaling support:

   *  AMR-WB+ [5], with a choice of 56 different sample rates
   *  VMR-WB [6], with a choice of 8 kHz and 16 kHz sample rates
   *  MPEG-4 AAC+ [7], with a choice of (need details here)
   *  Any others?

   All these codecs use in-band signaling of the sample rate.

3. Rounding

   No single unified RTP timestamp frequency may satisfy both of the following conditions, and this outcome is even likely. First, the frequency has to meet a key requirement spelled out in Section 4.2: it has to be low enough to make timestamp wrap-around during erasure periods unlikely in all practical application scenarios. Second, it has to be an integer multiple of all sampling frequencies the codecs support. For example, 192 kHz, one of the candidate values listed in Section 5, is an integer multiple of 8, 16, and 48 kHz but not of 44.1 kHz. Moreover, future codecs may well be able to choose their sample rate at a granularity of 1 Hz or even finer.

   Therefore, a rounding algorithm has to be specified for cases where no sample-exact position of an audio frame can be found in the RTP timestamp numbering space. Specifying this rounding algorithm ensures that all equipment conforming to this draft uses the same rounding algorithm. The selected algorithm may guarantee that inaccuracies do not add up, as spelled out in the requirements in Section 4.3. In that case, even frequent transcoding steps will not increase the timing inaccuracy beyond the unavoidable minimum.

4. Requirements discussion

4.1. Requirements for this draft (general)

   1)  This draft MUST specify a unified RTP timestamp frequency that fulfills the requirements of Section 4.2.

   2)  This draft MUST specify a rounding algorithm that can be used for non-sample-exact alignment of samples stemming from more than one audio codec, at least one of which has a variable sample rate. The rounding algorithm MUST fulfill the requirements of Section 4.3.

   3)  This draft SHOULD state that its provisions MUST be used for the design of future RTP payload formats for audio codecs with variable sample rates.

   4)  This draft SHOULD state that its provisions SHOULD be considered in the design of future RTP payload formats for non-audio codecs that have problems similar to those of variable sample rate audio codecs.

   5)  This draft SHOULD provide an application example for a well-understood variable sample rate codec.

4.2. Requirements for the unified RTP timestamp rate

   6)  The unified RTP timestamp rate (uRTR) MUST be sufficiently high to fulfill the requirements for timestamps in RFC 3550 [1].

   7)  The uRTR MUST be low enough to make wrap-arounds of the RTP timestamp during erasure periods (packet loss bursts) unlikely in all reasonable application scenarios.

       Informative note: Such scenarios include, for example, cell handovers in wireless cellular networks, where erasure periods of a few seconds can occur.

   8)  The uRTR SHOULD share the prime factors of the sample rates of the most commonly used fixed sample rate audio codecs (that is, it SHOULD be an integer multiple of those sample rates). This allows sample-exact mixing of streams coded by those fixed sample rate audio codecs.

   9)  The uRTR SHOULD be chosen to include a sufficiently high number of prime factors to support sample-exact mixing for as many future variable sample rate codec code points as possible.

4.3. Requirements for a rounding algorithm

   10) The rounding algorithm MUST be applicable to all sample rates lower than 0.5 * uRTR, where uRTR is the value specified in this draft.
   11) The rounding algorithm MAY specify a minimum and a maximum sample rate, each expressed as a fraction x of the uRTR (x * uRTR). Only within this band can users reasonably expect that applying the rounding algorithm causes no audible distortion for the typical user.

   12) The rounding algorithm MUST be simple enough to be implemented in networking equipment without a serious cycle burden.

   13) The rounding algorithm SHOULD be implementable in fixed-point arithmetic.

   14) The rounding algorithm MAY, advantageously, be specified such that it does not require division operations.

   15) The rounding algorithm SHOULD be designed such that multiple applications of the algorithm do not introduce errors larger than one tick of the uRTR.

       Informative note: This requirement is much more difficult to meet than it seems at first glance. Consider a transcoding chain from a variable sample rate to 44.1 kHz and back to a variable sample rate, where the uRTR does not share all prime factors of 44.1 kHz. One solution would be to rewrite all fixed-rate payload specifications whose timestamp frequencies do not fit the prime factors of the uRTR, so that they use the uRTR instead. Is it possible to do this for 44.1 kHz, or is that rate fixed by RFC 3551 [4]?

5. Open issues

   *  Very general: is this a good idea?
   *  What would be a good choice for the uRTR? 192 kHz?
   *  Is it a good idea to require ALL future I-Ds on audio (not only those for variable sample rate codecs) to use the uRTR?
   *  Or only those whose sample rates do not fit the uRTR (where "fit" means that the sample rate's prime factors are a subset of those of the uRTR)?
   *  Revisit CD audio at 44.1 kHz. Is a variable sample rate unnecessary there? Are there proposals for an 88.2 kHz CD audio codec?

6. Security Considerations

   None

7. Congestion Control

   None

8. IANA Considerations

   None

9. Acknowledgements

   None

10. Full Copyright Statement

   Copyright (C) The Internet Society (2004).
   This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

11. Intellectual Property Notice

   The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

12. References

12.1. Normative References

   [1]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

12.2. Informative References

   [2]  ITU-T Recommendation G.711, "Pulse code modulation (PCM) of voice frequencies".

   [3]  ISO/IEC 11172 part 3 (MPEG-1 Audio).

   [4]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

   [5]  AMR-WB+.

   [6]  VMR-WB.

   [7]  ISO/IEC 14496 part xxx, AAC+.

13. Authors' Addresses

   Stephan Wenger                        Phone: +358-50-486-0637
   Nokia Research Center                 Email: stewe@stewe.org
   P.O. Box 100
   FIN-33721 Tampere
   Finland

   Colin Perkins
   University of Glasgow
   Department of Computing Science
   17 Lilybank Gardens
   Glasgow G12 8QQ
   United Kingdom

14. RFC Editor Considerations

   None