Network Working Group                                         S. Wenger 
Internet Draft                                               C. Perkins 
Document: draft-ietf-avt-variable-rate-audio-00.txt      
Expires: April 2005                                   
                                                           October 2004 
 
 
 
 
 
         RTP Timestamp Frequency for Variable Rate Audio Codecs 
 
 
 
Status of this Memo 
 
   By submitting this Internet-Draft, I certify that any applicable 
   patent or other IPR claims of which I am aware have been disclosed, 
   or will be disclosed, and any of which I become aware will be 
   disclosed, in accordance with RFC 3668. 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time. It is inappropriate to use Internet-Drafts as reference 
   material or cite them other than as "work in progress". 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt 
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html 
    
   This document is a submission of the IETF AVT WG.  Comments should 
   be directed to the AVT WG mailing list, avt@ietf.org. 
 
 
Abstract 
 
   This memo discusses the problems posed by audio codecs with
   variable external sampling rates.  Historically, the RTP timestamp
   frequency for an audio codec was chosen to match the codec's
   sampling rate.  This choice has become harder to justify with the
   advent of audio codecs (and, more importantly, practical use
   cases) that support multiple sample rates and switch between them
   during the lifetime of an RTP session.  This Internet-Draft
   addresses the problem by suggesting that RTP payload RFCs for such
   codecs use a single, high, unified RTP timestamp frequency.
 
1.   Introduction 
  
   One key property of an audio codec is its external input sample
   rate.  For many codecs, this sample rate is fixed.  ITU-T G.711
   [2], also known as A-law and mu-law, for example, uses a sample
   rate of 8 kHz.  Other audio codecs give the user a choice between
   different sample rates; one example is MPEG-1 audio, layers 1, 2,
   or 3 [3].  However, until recently, applications never changed the
   sample rate during the lifetime of an RTP session, even though
   doing so is technically feasible and probably advantageous from
   both the user-perception and the network points of view.  At the
   time RTP [1] and the audio-visual profile [4] were developed, it
   was a reasonable design choice to use an RTP timestamp frequency
   identical to the codec's input sample rate, as this facilitates
   sample-exact synchronization and processing of media data in
   endpoints, mixers, and translators, among other advantages.
   Although neither RTP [1] nor the audio-visual profile [4] requires
   the codec sample rate to be the same as the RTP timestamp
   frequency, this paradigm was observed in practice.
 
   Recently, codecs have been developed that not only support
   variable sample rates, but use unannounced (signaled in-band only)
   changes of the sample rate as one of their key mechanisms.
   Similarly, applications have emerged that not only support
   variable sample rates but, to some extent, rely on this feature.
   For most (if not all) of these codecs, the required bit rate and
   the user experience scale with the selected sample rate.  This
   allows, in the future, network-dictated scaling of the
   transmission bit rate of an audio codec -- a feature that was not
   available before -- which could turn out to be very useful in
   Internet environments, for example to support congestion control.
 
   With the codecs mentioned above, the current paradigm of setting
   the RTP timestamp frequency equal to the codec sample rate no
   longer makes much sense.  The purpose of this draft is to guide
   the developers of RTP payload specifications for codecs with
   variable sample rates toward a single, relatively high RTP
   timestamp frequency, which is specified in this draft.
 
 
2.   Audio codecs with variable sample rates: Examples 
 
   Examples of audio codecs with variable sample rates that (at least
   in theory) could switch the sample rate on the fly without
   out-of-band signaling support include:
    
   *  AMR-WB+ [5] with a choice of 56 different sample rates 
   *  VMR-WB [6] with the choice of 8 kHz and 16 kHz sample rates 
   *  MPEG-4 AAC+ [7] with the choice of (need details here) 
   *  Any others? 
    
   All these codecs use in-band signaling of the sample rate. 
 
 
3.   Rounding 
 
   It is possible (even likely) that no unified RTP timestamp
   frequency can be found that, on the one hand, fulfills one key
   requirement spelled out later (namely, that it is low enough to
   make timestamp wrap-around during erasure periods unlikely for all
   practical application scenarios) and, on the other hand, is an
   integer multiple of all sampling frequencies the codecs support.
   It is quite possible that codecs will be developed in the future
   that can choose sample rates with a granularity of 1 Hz or even
   finer.  It is therefore necessary to specify a rounding algorithm
   for those cases where no sample-exact position of an audio frame
   can be found in the RTP timestamp numbering space.  Specifying
   this algorithm ensures that all equipment conforming to this draft
   rounds in the same way.  If the selected rounding algorithm
   guarantees that inaccuracies do not add up (as spelled out in the
   requirements later), then even frequent transcoding steps will not
   increase the timing inaccuracy beyond the unavoidable minimum.
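
   To illustrate why the choice of rounding rule matters, the
   following minimal Python sketch compares two ways of deriving
   timestamps for a stream whose sample rate does not divide the
   timestamp frequency.  It is not the algorithm of this draft: the
   192 kHz value is only the candidate raised under open issues, the
   function names are hypothetical, and round-half-up is an assumed
   rule chosen for the illustration.

```python
from fractions import Fraction

URTR_HZ = 192_000  # assumed candidate value; the draft leaves the uRTR open


def ticks_from_samples(sample_index: int, sample_rate_hz: int,
                       urtr_hz: int = URTR_HZ) -> int:
    """Map an absolute sample index to the nearest uRTR tick (round half up)."""
    # Integer-only form of floor(sample_index * urtr_hz / sample_rate_hz + 1/2).
    return (2 * sample_index * urtr_hz + sample_rate_hz) // (2 * sample_rate_hz)


def worst_error_ticks(frame_samples: int, sample_rate_hz: int, frames: int,
                      urtr_hz: int = URTR_HZ) -> tuple:
    """Return the worst timing error, in ticks, of two strategies.

    First value: each frame start is rounded from the absolute sample count.
    Second value: a once-rounded per-frame increment is summed frame by frame.
    """
    per_frame = ticks_from_samples(frame_samples, sample_rate_hz, urtr_hz)
    worst_abs = worst_acc = Fraction(0)
    for k in range(1, frames + 1):
        exact = Fraction(k * frame_samples * urtr_hz, sample_rate_hz)
        absolute = ticks_from_samples(k * frame_samples, sample_rate_hz, urtr_hz)
        worst_abs = max(worst_abs, abs(absolute - exact))
        worst_acc = max(worst_acc, abs(k * per_frame - exact))
    return worst_abs, worst_acc


# 1024-sample frames at 44.1 kHz, which does not divide 192 kHz.
abs_err, acc_err = worst_error_ticks(frame_samples=1024,
                                     sample_rate_hz=44_100, frames=1000)
print(float(abs_err), float(acc_err))
```

   Rounding from the absolute sample count keeps the error within
   half a tick for every frame, whereas summing a rounded per-frame
   increment drifts by roughly 0.23 ticks per frame in this example.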
 
4.   Requirements discussion 
 
4.1. Requirements for this draft (general) 
 
   1) This draft MUST specify a unified RTP timestamp frequency that 
      fulfills the requirements of section 4.2. 
   2) This draft MUST specify a rounding algorithm that can be used for
      non-sample-exact alignment of samples stemming from more than one
      audio codec, at least one of which has a variable sample rate.
      The rounding algorithm MUST fulfill the requirements of
      section 4.3.
   3) This draft SHOULD state that its provisions MUST be used for the
      design of future RTP payload formats for audio codecs with
      variable sample rates.
   4) This draft SHOULD state that its provisions SHOULD be considered
      in the design of future RTP payload formats for non-audio codecs
      that have problems similar to those of variable sample rate
      audio codecs.
   5) This draft SHOULD provide an application example for a  
      well-understood variable sample rate codec. 
 
4.2. Requirements for the unified RTP timestamp rate 
 
   6) The unified RTP timestamp rate (uRTR) MUST be sufficiently high
      to fulfill the requirements for timestamps in RFC 3550 [1].
   7) The uRTR MUST be low enough to make wrap-arounds of the RTP 
      timestamp during erasure periods (packet loss bursts) unlikely in 
      all reasonable application scenarios.   
         Informative note: Such scenarios include, for example, cell 
         handovers in wireless cellular networks, where erasure periods 
         of a few seconds can occur. 
   8) The uRTR SHOULD share the prime factors of the sample rates of
      the most commonly used fixed sample rate audio codecs, so as to
      allow sample-exact mixing of streams coded by those codecs.
   9) The uRTR SHOULD be chosen to include a sufficiently large number
      of prime factors so as to support sample-exact mixing for as
      many future variable sample rate codec code points as possible.
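
   Requirements 7 and 8 can be checked numerically for any candidate
   rate.  The sketch below is only an illustration under stated
   assumptions: 192 kHz is the candidate raised under open issues,
   the list of sample rates is an arbitrary selection, and the
   function names are hypothetical.

```python
RTP_TIMESTAMP_MODULUS = 2 ** 32  # the RTP timestamp field is 32 bits wide


def wrap_seconds(urtr_hz: int) -> float:
    """Seconds until a 32-bit timestamp running at urtr_hz wraps around."""
    return RTP_TIMESTAMP_MODULUS / urtr_hz


def sample_exact_rates(urtr_hz: int, rates_hz: list) -> dict:
    """Map each sample rate to True if one sample spans a whole number of ticks."""
    return {rate: urtr_hz % rate == 0 for rate in rates_hz}


candidate_hz = 192_000  # assumption, not a value fixed by this draft
rates_hz = [8_000, 16_000, 32_000, 44_100, 48_000]  # arbitrary selection

print(round(wrap_seconds(candidate_hz) / 3600, 2), "hours until wrap-around")
for rate, exact in sample_exact_rates(candidate_hz, rates_hz).items():
    print(rate, "sample-exact" if exact else "needs rounding")
```

   At 192 kHz the timestamp wraps after about 6.2 hours, far longer
   than an erasure period of a few seconds.  Of the rates listed,
   only 44.1 kHz needs rounding: 44100 = 2^2 * 3^2 * 5^2 * 7^2,
   while 192000 = 2^9 * 3 * 5^3.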
 
4.3. Requirements for a rounding algorithm 
 
   10)    The rounding algorithm MUST be applicable to all sample
          rates lower than 0.5 * uRTR, where the uRTR is the one
          specified in this draft.
   11)    The rounding algorithm MAY specify a minimum and maximum
          sample rate, in units of x * uRTR.  Only within this band is
          it reasonable to expect that applying the rounding algorithm
          does not lead to distortions audible to the common user.
   12)    The rounding algorithm MUST be simple enough to be
          implemented in networking equipment without a serious cycle
          burden.
   13)    The rounding algorithm SHOULD be implementable in fixed-point
          arithmetic.
   14)    The rounding algorithm MAY, advantageously, be specified such
          that it does not require division operations.
   15)    The rounding algorithm SHOULD be designed such that multiple
          applications of the algorithm do not introduce errors larger
          than one tick of the uRTR.
             Informative note: This requirement is much more difficult
             than it seems at first glance.  Think of a transcoding
             scenario in which a variable-rate stream is converted to
             44.1 kHz and back to variable rate, and the unified
             timestamp frequency does not share all prime factors of
             44.1 kHz.  One way out would be to rewrite all fixed-rate
             payload specifications whose timestamp frequencies do not
             fit the prime factors of the uRTR so that they use the
             uRTR.  Is it possible to do this for 44.1 kHz, or is this
             nailed down in RFC 3551?
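
   One possible shape for an algorithm meeting requirements 12 to 15
   is a remainder accumulator, sketched below in Python.  This is an
   assumption for discussion, not the algorithm of this draft: the
   class and method names are hypothetical, 192 kHz is only the
   candidate raised under open issues, and the sketch rounds down
   rather than to the nearest tick.  It uses integer arithmetic only
   and divides once per rate change instead of once per frame.

```python
class TickAccumulator:
    """Advance an RTP timestamp frame by frame with integer adds and compares."""

    def __init__(self, urtr_hz: int, start_ticks: int = 0) -> None:
        self.urtr_hz = urtr_hz
        self.ticks = start_ticks
        self.whole = 0    # whole ticks per frame
        self.rem = 0      # leftover numerator per frame, in [0, rate_hz)
        self.rate_hz = 1  # denominator of the fractional part
        self.acc = 0      # carried fraction numerator, in [0, rate_hz)

    def configure(self, frame_samples: int, sample_rate_hz: int) -> None:
        """Set frame size and sample rate; the only place a division occurs."""
        self.whole, self.rem = divmod(frame_samples * self.urtr_hz,
                                      sample_rate_hz)
        self.rate_hz = sample_rate_hz
        # Simplification: the carried sub-tick fraction is dropped here,
        # which discards less than one tick at each rate change.
        self.acc = 0

    def next_frame(self) -> int:
        """Return the timestamp of the current frame, then advance one frame."""
        stamp = self.ticks
        self.ticks = (self.ticks + self.whole) & 0xFFFFFFFF  # 32-bit wrap
        self.acc += self.rem
        if self.acc >= self.rate_hz:  # carry one extra tick, no division
            self.acc -= self.rate_hz
            self.ticks = (self.ticks + 1) & 0xFFFFFFFF
        return stamp


clock = TickAccumulator(urtr_hz=192_000)  # assumed candidate uRTR
clock.configure(frame_samples=1024, sample_rate_hz=44_100)
stamps = [clock.next_frame() for _ in range(1001)]
print(stamps[:3], stamps[-1])
```

   Between rate changes, frame k receives the timestamp
   floor(k * 1024 * 192000 / 44100), so the error stays below one
   tick and does not grow with the number of frames.  The fraction
   dropped at each rate change can add up over many changes, so a
   complete design would have to carry it over.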
 
5.   Open issues 
 
   *  Very general: is this a good idea? 
   *  What would be a good choice for the uRTR? 192 kHz?  
   *  Is it a good idea to require ALL future I-Ds on audio (not only 
      the variable clock frequency ones) to use the uRTR?   
   *  Or only those that do not fit the uRTR (fit == subset of prime 
      factors)? 
   *  Revisit CD audio at 44.1 kHz.  Is no variable sample rate
      needed?  Are there proposals for an 88.2 kHz CD audio codec?
 
6.   Security Considerations 
   None 
 
7.   Congestion Control 
   None 
 
8.   IANA Considerations
   None 
 
9.   Acknowledgements 
   None 
 
10.  Full Copyright Statement 
 
   Copyright (C) The Internet Society (2004).  This document is subject 
   to the rights, licenses and restrictions contained in BCP 78, and 
   except as set forth therein, the authors retain all their rights. 
    
   This document and the information contained herein are provided on 
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE 
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR 
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
 
 
11.  Intellectual Property Notice 
 
   The IETF takes no position regarding the validity or scope of any 
   Intellectual Property Rights or other rights that might be claimed 
   to pertain to the implementation or use of the technology described 
   in this document or the extent to which any license under such 
   rights might or might not be available; nor does it represent that 
   it has made any independent effort to identify any such rights. 
   Information on the procedures with respect to rights in RFC 
   documents can be found in BCP 78 and BCP 79. 
    
   Copies of IPR disclosures made to the IETF Secretariat and any 
   assurances of licenses to be made available, or the result of an 
   attempt made to obtain a general license or permission for the use 
   of such proprietary rights by implementers or users of this 
   specification can be obtained from the IETF on-line IPR repository 
   at http://www.ietf.org/ipr. 
    
   The IETF invites any interested party to bring to its attention any 
   copyrights, patents or patent applications, or other proprietary 
   rights that may cover technology that may be required to implement 
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.
 
12.   References 
 
12.1.     Normative References 
[1]  RTP, RFC 3550, STD 64 
 
12.2.     Informative References 
 
[2]  G.711 
[3]  ISO/IEC 11172 part 3 
[4]  RTP AV profile, RFC 3551, STD 65 
[5]  AMR-WB+ 
[6]  VMR-WB 
[7]  ISO/IEC 14496 part xxx, AAC+ 
 
13.      Authors' Addresses
 
    Stephan Wenger                    Phone: +358-50-486-0637 
    Nokia Research Center              Email: stewe@stewe.org 
    P.O. Box 100 
    FIN-33721 Tampere 
    Finland 
 
    Colin Perkins <csp@csperkins.org> 
    University of Glasgow 
    Department of Computing Science 
    17 Lilybank Gardens 
    Glasgow G12 8QQ 
    United Kingdom 
 
 
14.  RFC Editor Considerations 
 
none 
 