
Re: [AVT] Sampling Rates comment on draft-ietf-avt-rtp-wmr-wb-02



Colin,

My understanding of the VMR-WB codec is still quite limited, but I think there is a unique feature in its design that particularly affects how the sampling rate is handled and what it means. That may be the root cause of the misunderstanding between you and Gino.

Please read my earlier comments to Sassan (Re: [AVT] RE: <draft-ietf-avt-rtp-vmr-wb-03.txt>: sampling rate) where I tried to get some more clarification about the sampling rate in vmr-wb.

regards,
-Qiaobing

Colin Perkins wrote:

Gino,

On 31 Aug 2004, at 01:08, Scribano Gino-QA1087 wrote:
...

The VMR-WB decoder can operate at 16 KHz or 8 KHz, regardless of the
encoding sampling rate. Further, the VMR-WB decoder operating at 16
KHz sampling mode can process frames that have been encoded with 8
KHz sampled format with minimal audio degradation and minimal audio
artifacts. This is especially true for tones and announcements that
are preceded by at least 100ms of silence, which is typically the
case. Therefore, we believe that the subject draft should allow for
sending frames that have been encoded using 16 KHz sampling or encoded
using 8 KHz sampling without requiring an associated end-to-end
session renegotiation.


I strongly disagree. This would violate the fundamental RTP timestamp
rules. An RTP audio session should use a single clock rate throughout.

[Gino] Apologies if this was not clear, but our intention and understanding is that our proposal does not violate the fundamental RTP timestamp rules.


I disagree.

We are proposing use of a single clock rate, with an additional specification that this clock rate be referenced at a specific point in the codec (i.e., the decoder). Further, this clock rate does not change within a session.


However the RTP clock rate does change, which is the problem.
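
The timestamp arithmetic behind this objection can be sketched in Python. This is only an illustration, not text from the draft: it assumes 20 ms frames, and the helper names (ticks_per_frame, elapsed_ms) are hypothetical. It shows what a receiver that assumes one 16 kHz clock would compute if some frames were stamped in 8 kHz units.

```python
FRAME_MS = 20  # assumed frame duration for this sketch


def ticks_per_frame(clock_rate_hz, frame_ms=FRAME_MS):
    """RTP timestamp increment for one frame at the given clock rate."""
    return clock_rate_hz * frame_ms // 1000


def elapsed_ms(timestamp_delta, clock_rate_hz):
    """Media time a receiver infers from a timestamp difference."""
    return timestamp_delta * 1000 / clock_rate_hz


# Three frames stamped in 16 kHz units followed by three stamped in 8 kHz units.
increments = [ticks_per_frame(16000)] * 3 + [ticks_per_frame(8000)] * 3
total_ticks = sum(increments)

# The receiver only knows the single clock rate negotiated for the session.
receiver_view_ms = elapsed_ms(total_ticks, 16000)
actual_ms = FRAME_MS * len(increments)

print(total_ticks, receiver_view_ms, actual_ms)
```

With these assumptions the receiver infers 90 ms of media where 120 ms was actually sent, so playout timing and jitter estimates drift unless the change of rate is signalled.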

Rather, our intention is to enable sufficient flexibility for clock rate adaptations (eg, up-sampling and down-sampling), which are typically outside the scope of RTP, but not in the case of VMR-WB. As an example, it is our understanding that with G.729 you could resample the audio on the input from, say, a 44.1 kHz source to 8 KHz, and set up an "8 KHz session" via SDP without violating RTP.


Correct.

This cannot be flexibly supported for two common sampling modes (8 KHz and 16 KHz) with VMR-WB because they are included in the VMR-WB payload specification. Since these functions are specified, two unique sampling modes exist, and hence two timestamp intervals exist, and hence a sampling rate restriction exists. We think of the tone injection scenario as taking an 8 KHz input, up-sampling it to 16 KHz, and running it into the VMR-WB codec.


This would be acceptable.
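
A minimal sketch of that acceptable arrangement, in Python: every input is brought to one session rate before encoding, so the timestamp always advances in the same units. The 16 kHz session clock, the 20 ms frame sizes, and all function names here are assumptions for illustration, and the linear-interpolation up-sampler stands in for a proper filtered resampler.

```python
SESSION_CLOCK_HZ = 16000  # assumed single RTP clock rate for the session


def upsample_2x(samples):
    """Naive 2x up-sampling by linear interpolation (illustrative only)."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) / 2)
    return out


def to_session_rate(samples, source_rate_hz):
    """Bring a block of samples to the session rate before encoding."""
    if source_rate_hz == SESSION_CLOCK_HZ:
        return list(samples)
    if source_rate_hz * 2 == SESSION_CLOCK_HZ:
        return upsample_2x(samples)
    raise ValueError("source rate not handled in this sketch")


def stamp_frames(frames, start_ts=0):
    """frames: list of (samples, source_rate_hz); returns one timestamp per frame."""
    ts = start_ts
    stamps = []
    for samples, rate in frames:
        pcm = to_session_rate(samples, rate)
        stamps.append(ts)
        ts = (ts + len(pcm)) % 2**32  # RTP timestamps are 32-bit and wrap
    return stamps


speech = ([0.0] * 320, 16000)  # 20 ms captured at 16 kHz
tone = ([0.0] * 160, 8000)     # 20 ms injected from an 8 kHz source
stamps = stamp_frames([speech, tone, tone, speech])
print(stamps)
```

Because the up-sampling happens before the encoder, each 20 ms frame advances the timestamp by 320 regardless of where the audio came from.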

The issue is that the VMR-WB codec specification contains this up-sampling function, and hence calls out a separate timestamp period and mode for the 8 KHz operation. As such, the required up-sampling function is not outside the scope of RTP.


Which is what makes it unacceptable unless signalled, since it requires changing the RTP timestamp rate within a session.



_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt