Gino,
On 31 Aug 2004, at 01:08, Scribano Gino-QA1087 wrote:
...
The VMR-WB decoder can operate at 16 kHz or 8 kHz, regardless of the
encoding sampling rate. Further, the VMR-WB decoder operating in 16
kHz sampling mode can process frames that have been encoded in the 8
kHz sampled format with minimal audio degradation and minimal audio
artifacts. This is especially true for tones and announcements that
are preceded by at least 100 ms of silence, which is typically the
case. Therefore, we believe that the subject draft should allow for
sending frames that have been encoded using 16 kHz sampling or encoded
using 8 kHz sampling without requiring an associated end-to-end
session renegotiation.
I strongly disagree. This would violate the fundamental RTP timestamp
rules. An RTP audio session should use a single clock rate throughout.
[Gino] Apologies if this was not clear, but our intention and
understanding is that our proposal does not violate the fundamental
RTP timestamp rules.
I disagree.
We are proposing use of a single clock rate, with an additional
specification for this clock rate to be referenced at a specific point
in the codec (i.e., the decoder). Further, this clock rate does not
change within a session.
However, the RTP clock rate does change, which is the problem.
Rather, our intention is to enable sufficient flexibility for clock
rate adaptations (e.g., up-sampling and down-sampling), which are
typically outside the scope of RTP, but not in the case of VMR-WB. As
an example, it is our understanding that with G.729 you could resample
the audio on the input from, say, a 44.1 kHz source to 8 kHz, and set
up an "8 kHz session" via SDP without violating RTP.
Correct.
This cannot be flexibly supported for two common sampling modes (8 kHz
and 16 kHz) with VMR-WB because they are included in the VMR-WB
payload specification. Since these functions are specified, two unique
sampling modes exist, and hence two timestamp intervals exist, and
hence a sampling rate restriction exists. We think of the tone injection
scenario as taking an 8 kHz input, up-sampling it to 16 kHz, and feeding
the result into VMR-WB.
This would be acceptable.
The issue is that the VMR-WB codec specification contains this
up-sampling function, and hence calls out a separate timestamp period
and mode for the 8 kHz operation. As such, the required up-sampling
function is not outside the scope of RTP.
Which is what makes it unacceptable unless signalled, since it requires
changing the RTP timestamp rate within a session.
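
To make the timestamp arithmetic concrete, here is a rough sketch in
Python. It is only an illustration, not text from the draft: it assumes
20 ms frames and a session negotiated with a 16000 Hz RTP clock, and
the function and variable names are made up for this example. It
compares a sender that up-samples 8 kHz input before encoding (and so
keeps stamping at the session clock rate) with one that stamps the 8
kHz frames using an 8000 Hz clock mid-session.

```python
# Assumption for illustration: 20 ms frames, session clock of 16000 Hz.
FRAME_MS = 20
SESSION_CLOCK_HZ = 16000


def timestamp_increment(clock_rate_hz, frame_ms=FRAME_MS):
    """RTP timestamp ticks covered by one frame at the given clock rate."""
    return clock_rate_hz * frame_ms // 1000


def media_time_ms(timestamp_delta, clock_rate_hz):
    """Duration a receiver infers from a timestamp delta at its clock rate."""
    return timestamp_delta * 1000 // clock_rate_hz


# Case 1: 8 kHz input is up-sampled to 16 kHz before the encoder, so the
# sender keeps using the session clock rate for every frame.
upsampled_delta = timestamp_increment(SESSION_CLOCK_HZ)

# Case 2: the sender stamps 8 kHz-mode frames with an 8000 Hz clock,
# i.e. the RTP timestamp rate changes within the session.
switched_delta = timestamp_increment(8000)

# The receiver only knows the negotiated session clock rate.
upsampled_ms = media_time_ms(upsampled_delta, SESSION_CLOCK_HZ)
switched_ms = media_time_ms(switched_delta, SESSION_CLOCK_HZ)

print(upsampled_delta, upsampled_ms)
print(switched_delta, switched_ms)
```

In the first case each frame advances the timestamp by 320 ticks, which
the receiver reads as 20 ms. In the second case the frame advances it by
only 160 ticks, which a receiver using the negotiated 16000 Hz clock
reads as 10 ms, even though the frame still carries 20 ms of audio.
Without signalling, the receiver has no way to know the rate changed.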