Re: [AVT] RE: <draft-ietf-avt-rtp-vmr-wb-03.txt>: sampling rate
Hi, Sassan,
...<snip>...
Is it true that all the coded frames output from a VMR-WB __encoder__ use the 12.8k
sampling rate, independent of the original sampling rate of the speech?
The above statement is true. However, I want to make sure that it is not misinterpreted.
The VMR-WB encoder converts the 8 or 16 kHz sampled input speech to 12.8 kHz prior to
the encoding functions. This INTERNAL sampling frequency is transparent (hidden) to the
user. The bit stream generated by the encoder is then transmitted to the VMR-WB decoder.
So this is sort of a "normalization" of the original sampling frequency, and is part of the
pre-processing in the encoder. I guess in theory the original speech can have sampling
rates other than 8k and 16k, and as long as the right "normalization/re-sampling" algorithm
is applied, the encoder will work just fine.
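To make the "normalization" step concrete, here is a minimal sketch of the sample counts involved. It assumes a 20 ms frame length, which is not stated in this thread; the function names are hypothetical and only illustrate the arithmetic, not the codec's actual re-sampling filter.

```python
from fractions import Fraction

INTERNAL_RATE_HZ = 12800  # internal sampling rate discussed in this thread
FRAME_MS = 20             # ASSUMPTION: 20 ms frames (not stated in the thread)


def samples_per_frame(rate_hz, frame_ms=FRAME_MS):
    """Return the number of samples in one frame at the given sampling rate."""
    n = Fraction(rate_hz * frame_ms, 1000)
    if n.denominator != 1:
        raise ValueError("frame does not hold a whole number of samples")
    return int(n)


def resampling_ratio(input_rate_hz):
    """Return internal_rate / input_rate as a reduced fraction (up/down factors)."""
    return Fraction(INTERNAL_RATE_HZ, input_rate_hz)


if __name__ == "__main__":
    for rate in (8000, 16000):
        r = resampling_ratio(rate)
        print(
            f"{rate} Hz input: {samples_per_frame(rate)} samples/frame -> "
            f"{samples_per_frame(INTERNAL_RATE_HZ)} at 12.8 kHz "
            f"(ratio {r.numerator}/{r.denominator})"
        )
```

Under that assumption, both input rates map to the same number of internal samples per frame, which is why the coded frames themselves carry no trace of the original rate.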
The VMR-WB decoding functions are independent of the encoder input speech sampling frequency.
Of course, since it knows that it only needs to deal with the 12.8k internal sampling rate.
But without outside hints, can the decoder still tell what the original sampling
rate is? I think this is related to Magnus's question about the missing rate information in
the storage file format. By looking at the data in a stored vmr-wb file, I guess one won't
be able to tell what the original sampling rate of the speech is, i.e., the original
sampling rate info is lost forever after the speech was "rate normalized", encoded, and put
into a file with the current file format definition, right?
By default, the VMR-WB decoder generates a wideband output, unless instructed otherwise.
The internal sampling frequency must now be converted to 16 kHz (for wideband output)
and the higher frequency band (6.4 to 7 kHz spectrum) must be reconstructed by the decoder.
If a narrowband output is desired, then the 12.8 kHz sampling frequency must be converted
to 8 kHz. Therefore, you CANNOT use the 12.8 kHz internal sampling frequency for any
purpose other than the encoding-decoding functions.
Depending on the output audio interface (or the network interface), one may wish to
instruct the decoder to generate a narrowband or wideband output.
How and from whom will the decoder be "instructed" about the final output sampling rate to
use (8k or 16k)? Strictly speaking, the decoder does not need this instruction. More
precisely, it is the re-sampler after the decoder that needs to be told about the desired
output sampling rate, right?
For proper operation, the RTP timestamp clock rate must be either 8000 or 16000 for
narrowband or wideband operation, respectively. The 12800 Hz internal sampling
rate CANNOT be used for the RTP timestamp clock rate. The correct timestamp or clock rate
(8000 or 16000) is required for proper buffering and other functions in the transmitting
and receiving sites.
This means that, if we take an RTP packet that contains some vmr-wb coded frames, even
though the timestamp in the header may say 8000 or 16000 rate, the speech bits are actually
always from 12.8k speech, right?
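For illustration, the timestamp arithmetic implied above can be sketched as follows. This assumes 20 ms frames (not stated in this thread), and the function names are hypothetical; only the 8000/16000 clock rates and the exclusion of 12800 come from the discussion.

```python
FRAME_MS = 20  # ASSUMPTION: 20 ms per coded frame (not stated in the thread)
ALLOWED_CLOCK_RATES = (8000, 16000)  # narrowband / wideband, per the thread


def rtp_clock_rate(wideband):
    """Pick the RTP timestamp clock rate for wideband or narrowband operation."""
    return 16000 if wideband else 8000


def timestamp_increment(clock_rate_hz, frames_in_packet=1, frame_ms=FRAME_MS):
    """Timestamp ticks covered by one packet; rejects the 12800 Hz internal rate."""
    if clock_rate_hz not in ALLOWED_CLOCK_RATES:
        raise ValueError(f"unsupported RTP clock rate: {clock_rate_hz}")
    return clock_rate_hz * frame_ms * frames_in_packet // 1000


def next_timestamp(timestamp, increment):
    """Advance a timestamp; RTP timestamps are 32 bits wide and wrap around."""
    return (timestamp + increment) & 0xFFFFFFFF


if __name__ == "__main__":
    for wideband in (False, True):
        rate = rtp_clock_rate(wideband)
        print(rate, "Hz clock:", timestamp_increment(rate), "ticks per frame")
```

The coded bits are the same in both cases; only the tick count per frame differs, which is what the buffering logic at each end relies on.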
regards,
-Qiaobing
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt