
Re: [AVT] RE: <draft-ietf-avt-rtp-vmr-wb-03.txt>: sampling rate



Hi, Sassan,

...<snip>...

Is it true that all the coded frames output from a VMR-WB __encoder__ use the 12.8k sampling rate, independent of the original sampling rate of the speech?

The above statement is true. However, I want to make sure that it is not misinterpreted.

The VMR-WB encoder converts the 8 or 16 kHz sampled input speech to 12.8 kHz prior to the encoding functions. This INTERNAL sampling frequency is transparent (hidden) to the user. The bit stream generated by the encoder is then transmitted to the VMR-WB decoder.

So this is sort of a "normalization" of the original sampling frequency, and is part of the pre-processing in the encoder. I guess in theory the original speech can have sampling rates other than 8k and 16k, and as long as the right "normalization/re-sampling" algorithm is applied, the encoder will work just fine.
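
To make the "normalization" concrete, here is a rough sketch of the arithmetic only, not of the codec's actual re-sampling filter. The 20 ms frame duration and the function names are my own assumptions for illustration; only the 8k, 16k and 12.8k rates come from this thread.

```python
from fractions import Fraction

INTERNAL_RATE_HZ = 12800  # internal rate discussed in this thread
FRAME_MS = 20             # assumption: 20 ms speech frames


def resample_ratio(input_rate_hz):
    """Ratio of internal samples to input samples (hypothetical helper)."""
    return Fraction(INTERNAL_RATE_HZ, input_rate_hz)


def samples_per_frame(rate_hz, frame_ms=FRAME_MS):
    """Number of samples covering one frame at the given rate."""
    return rate_hz * frame_ms // 1000


for rate in (8000, 16000):
    print(rate, resample_ratio(rate), samples_per_frame(rate),
          "->", samples_per_frame(INTERNAL_RATE_HZ))
```

Under these assumptions, either input rate ends up as the same number of internal samples per frame, which is why the coded frames carry no trace of the original rate.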

The VMR-WB decoding functions are independent of the encoder input speech sampling frequency.

Of course, since it knows that it only needs to deal with the 12.8k internal sampling rate. But without outside hints, can the decoder still tell what the original sampling rate was? I think this is related to Magnus's question about the missing rate information in the storage file format. By looking at the data in a stored VMR-WB file, I guess one won't be able to tell what the original sampling rate of the speech was, i.e., the original sampling rate info is lost forever after the speech was "rate normalized", encoded, and put into a file with the current file format definition, right?

By default, the VMR-WB decoder generates a wideband output, unless instructed otherwise.
The internal sampling frequency must now be converted to 16 kHz (for wideband output) and the higher frequency band (6.4 to 7 kHz spectrum) must be reconstructed by the decoder. If a narrowband output is desired then 12.8 kHz sampling frequency must be converted to 8 kHz. Therefore, you CANNOT use the 12.8 kHz internal sampling frequency for any other purposes than the encoding-decoding functions.


Depending on the output audio interface (or the network interface), one may wish to instruct the decoder to generate a narrowband or wideband output.

How and from whom will the decoder be "instructed" about the final output sampling rate to use (8k or 16k)? Strictly speaking, the decoder does not need this instruction. More precisely, it is the re-sampler after the decoder that needs to be told about the desired output sampling rate, right?


For proper operation, the RTP timestamp clock rate must be either 8000 or 16000 depending on the narrowband or wideband operation, respectively. The 12800 Hz internal sampling rate CANNOT be used for the RTP timestamp clock rate. The correct timestamp or clock rate (8000 or 16000) is required for proper buffering and other functions in the transmitting and receiving sites.
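
As a small illustration of the clock-rate rule above, the sketch below computes how far the RTP timestamp would advance per packet. It assumes 20 ms frames, and the helper name and parameters are hypothetical, not taken from the draft.

```python
ALLOWED_CLOCK_RATES_HZ = (8000, 16000)  # narrowband, wideband


def timestamp_increment(clock_rate_hz, frames_in_packet=1, frame_ms=20):
    """RTP timestamp ticks covered by one packet (hypothetical helper).

    frame_ms=20 is an assumption; the 12800 Hz internal rate is
    rejected because it must not be used as the RTP clock rate.
    """
    if clock_rate_hz not in ALLOWED_CLOCK_RATES_HZ:
        raise ValueError("RTP clock rate must be 8000 or 16000")
    return clock_rate_hz * frame_ms * frames_in_packet // 1000


print(timestamp_increment(8000))      # one narrowband frame
print(timestamp_increment(16000, 2))  # two wideband frames
```

The payload bits are the same in both cases; only the tick count per frame differs with the chosen clock rate.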

This means that, if we take an RTP packet that contains some vmr-wb coded frames, even though the timestamp in the header may say 8000 or 16000 rate, the speech bits are actually always from 12.8k speech, right?


regards,
-Qiaobing


_______________________________________________ Audio/Video Transport Working Group avt at ietf.org https://www1.ietf.org/mailman/listinfo/avt