Re: [AVT] RE: <draft-ietf-avt-rtp-vmr-wb-03.txt>: sampling rate
Hi,
On 13 Sep 2004, at 16:41, Magnus Westerlund wrote:
I think we have two issues:
A. Is there any benefit in indicating or requesting the sampling
frequency used at the sender?
Yes. This is why RTP has the "rate" parameter, and uses the sampling
rate as the RTP timestamp rate.
B. Is it necessary to use the sampling frequency as the RTP timestamp
rate?
It's highly desirable.
I will start with A that I think is easier to explain and also can
provide some information for issue B. If you find any of my
assumptions and statements are incorrect, please correct me.
My understanding of VMR-WB, after a conversation with Jonas Svedberg,
is that it will provide a somewhat better encoding of 8kHz material if
it is indicated that the input is 8kHz. However, neither compatibility
nor decoder operation requires signalling the case where 8kHz is used
as input to the encoder. As a result, the only case that needs to be
signalled between encoder and decoder is the one where the decoder
will produce output at 8kHz, because if the decoder can request that
the encoder use 8kHz input, some improvement of the 8kHz material is
achieved. In the other cases, where the receiver is capable of 16kHz,
it doesn't matter to the receiver, from a decoding point of view,
whether the original audio was 8 or 16kHz.
Colin, looking at issue B: is it really necessary for the RTP
timestamp frequency to equal the sampling rate used? I would say NO to
that question.
Yes, it is necessary to use an RTP timestamp rate equal to the
sampling rate.
My reasoning is the following.
- Much audio input is sampled from a source at a higher rate than the
encoder can handle. Thus a resampling and pre-processing stage is
employed to match the encoder's input frequency, rather than
producing that rate directly from the hardware. One reason is that
the pre-processing may actually yield better results than the
hardware can achieve at the given input rate. Another may be that one
would like to avoid switching the hardware between rates when
changing the encoding.
- Frame-based decoders do not need to know the encoder's input rate.
The encoder may in any case resample the input to other rates for
internal processing and band-limit the signal. I would claim that
VMR-WB, AMR-WB+ and AAC are all examples of codecs that perform this
kind of trick. On the receiver side they produce an output signal at
whatever sampling frequency the receiver finds most useful. That may
be a lower rate, which clips the higher frequencies, but more
commonly it is a higher rate, chosen simply for ease of use even
though it provides no more information.
- Frame-based codecs only need an RTP timestamp rate that allows the
receiver to correctly reconstruct the timeline when the encoding is
done with the widest audio bandwidth. In the VMR-WB case this is
16kHz. AMR-WB+ is even stranger, as we have selected an RTP timestamp
rate that results in an integer number of timestamp ticks for all
internal sampling frequencies. This allows one to correctly calculate
frame alignment when the internal sampling frequency changes. We also
considered that the rate can be converted into several common
sampling frequencies with few partial-sample alignments.
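
To make the "integer timestamp ticks" property concrete, here is a
minimal Python sketch. It is illustrative only: the function names are
made up for this example, and the frame size and internal sampling
rates in the usage note are hypothetical values, not the actual
AMR-WB+ parameters.

```python
from fractions import Fraction


def ticks_per_frame(clock_rate_hz, frame_samples, sampling_rate_hz):
    """Duration of one frame in RTP timestamp ticks, as an exact fraction.

    A frame of `frame_samples` samples at `sampling_rate_hz` lasts
    frame_samples / sampling_rate_hz seconds, which corresponds to
    that many seconds times `clock_rate_hz` ticks.
    """
    return Fraction(frame_samples * clock_rate_hz, sampling_rate_hz)


def all_integer_ticks(clock_rate_hz, frame_samples, sampling_rates_hz):
    """True if every listed sampling rate gives a whole number of ticks."""
    return all(
        ticks_per_frame(clock_rate_hz, frame_samples, rate).denominator == 1
        for rate in sampling_rates_hz
    )


# Hypothetical internal sampling rates and frame size (assumptions).
example_rates_hz = [12000, 24000, 36000]
example_frame_samples = 1024
```

With these assumed values, all_integer_ticks(72000, 1024,
example_rates_hz) holds, while the same check with a 16000 Hz clock
fails, because 1024 samples at 12000 Hz is not a whole number of
16000 Hz ticks.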
Thus I would argue that indicating the actual sampling frequency is
not necessary, as long as the receiver is capable of correctly
reconstructing the media stream with its timing information in full
resolution.
True, but it greatly simplifies the system if all codecs use the
sampling rate as the RTP clock rate. You can make things work if each
codec uses a different rate, but it's desirable that RTP is consistent
where possible. Why is this codec so special that it needs to break
this rule?
In the VMR-WB case I would think that having only one timestamp rate
of 16kHz does not affect codec operation and would simplify handling
when some senders use 8kHz, especially when gateways sometimes need
to encode 8kHz material from pre-recorded responses and at other
times WB channel data. This avoids the need to perform RTP timestamp
rate switches.
But in the process you make senders that support multiple codecs more
complex, since they can't use the sampling rate to drive the media
clock for all codecs.
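
As a rough illustration of that sender-side cost, here is a small
Python sketch. The function name and structure are assumptions made
for this example, not taken from any draft or implementation. When
the RTP clock rate equals the sampling rate, the timestamp simply
advances by the number of samples in the frame; otherwise the sender
has to rescale per codec.

```python
RTP_TS_MODULUS = 2 ** 32  # the RTP timestamp field is 32 bits wide


def advance_timestamp(ts, frame_samples, sampling_rate_hz, clock_rate_hz):
    """Return the RTP timestamp of the frame following the one at `ts`."""
    if clock_rate_hz == sampling_rate_hz:
        # Simple case: the sample counter drives the media clock directly.
        step = frame_samples
    else:
        # Codec-specific case: rescale the sample count to the RTP clock.
        step, remainder = divmod(frame_samples * clock_rate_hz,
                                 sampling_rate_hz)
        if remainder:
            raise ValueError("frame is not a whole number of RTP ticks")
    return (ts + step) % RTP_TS_MODULUS
```

For example, a 160-sample frame of 8kHz audio advances the timestamp
by 160 with an 8000 Hz clock, but by 320 with a fixed 16000 Hz clock.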
If it is desired that a receiver be able to request that the sender
use 8kHz input, then one should introduce a MIME parameter for this.
However, I would like to avoid using the "rate" parameter, as it
results in unnecessary barriers in the form of signalling and RTP
timestamp rate switching.
I disagree. Using the rate parameter is consistent with other payload
formats, and so will simplify the system overall.
Colin
--
Colin Perkins
http://csperkins.org/
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt