Re: [AVT] RE: <draft-ietf-avt-rtp-vmr-wb-03.txt>: sampling rate
On 24 Sep 2004, at 20:15, Randell Jesup wrote:
Colin Perkins <csp at csperkins.org> writes:
On 13 Sep 2004, at 16:41, Magnus Westerlund wrote:
I think we have two issues:
A. Is there any benefit in indicating or requesting the sampling
frequency used at the sender?
Yes. This is why RTP has the "rate" parameter, and uses the sampling
rate as the RTP timestamp rate.
My apologies, but I don't see how that answers the question as
to what the benefit is. How does the receiver make use of this
information? Why is this better for the receiver?
If the codec can have different rates for source material and output
material, it's clearly advantageous to signal that. More information is
clearly better, as I hope is obvious.
For RTP audio, the convention is to use the sampling rate at the input
to drive the timestamp. The benefit is that it gives a consistent RTP
timing model, with well defined semantics independent of the codec.
This makes it simpler to design senders that operate with a range of
codecs, and to design receivers that perform sample-accurate
synchronisation (since the timestamp always increases by exactly 1 per
audio sample in the input, meaning receivers can do synchronisation
independent of the codec).
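As a rough illustration of the convention described above (a sketch, not
text from any RTP specification): when the clock rate equals the input
sampling rate, each frame's timestamp is the previous one plus the number
of samples in that frame, modulo 2^32. The function names, the 16 kHz rate
and the 320-sample frame are assumptions chosen for the example.

```python
RTP_TS_MODULUS = 2 ** 32  # the RTP timestamp is a 32-bit field and wraps


def frame_timestamps(initial_ts, samples_per_frame, n_frames):
    """Timestamps of consecutive frames when clock rate == sampling rate.

    The timestamp advances by exactly one tick per input sample, so the
    step between frames is simply the frame length in samples.
    """
    return [
        (initial_ts + i * samples_per_frame) % RTP_TS_MODULUS
        for i in range(n_frames)
    ]


def media_time_seconds(ts, initial_ts, clock_rate):
    """Elapsed media time for a timestamp, allowing for one wrap-around."""
    return ((ts - initial_ts) % RTP_TS_MODULUS) / clock_rate


# Hypothetical stream: 16 kHz input, 20 ms frames (320 samples each).
stamps = frame_timestamps(initial_ts=0, samples_per_frame=320, n_frames=3)
print(stamps)  # [0, 320, 640]
```

A receiver that knows only the clock rate can map any of these timestamps
back to a sample position without knowing which codec produced the frame.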
I can see some advantage in having the RTP rate parameter be the
normal _output_ sample rate from the codec (which may be uncorrelated
to the input rate!) in that you could use that to implement an audio
codec with non-fixed frame sizes.
I agree that might be useful. However, it doesn't fit with the
semantics assigned to the RTP timestamp.
B. Is it necessary to use the sampling frequency as the RTP timestamp
rate?
It's highly desirable.
Again, asserting this without a reason doesn't convince me.
To make a consistent definition of the RTP timestamp.
Colin, looking at issue B: is it really necessary for the RTP
timestamp frequency to equal the sampling rate used? I would say NO
to that question.
Yes, it is necessary to use an RTP timestamp rate equal to the
sampling rate.
Again, no reasons given.
Again, to avoid breaking the standard RTP timing model.
My reasoning is the following.
- Much audio input is sampled from the source at a higher rate than the
encoder can handle. A resampling and pre-processing stage is therefore
employed to reach the encoder's input frequency, rather than producing
that rate directly in the hardware. One reason is that the
pre-processing may actually yield better results than the hardware can
achieve at the given input rate. Another is that one may want to avoid
switching the hardware between rates when the encoding changes.
- Frame-based decoders do not need to know the encoder's input rate.
The encoder may in any case resample the input to other rates for
internal processing and for band-limited signals. I would claim that
VMR-WB, AMR-WB+ and AAC are all examples of codecs that perform this
kind of trick. On the receiver side they produce an output signal at
whatever sampling frequency the receiver finds most useful. That may
mean clipping the higher frequencies, but more commonly it means a
higher clock rate, chosen simply for ease of use even though no more
information is provided.
- Frame-based codecs need only an RTP timestamp that allows the
receiver to correctly reconstruct the timeline when the encoding is
done with the most audio bandwidth. In the VMR-WB case this is 16 kHz.
AMR-WB+ is even stranger: we have selected an RTP timestamp rate at
which every internal sampling frequency yields an integer number of
timestamp ticks. This allows frame alignment to be calculated
correctly when the internal sampling frequency changes. We also
considered that this rate can be converted to several common sampling
frequencies with few partial-sample alignments.
Thus I would use this to argue that indicating the actual sampling
frequency is not necessary, as long as the receiver is capable of
correctly reconstructing the media stream, with its timing
information, in full resolution.
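The "integer timestamp ticks" property in the last bullet can be checked
mechanically. The sketch below is only an illustration: the list of
internal sampling rates and the frame lengths are hypothetical, not the
values used by AMR-WB+ or any other codec, and the helper names are
invented for the example. It takes the least common multiple of the
rates as a candidate clock and computes how many clock ticks one frame
spans at each rate.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def common_clock_rate(sample_rates):
    """Smallest clock rate that is a whole multiple of every sample rate."""
    return reduce(lcm, sample_rates)


def ticks_per_frame(clock_rate, sample_rate, samples_per_frame):
    """Exact number of clock ticks spanned by one frame, as a Fraction."""
    return Fraction(clock_rate * samples_per_frame, sample_rate)


# Hypothetical set of internal sampling rates (assumption, for illustration).
rates = [8000, 12000, 16000, 24000]
clock = common_clock_rate(rates)
print(clock)  # 48000

# With the common clock, one sample at any listed rate is a whole number
# of ticks, so every frame length maps to an integer tick count.
for rate in rates:
    assert ticks_per_frame(clock, rate, 1).denominator == 1

# With a 16 kHz clock, a 250-sample frame at 12 kHz does not:
print(ticks_per_frame(16000, 12000, 250))  # 1000/3
```

A non-integer result means frame boundaries cannot be expressed exactly
in that clock, which is the situation the chosen timestamp rate avoids.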
True, but it greatly simplifies the system if all codecs use the
sampling rate as the RTP clock rate. You can make things work if each
codec uses a different rate, but it's desirable that RTP is
consistent where possible. Why is this codec so special that it needs
to break this rule?
Again, I simply don't see how things are simplified by having
the RTP timestamp rate be the sender-side sample rate. If the
timestamp rate is 2x or 4x the input rate, or if it's the lowest
common multiple (if you know what I mean) of a range of sample rates
the encoder might use, how does that hurt the receiver?
See above.
In fact, if it _is_ possible for the encoder to use multiple rates
(like VMR-WB), isn't it a lot easier on both sides if we can just use
16 kHz for RTP and avoid having to play games if we want to switch
input rates dynamically? Setting the timestamp at the sender end is
trivial.
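To make the fixed-clock proposal concrete, here is a minimal sketch of
sender-side stamping with a constant 16 kHz RTP clock while the input
rate changes mid-stream. The 8 kHz and 16 kHz input rates, the 20 ms
frame length and the function names are assumptions for illustration,
not taken from the VMR-WB draft.

```python
RTP_TS_MODULUS = 2 ** 32  # the RTP timestamp is a 32-bit field and wraps


def ticks_for_frame(samples, input_rate, clock_rate=16000):
    """Clock ticks covered by a frame of `samples` taken at `input_rate`."""
    ticks, remainder = divmod(samples * clock_rate, input_rate)
    if remainder:
        raise ValueError("frame does not span a whole number of clock ticks")
    return ticks


def stamp_frames(frames, initial_ts=0, clock_rate=16000):
    """Return the timestamp of each frame in a list of (samples, input_rate)."""
    ts = initial_ts
    stamps = []
    for samples, input_rate in frames:
        stamps.append(ts)
        ts = (ts + ticks_for_frame(samples, input_rate, clock_rate)) % RTP_TS_MODULUS
    return stamps


# Two 20 ms frames at 8 kHz input, then two 20 ms frames at 16 kHz input.
frames = [(160, 8000), (160, 8000), (320, 16000), (320, 16000)]
print(stamp_frames(frames))  # [0, 320, 640, 960]
```

Under these assumptions the per-frame increment stays at 320 ticks across
the rate switch. The cost is that one tick no longer corresponds to one
input sample while the sender is running at 8 kHz, which is the
consistency concern raised in the replies.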
Also, sticking with a strict timestamp rate == sample rate
dictate means that various classes of codecs won't fit well, will be
harder to write and/or use with RTP, or won't be developed because the
transports for them would be painful.
I agree. This is a limitation of the way RTP has evolved, using the
input sampling rate for the audio timestamp. Unfortunately changing the
meaning of the RTP timestamp will break things, so we're stuck with the
current model.
Or, perhaps worse for the participants in this group, such codecs will
push the use of alternative mechanisms such as IAX2 or others. I could
simplify my life CONSIDERABLY by ignoring RTP and using a single
multiplexed stream, with lower overhead due to piggybacked nacks and
acks, and much less hassle blowing multiple holes through
firewalls/NATs. But for the sake of standards and compatibility I've
gone through the pain of dealing with all the issues surrounding RTP,
feedback, RTP-through-NATs, etc., and have probably hewn (much) closer
to the specs than most.
Don't put up roadblocks where there isn't a reason to. In this
case, there is a good argument _for this codec_ to use a fixed 16 kHz
timestamp rate.
However, doing so would fragment the consistent RTP timing model. I
agree that using a 16 kHz clock for this codec solves a short-term
implementation issue; however, it makes RTP implementations that
support multiple codecs more complex in the long term, and fragments
the standard.
Colin
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt