
Re: [AVT] RE: <draft-ietf-avt-rtp-vmr-wb-03.txt>: sampling rate



On 24 Sep 2004, at 20:15, Randell Jesup wrote:
Colin Perkins <csp at csperkins.org> writes:
On 13 Sep 2004, at 16:41, Magnus Westerlund wrote:
I think we have two issues:

A. Is there any benefit in indicating or requesting the sampling
frequency used at the sender?

Yes. This is why RTP has the "rate" parameter, and uses the sampling rate as the RTP timestamp rate.

My apologies, but I don't see how that answers the question as to what the benefit is. How does the receiver make use of this information? Why is this better for the receiver?

If the codec can have different rates for source material and output material, it's clearly advantageous to signal that. More information is clearly better, as I hope is obvious.


For RTP audio, the convention is to use the sampling rate at the input to drive the timestamp. The benefit is that it gives a consistent RTP timing model, with well defined semantics independent of the codec. This makes it simpler to design senders that operate with a range of codecs, and to design receivers that perform sample-accurate synchronisation (since the timestamp always increases by exactly 1 per audio sample in the input, meaning receivers can do synchronisation independent of the codec).
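As a minimal illustration of the timing model described above (a sketch, not code from this thread): assuming a hypothetical sender that packetises fixed-size frames, the RTP timestamp of each packet advances by the number of input samples it carries, whatever the codec. The function name and the 16 kHz / 20 ms figures are assumptions chosen for illustration.

```python
def packet_timestamps(initial_ts, sample_rate_hz, frame_ms, n_packets):
    """Return the RTP timestamps for n_packets consecutive frames.

    The timestamp clock ticks once per input audio sample, so each
    packet advances the timestamp by the samples in one frame.
    RTP timestamps are 32-bit and wrap around.
    """
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return [(initial_ts + i * samples_per_frame) % 2**32
            for i in range(n_packets)]

# Hypothetical example: 16 kHz input, 20 ms frames -> 320 ticks per packet.
print(packet_timestamps(1000, 16000, 20, 4))  # [1000, 1320, 1640, 1960]
```

Because the increment depends only on the input sample count, the same sender logic applies to any codec that follows the convention.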

I can see some advantage in having the RTP rate parameter be the
normal _output_ sample rate from the codec (which may be uncorrelated to
the input rate!) in that you could use that to implement an audio codec
with non-fixed frame sizes.

I agree that might be useful. However, it doesn't fit with the semantics assigned to the RTP timestamp.


B. Is it necessary to use the sampling frequency as the RTP timestamp rate?

It's highly desirable.

Again, asserting this without a reason doesn't convince me.

To make a consistent definition of the RTP timestamp.

Colin, if one looks at issue B. Is it really needed to use the RTP
timestamp frequency equal to the sampling rate used? I would say NO to
that question.

Yes, it is necessary to use an RTP timestamp rate equal to the sampling rate.

Again, no reasons given.

Again, to avoid breaking the standard RTP timing model.

My reasoning is the following.

- Much audio input is sampled from a source at a higher rate than the
encoder can handle. Thus a resampling and pre-processing stage is
employed to match the encoder's input frequency, rather than producing that
rate directly from the hardware. One reason is that the
pre-processing may actually yield better results than the hardware
can achieve at the given input rate. Another reason may be that one would like to
avoid switching the hardware between rates when changing the encoding.


- The frame-based decoders do not need to know the encoder's input
rate. The encoder may in any case resample this into other rates for internal
processing and band-limited signals. I would claim that VMR-WB, AMR-WB+
and AAC are all examples of codecs that perform this kind of trick. On
the receiver side they produce an output signal at whatever sampling
frequency the receiver finds most useful. This may clip the
higher frequencies, but more commonly the output uses a higher clock rate
simply for ease of use, even though no more information is provided.


- The frame-based codecs only need an RTP timestamp that allows the
receiver to correctly reconstruct the timeline when the encoding is done
with the most audio bandwidth. In the VMR-WB case this is 16 kHz. AMR-WB+
is even stranger, as we have selected an RTP timestamp rate such that
all internal sampling frequencies result in integer
timestamp ticks. This allows one to correctly calculate frame
alignment when the internal sampling frequency changes. We also considered
that the rate can be converted to several common sampling
frequencies with few partial-sample alignments.
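The idea of picking a timestamp rate that gives integer ticks for every internal sampling frequency can be sketched as a least-common-multiple calculation. This is only an illustration: the list of rates below is hypothetical and is NOT the actual AMR-WB+ set, and `common_clock_rate` is a name invented for this example.

```python
from functools import reduce
from math import gcd

def common_clock_rate(rates_hz):
    """Smallest clock rate (Hz) that is an integer multiple of every rate."""
    return reduce(lambda a, b: a * b // gcd(a, b), rates_hz)

# Hypothetical internal sampling rates, chosen only for illustration.
rates = [8000, 12000, 16000, 24000]
clock = common_clock_rate(rates)

# With this clock, one sample at each rate spans a whole number of ticks.
ticks_per_sample = {r: clock // r for r in rates}
print(clock, ticks_per_sample)
```

With such a clock, a frame boundary at any of the listed rates falls on an exact tick, so the timestamp stays well defined when the internal rate changes.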


Thus I would use this to argue that indicating the actual sampling
frequency is not necessary as long as the receiver is capable of
correctly reconstructing the media stream with its timing information in
full resolution.

True, but it greatly simplifies the system if all codecs use the sampling rate as the RTP clock rate. You can make things work if each codec uses a different rate, but it's desirable that RTP is consistent where possible. Why is this codec so special that it needs to break this rule?

Again, I simply don't see how things are simplified by having the RTP timestamp rate be the sender-side sample rate. If the timestamp rate is 2x or 4x the input rate, or if it's the lowest common multiple (if you know what I mean) of a range of sample rates the encoder might use, how does that hurt the receiver?

See above.
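To make the trade-off in this exchange concrete, here is a rough sketch of what a receiver has to do when the RTP clock rate is not the sample rate. The function name and the 32 kHz clock are assumptions made up for this example, not values proposed in the thread.

```python
from fractions import Fraction

def ticks_to_samples(ts_delta, clock_rate_hz, sample_rate_hz):
    """Convert an RTP timestamp difference into a sample count.

    Returns an exact Fraction; a non-integer result means the clock
    cannot address samples at that rate exactly.
    """
    return Fraction(ts_delta * sample_rate_hz, clock_rate_hz)

# Clock equal to the sample rate: one tick per sample, no rescaling needed.
print(ticks_to_samples(320, 16000, 16000))  # 320

# Hypothetical clock at 2x the sample rate: the receiver must rescale,
# and needs to know the ratio for this particular codec.
print(ticks_to_samples(640, 32000, 16000))  # 320
```

The rescaling itself is simple; the cost Colin points to is that a multi-codec receiver would need a per-codec ratio instead of one uniform rule.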

In fact, if it _is_ possible for the encoder to use multiple rates (like VMR-WB), isn't it a lot easier on both sides if we can just use 16 kHz for RTP and avoid having to play games if we want to switch input rates dynamically? Setting the timestamp at the sender end is trivial.

Also, sticking with a strict timestamp rate == sample rate dictate means that various classes of codecs won't fit well, will be harder to write and/or use with RTP, or won't be developed because the transports for them would be painful.

I agree. This is a limitation of the way RTP has evolved, using the input sampling rate for the audio timestamp. Unfortunately changing the meaning of the RTP timestamp will break things, so we're stuck with the current model.


Or, perhaps worse for the participants of this group, such codecs will push the use of alternative mechanisms such as IAX2. I could simplify my life CONSIDERABLY by ignoring RTP and using a single multiplexed stream, with lower overhead due to piggybacked NACKs and ACKs, and much less hassle blowing multiple holes through firewalls/NATs. But for the sake of standards and compatibility I've gone through the pain of dealing with all the issues surrounding RTP, feedback, RTP-through-NATs, etc., and have probably hewn (much) closer to the specs than most, I suspect.

        Don't put up roadblocks where there isn't a reason to.  In this
case, there is a good argument _for this codec_ to use a fixed 16K
timestamp rate.

However, doing so would fragment the consistent RTP timing model. I agree that using a 16 kHz clock for this codec solves a short-term implementation issue; however, it makes RTP implementations that support multiple codecs more complex in the long term, and fragments the standard.


Colin


_______________________________________________ Audio/Video Transport Working Group avt at ietf.org https://www1.ietf.org/mailman/listinfo/avt