Re: [AVT] VMR-WB RTP Payload and Storage Formats
sassan.ahmadi at nokia.com writes:
>In the one hand, based on the distinctive capability of VMR-WB, there are
>people who want to use a fixed RTP clock rate of 16000 Hz to enable
>processing/injection of the 8000 Hz sampled media. Note that 8000 and
>16000 Hz sampled media have identical VMR-WB output frames. I believe
>there is technically nothing wrong and revision -04 of the I-D
>appropriately addresses this concern.
This is a very relevant point - the decoder doesn't need to know
the input sample rate, and certainly doesn't need to know it from the timestamp.
>On the other hand, you persist on your opinion that RTP clock rate must be
>identical to the input media sampling rate regardless of the codec
>capabilities.
>
>The following excerpt from Section 4.1 of RFC 3551 (line 434)
>
>"...
> The RTP clock rate used for generating the RTP timestamp is
> independent of the number of channels and the encoding; it usually
> equals the number of sampling periods per second. For N-channel
> encodings, each sampling period (say, 1/8,000 of a second) generates
> N samples. (This terminology is standard, but somewhat confusing, as
> the total number of samples generated per second is then the sampling
> rate times the channel count.)
>..."
>
>indicates that (there is no normative language here) the RTP clock rate
>"usually" equals the input media sampling rate and that it is independent
>of the encoding.
As Colin has indicated, this is the traditional usage. However, I
(personally) see no reason for this other than tradition, and Colin's
response to me didn't include any significant reason other than tradition
(I'll respond to that shortly in detail).
>Also the following excerpt from Section 6.4.4 of RFC 3550 (line 2391)
>
>"...
> Since that timestamp is
> independent of the clock rate for the data encoding, it is possible
> to implement encoding- and profile-independent quality monitors.
>..."
>
>Therefore, you have no technical ground to assert that RTP clock rate MUST
>be equal to input media sampling frequency.
In fact, in Colin's defense, I don't think Colin asserted it "MUST"
be equal to the sampling frequency, but instead he asserted (very strongly)
that it should be, and that he saw no reason to break the tradition. I (and
apparently others such as Sassan) do see a reason.
>Please think of VMR-WB as a dual-rate system where both 8000 and 16000 Hz
>sampled media are supported and that decoding can proceed without knowing
>the input media sampling frequency.
And, if you do need to know the sampling frequency to decode, it
should either be part of the encoding or part of the SDP/etc that creates
the stream - and not the timestamp frequency portion of the SDP.
As I see it, there are two primary uses of the RTP timestamp
and its frequency:
1) Playback timing
Knowing when a frame was sampled (that is, the time of the first sample in
the frame) is important for playback, especially if the codec can omit
samples for some reason without otherwise indicating it. You can infer lost
packets from sequence numbers as well, but there isn't a guaranteed
sequence number -> time equation for many codecs.
Note that the time of the first sample of the frame can be in any units
of sufficient resolution. It's handy if it's an integer multiple of the
actual sampling rate (1->N times), but even that might not be important
given a high enough rate, and for certain non-audio codecs that's
already the norm (video).
2) Stream synchronization
The timestamp is combined with the NTP times in RTCP to synchronize
streams. Again, the same issues apply as above for playback; in this case
higher multiples of the sampling rate might have an advantage in some cases.
For example, when trying to synchronize multiple audio streams for a mix
for music, using a higher timestamp rate when possible may allow tighter
synchronization and less chance of introducing phasing issues. If
you're synchronizing a 250Hz-sampled LFE stream to video at 60fps, or
much worse to multi-channel audio at 44kHz, you don't want the
timestamps in the LFE stream to be 250Hz - that might allow the LFE
channel to be as much as 2ms out-of-phase with the channels carrying the
higher frequencies. Now this might not matter a huge deal - but you
certainly aren't _gaining_ anything by restricting the LFE timestamps to
250Hz. It may become worse when you're trying to merge multiple streams
with 250Hz-sampled audio coming from different sources; the phase issues
may become quite apparent.
It's like my old video character-generation example - even with a
display that can handle at most 350 "lines" of analog video across,
using a circa-1400 pixel source to generate the analog produces better
output because you can control edge _positions_ more finely, even if you
can't make a sharper edge transition or (same thing) go from one
color/brightness to another any quicker. It's an easy thing to forget
about when you're used to working in the digital domain.
I thought I had a third, but I think it's covered by those.
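To put rough numbers on the LFE example above, here is a small Python
sketch (not part of the original argument) of how coarse a timestamp is at
a given clock rate, plus the usual mapping from an RTP timestamp to sender
wallclock time through the (NTP, RTP) pair carried in an RTCP sender
report. The function names are made up for illustration, and the
round-to-nearest-tick model is an assumption; a sender that truncates
instead could be off by up to a full tick.

```python
from fractions import Fraction


def tick_period_ms(clock_rate_hz):
    """Duration of one RTP timestamp tick, in milliseconds."""
    return Fraction(1000, clock_rate_hz)


def max_rounding_error_ms(clock_rate_hz):
    """Worst-case error if an event time is rounded to the nearest tick
    (assumed model; truncation would allow up to one full tick)."""
    return tick_period_ms(clock_rate_hz) / 2


def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate_hz):
    """Map an RTP timestamp to sender wallclock seconds, using the
    (NTP time, RTP timestamp) pair from one RTCP sender report."""
    # RTP timestamps are 32-bit and wrap, so take the difference
    # modulo 2**32 and interpret it as a signed offset.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_ntp_seconds + Fraction(delta, clock_rate_hz)


if __name__ == "__main__":
    for rate in (250, 8000, 16000, 90000):
        print(rate, "Hz:",
              float(tick_period_ms(rate)), "ms per tick,",
              float(max_rounding_error_ms(rate)), "ms worst-case rounding")
```

At 250Hz a tick is 4ms, so rounding alone can move a sample time by 2ms;
at 16000Hz the same rounding is about 0.03ms.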
Note that for video, current codecs generally use a 90 kHz timestamp rate.
This is definitely not the actual sample rate of most/any inputs. That's
circa 3000 ticks per 30Hz video frame, way under even QCIF sample rates (if
you're thinking video pixels). This 90 kHz rate works for video because it
has _enough_ accuracy for the purpose. Codec/etc writers have to assume
there isn't a perfect 1:1 correlation of timestamp to time-of-first-sample.
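The arithmetic behind the 90 kHz comparison can be checked with a short
Python sketch. The helper names are hypothetical, and the QCIF size of
176x144 pixels is the standard definition rather than something stated in
this thread.

```python
from fractions import Fraction

VIDEO_CLOCK_HZ = 90000          # conventional RTP video timestamp rate
QCIF_WIDTH, QCIF_HEIGHT = 176, 144


def ticks_per_frame(frame_rate):
    """RTP timestamp ticks spanned by one video frame at 90 kHz."""
    return Fraction(VIDEO_CLOCK_HZ) / Fraction(frame_rate)


def qcif_pixel_rate(frame_rate):
    """Pixels per second for QCIF video, i.e. the 'sample rate' if you
    count video pixels as samples."""
    return QCIF_WIDTH * QCIF_HEIGHT * frame_rate


if __name__ == "__main__":
    print(ticks_per_frame(30))                      # 30 fps
    print(ticks_per_frame(25))                      # 25 fps
    print(ticks_per_frame(Fraction(30000, 1001)))   # NTSC-style 29.97 fps
    print(qcif_pixel_rate(30), "pixels/s vs", VIDEO_CLOCK_HZ, "ticks/s")
```

The frame rates that matter in practice divide 90000 evenly, which is what
"enough accuracy" means here: the clock resolves frame boundaries exactly
even though it is far below the pixel rate.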
Given that RTP is not designed for easy switching of timestamp rates on a
single stream, using a single fixed "high enough" timestamp rate for a
codec with multiple input rates makes a ton of sense to me. It definitely
simplifies intermediate pieces and simplifies receivers as well - given the
apparent complexity of dealing with multiple on-the-fly timestamp rates.
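As a rough illustration of why a fixed clock keeps things simple, the
Python sketch below generates timestamps for a stream whose input sampling
rate may be 8000 or 16000 Hz while the RTP clock stays at 16000 Hz. The
20ms frame duration is an assumption made for the example, and the
function names are invented; nothing here is taken from the I-D itself.

```python
FIXED_CLOCK_HZ = 16000   # fixed RTP clock rate under discussion
FRAME_MS = 20            # ASSUMPTION: frame duration used for illustration


def timestamp_increment(clock_rate_hz, frame_ms):
    """Timestamp ticks per frame; the input sampling rate plays no part."""
    ticks, remainder = divmod(clock_rate_hz * frame_ms, 1000)
    if remainder:
        raise ValueError("frame duration is not a whole number of ticks")
    return ticks


def samples_per_frame(input_rate_hz, frame_ms=FRAME_MS):
    """Input samples covered by one frame at the given sampling rate."""
    return input_rate_hz * frame_ms // 1000


def fixed_clock_timestamps(first_ts, n_frames):
    """Timestamps for consecutive frames, wrapping at 32 bits."""
    step = timestamp_increment(FIXED_CLOCK_HZ, FRAME_MS)
    return [(first_ts + i * step) & 0xFFFFFFFF for i in range(n_frames)]


if __name__ == "__main__":
    for input_rate in (8000, 16000):
        print(input_rate, "Hz input:",
              samples_per_frame(input_rate), "samples/frame,",
              timestamp_increment(FIXED_CLOCK_HZ, FRAME_MS), "ticks/frame")
    print(fixed_clock_timestamps(0, 4))
```

Under these assumptions the per-frame increment is the same whichever input
rate the encoder was fed, so a receiver or intermediate box never has to
track a change of timestamp rate in mid-stream.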
--
Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team
rjesup at wgate.com