RE: [AVT] Clock skew and RTP



I think you have to consider all the clocks that may be involved here. There may be as many as three on each end: the NTP clock, the audio sampling clock, and the video sampling clock.

a) The audio sampling clock on both transmitter and receiver may be free-running with respect to the NTP clock on that system (and usually is).

b) The video clock usually isn't free-running: most people seem to drive video frame rates from the same clock as NTP. However, some may use the monitor refresh instead, and that wouldn't be synchronized to whatever clock they use for NTP.

c) The source and destination NTP clocks may also drift with respect to each other.

Note that NTP is only the format of the timestamps in RTCP; it is not a required clock source. That is just as well for systems that cannot sync over a network, such as an RTP camera. And even where NTP is used, many systems correct their clocks only about once a day and free-run in between.
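For reference, the NTP-format timestamp in an RTCP sender report is a 64-bit fixed-point value: the upper 32 bits count seconds (relative to 0h UTC, 1 January 1900, when it really is wallclock time) and the lower 32 bits are a binary fraction of a second. The following is a minimal Python sketch that converts it to seconds so that differences between reports are easy to compute. The function names are illustrative and not taken from any library:

```python
NTP_FRACTION_SCALE = 2 ** 32  # lower 32 bits count units of 1/2**32 second


def ntp_words_to_seconds(msw: int, lsw: int) -> float:
    """Convert the two 32-bit words of an SR's NTP timestamp to seconds."""
    return msw + lsw / NTP_FRACTION_SCALE


def ntp64_to_seconds(ts: int) -> float:
    """Same conversion, starting from the packed 64-bit value."""
    return ntp_words_to_seconds(ts >> 32, ts & 0xFFFFFFFF)
```

Only differences between timestamps from the same sender matter below, so the arithmetic works the same whether or not the sender's clock is actually synchronized to anything.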

Generally, there is no assurance that an RTP packet was transmitted at the time its RTP timestamp indicates, because of traffic smoothing. However, the spec does require that RTCP sender reports be sent 'at' their NTP timestamp. A long low-pass filter on these reports, to remove network jitter, can help you track the source clock. Buffer fullness might provide such a filter, except that source transmission rates and the like can vary: your buffer may fill when playout is easy and consuming few bytes, and the source decides to send ahead to keep the network full.
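One way to realize that "long low-pass filter" is a least-squares line fit of local arrival time against the sender's NTP timestamp over many reports. The slope is the sender's clock rate as seen by your clock, and jitter averages out. This is a sketch under assumptions: the class name, the default window of 64 reports, and the choice of a line fit (rather than, say, an exponential filter) are all illustrative, not part of any spec.

```python
from collections import deque
from typing import Optional


class SenderClockTracker:
    """Hypothetical helper: estimate sender clock rate from RTCP SRs.

    Each sample pairs the sender's NTP time (seconds) from an SR with our
    local receive time (seconds). Network jitter only adds noise to the
    receive times, so a line fit over many reports averages it out.
    """

    def __init__(self, window: int = 64):
        self.samples = deque(maxlen=window)  # oldest reports drop off

    def add_report(self, sender_ntp: float, local_arrival: float) -> None:
        self.samples.append((sender_ntp, local_arrival))

    def rate(self) -> Optional[float]:
        """Local seconds elapsed per sender second (1.0 means no skew)."""
        n = len(self.samples)
        if n < 2:
            return None
        mean_x = sum(x for x, _ in self.samples) / n
        mean_y = sum(y for _, y in self.samples) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in self.samples)
        if sxx == 0:
            return None
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in self.samples)
        return sxy / sxx  # least-squares slope
```

Because the window slides, the estimate keeps following the sender as its clock wanders. A longer window is smoother but slower to react.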

Likewise, the sender reports tell you when the source's audio sampling rate is not quite what it claims. Compare the NTP and RTP timestamp differences between two sender reports and check whether the sample count is right for the elapsed interval.
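That comparison amounts to one division. Here is a small Python sketch; the function names are hypothetical. RTP timestamps are 32-bit and wrap, so the difference is taken modulo 2^32, which assumes at most one wrap between the two reports. Note that the result is measured against the sender's own NTP clock, not yours.

```python
RTP_TS_MODULUS = 2 ** 32  # RTP timestamps are 32-bit and wrap around


def effective_sample_rate(ntp1: float, rtp1: int, ntp2: float, rtp2: int) -> float:
    """Samples per sender-NTP second between two sender reports."""
    elapsed = ntp2 - ntp1
    if elapsed <= 0:
        raise ValueError("second report must be later than the first")
    samples = (rtp2 - rtp1) % RTP_TS_MODULUS  # tolerates a single wrap
    return samples / elapsed


def rate_error_ppm(measured: float, nominal: float) -> float:
    """How far the measured rate is from the claimed one, in parts per million."""
    return (measured / nominal - 1.0) * 1e6
```

For example, two reports 60 s apart whose RTP timestamps advanced by 1,323,033 instead of the nominal 22050 × 60 = 1,323,000 give 22050.55 Hz, about 25 ppm fast.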

Then there is the question of what to do once you know the sender clock is running faster (or slower) than you expect. If you are lucky and there are frequent silence intervals (common in telephony but rare in streaming), you can adjust during silence. Otherwise, you can combine the two drift rates (the source clock against yours, and the source audio clock against the source clock) and adjust your own audio playout rate, either in your hardware or in a sample-rate converter, to compensate. You might think that this introduces a pitch shift, but it doesn't: the source *claimed* to be sampling at, say, 22.050 kHz on its clock, but in real physical time that is a different rate on your clock. By adjusting, you are actually avoiding a pitch shift.
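As a rough illustration, the two drift estimates combine into a single playout ratio, and a sample-rate converter consumes input at that ratio. This Python sketch makes several assumptions. The helper names are hypothetical. `local_device_rate` is taken to be the rate your audio hardware actually consumes, measured in your own clock's seconds. The converter is plain linear interpolation, a simple stand-in for the better-filtered converters used in practice. It also processes one block at a time without carrying the fractional read position across blocks.

```python
from typing import List


def playout_ratio(samples_per_sender_sec: float,
                  local_sec_per_sender_sec: float,
                  local_device_rate: float) -> float:
    """Input samples to consume per output sample (>1.0: sender is fast)."""
    # Source samples produced per second of *our* clock.
    samples_per_local_sec = samples_per_sender_sec / local_sec_per_sender_sec
    return samples_per_local_sec / local_device_rate


def resample_linear(samples: List[float], ratio: float) -> List[float]:
    """Read `samples` at `ratio` input samples per output sample."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    out = []
    pos = 0.0
    while pos <= len(samples) - 1:
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)  # interpolate
        pos += ratio
    return out
```

A ratio slightly above 1.0 plays the sender's samples slightly faster, so the audio keeps its true pitch in physical time and the buffer stops growing.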

You have to keep doing this: temperature and other effects cause cheap clocks to wander. Generally it seems better to drive playout from the audio clock, because minor variations in video frame timing are imperceptible, whereas audio drop-outs (or attempted frame overlap) are not.
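Driving playout from the audio clock means video frames are chosen to match wherever audio playout currently is. One common way to do that, sketched here under assumptions, uses the (NTP, RTP) pair carried in each stream's latest sender report. First map the audio position to sender NTP time, then pick the newest buffered video frame whose own NTP time is not later than that. The class and function names are hypothetical. The sketch assumes both streams come from the same sender, so they share one NTP clock. It uses nominal clock rates (22050 Hz audio and the usual 90 kHz video RTP clock), although a measured rate could be substituted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SRMapping:
    ntp: float         # sender NTP time from the latest SR, in seconds
    rtp: int           # RTP timestamp from the same SR
    clock_rate: float  # RTP ticks per second for this stream

    def rtp_to_ntp(self, rtp: int) -> float:
        # Signed 32-bit difference, so timestamps on either side of the SR
        # (and across a wrap) map correctly within +/- 2**31 ticks.
        delta = (rtp - self.rtp + 2**31) % 2**32 - 2**31
        return self.ntp + delta / self.clock_rate


def frame_to_show(audio_rtp_now: int, audio_sr: SRMapping,
                  video_sr: SRMapping, video_frame_ts: List[int]) -> Optional[int]:
    """Newest buffered video frame due at the current audio position."""
    now = audio_sr.rtp_to_ntp(audio_rtp_now)
    best = None
    for ts in video_frame_ts:  # buffered frames in presentation order
        if video_sr.rtp_to_ntp(ts) <= now:
            best = ts
        else:
            break
    return best  # None: no frame is due yet
```

Video simply follows audio here: if a frame's timing is a little off, it is shown slightly early or late, which is the imperceptible case.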
--
David Singer
Apple Computer/QuickTime
_______________________________________________
Audio/Video Transport Working Group
avt@ietf.org
https://www1.ietf.org/mailman/listinfo/avt