[AVT] RTP: explanation of "sampling instant"
Here is the second edit to the RTP spec. In Section 5.1, I've added
text at the end of the specification of the "timestamp" field in the
RTP header to explain the notion of "sampling instant" better. Do you
think this helps or hinders? Should I cut the last paragraph? I
included it to give an idea of how sampling relates to prerecorded media.
-- Steve
OLD:
timestamp: 32 bits
The timestamp reflects the sampling instant of the first
octet in the RTP data packet. The sampling instant MUST be
derived from a clock that increments monotonically and
linearly in time to allow synchronization and jitter
calculations (see Section 6.4.1). The resolution of the
clock MUST be sufficient for the desired synchronization
accuracy and for measuring packet arrival jitter (one tick
per video frame is typically not sufficient). The clock
frequency is dependent on the format of data carried as
payload and is specified statically in the profile or
payload format specification that defines the format, or
MAY be specified dynamically for payload formats defined
through non-RTP means. If RTP packets are generated
periodically, the nominal sampling instant as determined
from the sampling clock is to be used, not a reading of the
system clock. As an example, for fixed-rate audio the
timestamp clock would likely increment by one for each
sampling period. If an audio application reads blocks
covering 160 sampling periods from the input device, the
timestamp would be increased by 160 for each such block,
regardless of whether the block is transmitted in a packet
or dropped as silent.
The initial value of the timestamp SHOULD be random, as for
the sequence number. Several consecutive RTP packets will
have equal timestamps if they are (logically) generated at
once, e.g., belong to the same video frame. Consecutive
RTP packets MAY contain timestamps that are not monotonic
if the data is not transmitted in the order it was sampled,
as in the case of MPEG interpolated video frames. (The
sequence numbers of the packets as transmitted will still
be monotonic.)
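
As a rough illustration of the audio example in the text above (not part
of the spec wording), here is a minimal Python sketch of how the timestamp
advances by 160 per block whether or not the block is sent. The function
name and the silence flags are hypothetical; the 32-bit wraparound follows
from the field width, and the random initial value from the SHOULD above.

```python
import random

SAMPLES_PER_BLOCK = 160   # from the text: each block covers 160 sampling periods
TS_MODULUS = 1 << 32      # the timestamp field is 32 bits wide, so it wraps


def block_timestamps(initial_ts, silent_flags):
    """Return a list of (timestamp, transmitted) pairs, one per block read.

    The timestamp advances by SAMPLES_PER_BLOCK for every block read from
    the input device, whether the block is sent or dropped as silent.
    """
    ts = initial_ts % TS_MODULUS
    result = []
    for silent in silent_flags:
        result.append((ts, not silent))
        ts = (ts + SAMPLES_PER_BLOCK) % TS_MODULUS
    return result


# The initial value SHOULD be random, as for the sequence number.
initial = random.getrandbits(32)
blocks = block_timestamps(initial, [False, True, False])
```

Note that the second block is dropped as silent, yet the third block's
timestamp is still 320 ticks after the first.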
NEW ADDITION:
The sampling instant is chosen as the point of reference
for the RTP timestamp because it is known to the
transmitting endpoint and has a common definition for all
media independent of encoding delays or other processing.
The purpose is to allow synchronized presentation of all
media sampled at the same time. Comparing RTP timestamps
from different media is not effective because they have
independent random offsets and may advance at different
rates. Instead, for each medium the RTP timestamp is
related to the sampling instant by pairing it with a
timestamp from a reference clock (wallclock) that
represents the time when the data corresponding to the RTP
timestamp was sampled. The reference clock is shared by
all media to be synchronized. The timestamp pairs are not
transmitted in every data packet, but at a lower rate in
RTCP SR packets as described in Section 6.4.
Applications transmitting stored data rather than data
sampled in real time typically use a virtual clock to
determine when the next frame or other unit of each medium
in the stored data should be presented. In this case, the
sampling instant would be the presentation time for each
unit, and it would be related to the wallclock time at
which the unit was determined to be ready for presentation.
Live and prerecorded media can also be synchronized, for
example in a live audio narration of prerecorded video.
The video would be presented locally to the narrator as
well as transmitted using RTP. The sampling instant of the
video would be established by referencing the timestamp of
the video frame to the wallclock time when the video frame
is presented locally and the corresponding audio is
sampled.
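
To make the timestamp pairing described above concrete, here is a minimal
Python sketch of how a receiver might map an RTP timestamp onto the shared
reference clock using the (RTP timestamp, wallclock) pair from an RTCP SR
packet. This is only an illustration under stated assumptions: the function
name is hypothetical, the wallclock is simplified to floating-point seconds
rather than the SR packet's actual timestamp encoding, and the clock rates
(8000 Hz audio, 90000 Hz video) are example values.

```python
TS_MODULUS = 1 << 32  # RTP timestamps are 32 bits and wrap around


def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_wallclock_s, clock_rate_hz):
    """Estimate the wallclock sampling time of a packet.

    sr_rtp_ts and sr_wallclock_s are the timestamp pair taken from the most
    recent sender report for this medium (wallclock simplified to seconds).
    """
    # Signed difference modulo 2^32, so packets sampled slightly before the
    # SR reference point and timestamps that have wrapped are both handled.
    diff = (rtp_ts - sr_rtp_ts) % TS_MODULUS
    if diff >= TS_MODULUS // 2:
        diff -= TS_MODULUS
    return sr_wallclock_s + diff / clock_rate_hz


# Two media with independent offsets and different clock rates.
audio_time = rtp_to_wallclock(20000, 16000, 100.0, 8000)
video_time = rtp_to_wallclock(945000, 900000, 100.0, 90000)
# Both map to the same point on the shared reference clock, so the two
# units were sampled together and can be presented together.
```

The raw RTP timestamps (20000 and 945000) are not comparable directly, but
after mapping through each medium's SR pair they land on the same wallclock
instant.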
_______________________________________________
Audio/Video Transport Working Group
avt@ietf.org
https://www1.ietf.org/mailman/listinfo/avt