
[AVT] RTP: explanation of "sampling instant"



Here is the second edit to the RTP spec.  In Section 5.1, I've added
text at the end of the specification of the "timestamp" field in the
RTP header to explain the notion of "sampling instant" better.  Do you
think this helps or hinders?  Should I cut the last paragraph?  I
included it to give an idea of how sampling relates to prerecorded media.

                                                        -- Steve

OLD:
        timestamp: 32 bits
             The timestamp reflects the sampling instant of the first
             octet in the RTP data packet.  The sampling instant MUST be
             derived from a clock that increments monotonically and
             linearly in time to allow synchronization and jitter
             calculations (see Section 6.4.1).  The resolution of the
             clock MUST be sufficient for the desired synchronization
             accuracy and for measuring packet arrival jitter (one tick
             per video frame is typically not sufficient).  The clock
             frequency is dependent on the format of data carried as
             payload and is specified statically in the profile or
             payload format specification that defines the format, or
             MAY be specified dynamically for payload formats defined
             through non-RTP means.  If RTP packets are generated
             periodically, the nominal sampling instant as determined
             from the sampling clock is to be used, not a reading of the
             system clock.  As an example, for fixed-rate audio the
             timestamp clock would likely increment by one for each
             sampling period.  If an audio application reads blocks
             covering 160 sampling periods from the input device, the
             timestamp would be increased by 160 for each such block,
             regardless of whether the block is transmitted in a packet
             or dropped as silent.
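Here is a minimal Python sketch of the audio example above. It assumes each block read from the input device is described only by whether it was judged silent. The input list and the function name block_timestamps are hypothetical. The sketch shows the timestamp advancing by 160 per block whether or not a packet is sent, with a random initial value (from the next paragraph) and 32-bit wraparound.

```python
import secrets

RTP_TS_MODULUS = 2 ** 32   # the timestamp field is 32 bits and wraps around
SAMPLES_PER_BLOCK = 160    # block size taken from the example in the text


def block_timestamps(is_silent, initial_ts=None):
    """Return (timestamp, transmitted) for each 160-sample block.

    is_silent: hypothetical input, one boolean per block read from the device.
    The timestamp follows the sampling clock, so it advances by 160 for every
    block, whether the block is sent in a packet or dropped as silent.
    """
    if initial_ts is None:
        initial_ts = secrets.randbits(32)  # random initial value
    result = []
    ts = initial_ts
    for silent in is_silent:
        result.append((ts, not silent))
        ts = (ts + SAMPLES_PER_BLOCK) % RTP_TS_MODULUS
    return result
```

For example, block_timestamps([False, True, False], initial_ts=1000) gives [(1000, True), (1160, False), (1320, True)]. The two transmitted packets carry timestamps 1000 and 1320. The gap of 320 tells the receiver that one 160-sample block was not sent.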

             The initial value of the timestamp SHOULD be random, as for
             the sequence number.  Several consecutive RTP packets will
             have equal timestamps if they are (logically) generated at
             once, e.g., belong to the same video frame.  Consecutive
             RTP packets MAY contain timestamps that are not monotonic
             if the data is not transmitted in the order it was sampled,
             as in the case of MPEG interpolated video frames.  (The
             sequence numbers of the packets as transmitted will still
             be monotonic.)
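The following sketch illustrates this paragraph under several assumptions. It uses a 90 kHz clock and 30 frames per second, so each frame is 3000 ticks. Each frame is sent in two packets. The MPEG-style group of frames is sampled as I0 B1 B2 P3 but transmitted as I0 P3 B1 B2. The function packetize and its parameters are hypothetical. Packets of one frame share a timestamp, and timestamps go backwards when a B frame is sent after the P frame it depends on. Sequence numbers still increase by one per packet.

```python
RTP_TS_MODULUS = 2 ** 32
SEQ_MODULUS = 2 ** 16
TICKS_PER_FRAME = 3000  # assumption: 90 kHz clock, 30 frames per second


def packetize(transmit_order, packets_per_frame, initial_ts, initial_seq):
    """Return (seq, timestamp, frame_type) for every packet sent.

    transmit_order: list of (frame_type, sampled_index) in transmission
    order, where sampled_index is the frame's position in sampling order.
    """
    out = []
    seq = initial_seq
    for frame_type, sampled_index in transmit_order:
        # The timestamp reflects when the frame was sampled, not when it is sent.
        ts = (initial_ts + sampled_index * TICKS_PER_FRAME) % RTP_TS_MODULUS
        for _ in range(packets_per_frame):
            out.append((seq, ts, frame_type))  # same ts for all packets of a frame
            seq = (seq + 1) % SEQ_MODULUS      # seq numbers stay monotonic
    return out
```

Calling packetize([("I", 0), ("P", 3), ("B", 1), ("B", 2)], 2, 0, 100) yields sequence numbers 100 through 107 with timestamps 0, 0, 9000, 9000, 3000, 3000, 6000, 6000.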

NEW ADDITION:
             The sampling instant is chosen as the point of reference
             for the RTP timestamp because it is known to the
             transmitting endpoint and has a common definition for all
             media, independent of encoding delays or other processing.
             The purpose is to allow synchronized presentation of all
             media sampled at the same time.  RTP timestamps from
             different media cannot be compared directly because they
             have independent random offsets and may advance at different
             rates.  Instead, for each medium the RTP timestamp is
             related to the sampling instant by pairing it with a
             timestamp from a reference clock (wallclock) that
             represents the time when the data corresponding to the RTP
             timestamp was sampled.  The reference clock is shared by
             all media to be synchronized.  The timestamp pairs are not
             transmitted in every data packet, but at a lower rate in
             RTCP SR packets as described in Section 6.4.
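The receiver side of this paragraph can be sketched as follows. Each medium's most recent SR supplies one (wallclock, RTP timestamp) pair. The receiver uses that pair and the medium's clock rate to convert any RTP timestamp to the wallclock time of its sampling instant. Wallclock is simplified here to float seconds rather than the 64-bit NTP format. The names SenderReportPair, ts_delta and sampling_wallclock are hypothetical.

```python
from dataclasses import dataclass

RTP_TS_MODULUS = 2 ** 32


@dataclass
class SenderReportPair:
    wallclock: float  # shared reference clock, seconds (simplified from NTP format)
    rtp_ts: int       # RTP timestamp corresponding to the same instant
    clock_rate: int   # RTP timestamp ticks per second for this medium


def ts_delta(ts, ref):
    """Signed difference ts - ref, allowing for 32-bit wraparound."""
    d = (ts - ref) % RTP_TS_MODULUS
    if d >= RTP_TS_MODULUS // 2:
        d -= RTP_TS_MODULUS
    return d


def sampling_wallclock(rtp_ts, sr):
    """Wallclock time at which the data carrying rtp_ts was sampled."""
    return sr.wallclock + ts_delta(rtp_ts, sr.rtp_ts) / sr.clock_rate
```

Consider an audio stream at 8000 Hz whose SR pair is (100.0 s, 4294967000) and a video stream at 90000 Hz whose SR pair is (100.5 s, 123456). Audio timestamp 7704 (after wraparound) and video timestamp 168456 both map to 101.0 s. The receiver should therefore present them together, even though the raw timestamps look unrelated.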

             Applications transmitting stored data rather than data
             sampled in real time typically use a virtual clock to
             determine when the next frame or other unit of each medium
             in the stored data should be presented.  In this case, the
             sampling instant would be the presentation time for each
             unit, and it would be related to the wallclock time at
             which the unit was determined to be ready for presentation.
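For stored media, the virtual clock idea can be sketched as follows. The sketch assumes playback at normal speed with no pauses, and it takes presentation times (in seconds from the start of the stored file) from the file itself. VirtualClock and its methods are hypothetical. The RTP timestamp is derived from the unit's presentation time, and it is paired with the wallclock time at which that unit becomes due on the virtual clock.

```python
RTP_TS_MODULUS = 2 ** 32


class VirtualClock:
    """Hypothetical virtual presentation clock for one stored stream at 1x speed."""

    def __init__(self, start_wallclock, initial_ts, clock_rate):
        self.start_wallclock = start_wallclock  # wallclock when playback began
        self.initial_ts = initial_ts            # random initial RTP timestamp
        self.clock_rate = clock_rate            # RTP ticks per second

    def rtp_timestamp(self, media_time):
        """RTP timestamp for a unit whose stored presentation time is media_time."""
        ticks = round(media_time * self.clock_rate)
        return (self.initial_ts + ticks) % RTP_TS_MODULUS

    def ready_wallclock(self, media_time):
        """Wallclock time at which that unit is due on the virtual clock."""
        return self.start_wallclock + media_time
```

With VirtualClock(50.0, 1000, 90000), a frame at media time 2.0 s gets RTP timestamp 181000 and is paired with wallclock 52.0 s. An audio stream started with the same start_wallclock maps its own units onto the same wallclock, so the receiver can line the two streams up.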

             Live and prerecorded media can also be synchronized, for
             example in a live audio narration of prerecorded video.
             The video would be presented locally to the narrator as
             well as transmitted using RTP.  The sampling instant of the
             video would be established by pairing the timestamp of each
             video frame with the wallclock time at which that frame is
             presented locally, which is also when the corresponding
             audio is sampled.
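For the narration example, the narrator's machine records one pairing per medium from a single local wallclock. For the video, it records the pairing when a frame is presented locally. For the audio, it records the pairing when a block is sampled. The sketch below shows how a sender might then derive the RTP timestamp to place in an SR sent at some other wallclock time. It assumes each medium keeps advancing at its nominal rate between events. MediaClock, record and ts_at are hypothetical names.

```python
from dataclasses import dataclass

RTP_TS_MODULUS = 2 ** 32


@dataclass
class MediaClock:
    """Latest (wallclock, RTP timestamp) pairing for one medium."""
    clock_rate: int
    last_wallclock: float = 0.0
    last_ts: int = 0

    def record(self, wallclock, rtp_ts):
        # Prerecorded video: call when a frame is presented locally.
        # Live audio: call when a block is sampled.
        self.last_wallclock = wallclock
        self.last_ts = rtp_ts

    def ts_at(self, wallclock):
        """Extrapolate the RTP timestamp for another wallclock time (e.g. SR send
        time), assuming the medium advances at its nominal clock rate."""
        ticks = round((wallclock - self.last_wallclock) * self.clock_rate)
        return (self.last_ts + ticks) % RTP_TS_MODULUS
```

Suppose the video frame with timestamp 500000 is presented at 10.0 s, and an audio block with timestamp 7000 is sampled at 10.02 s. An SR sent at 10.1 s would then carry 509000 for the video and 7640 for the audio. Both values refer to the same wallclock instant.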

_______________________________________________
Audio/Video Transport Working Group
avt@ietf.org
https://www1.ietf.org/mailman/listinfo/avt