
Re: [AVT] alignment of layers issue



Magnus Westerlund writes:
 
> No, a receiver can detect when the reference point by comparing data
> from previous SR packet and the present if there has been a change. So
> in the case of multi-layer the receiver continues to use the old
> synchronization information until it has received changes on all media
> streams.

This would mean that as a receiver of RTCP, when I receive an SR with an
RTP/NTP mapping significantly offset from the previous one, I would need
to figure out whether a) this is an SR from a sender which implements
this "change the mapping only when clock skew gets too big" algorithm,
in which case I need to tell the decoding layer to ignore this SR for
the time being; or whether b) it's an SR generated by an implementation
using the standard naive gettimeofday()-based implementation, in which
case I need to proceed with it and pass it to the presentation layer as
normal.
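
To make the dilemma concrete, here is a minimal Python sketch of how a
receiver could measure the offset between two successive SRs.  The
function name and the (ntp_seconds, rtp_ts) tuple layout are
hypothetical, chosen for illustration only.  The value it returns is the
same in cases a) and b); nothing in the SR itself says which kind of
sender produced it.

```python
RTP_TS_MOD = 1 << 32  # RTP timestamps are 32-bit and wrap around


def mapping_offset_seconds(prev_sr, new_sr, clock_rate):
    """Return how far the new SR's RTP/NTP mapping has moved, in seconds.

    prev_sr and new_sr are (ntp_seconds, rtp_ts) tuples (an assumed
    layout, not a real API).  A result near zero means the two SRs
    describe the same mapping.
    """
    prev_ntp, prev_rtp = prev_sr
    new_ntp, new_rtp = new_sr
    # RTP timestamp the old mapping predicts for the new SR's NTP time.
    expected_rtp = prev_rtp + round((new_ntp - prev_ntp) * clock_rate)
    # Signed difference modulo 2^32, so wraparound is handled.
    diff = (new_rtp - expected_rtp) % RTP_TS_MOD
    if diff >= RTP_TS_MOD // 2:
        diff -= RTP_TS_MOD
    return diff / clock_rate
```

For a 90 kHz video clock, two SRs five seconds apart whose RTP
timestamps differ by 5 * 90000 + 2700 ticks give an offset of about
30 ms.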

This is certainly very different from what any existing RTCP receiver
would do -- today, when you receive an SR, you update your clock
association, without any magic knowledge about your peers or
cross-session correlation.
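
As a rough sketch of that existing behavior (the class and method names
are made up for illustration, not taken from any real stack): each SR
simply replaces the stored reference point, and every later RTP
timestamp is converted against the most recent one.

```python
class SrClock:
    """Illustrative receiver-side RTP-to-wallclock mapping for one SSRC."""

    def __init__(self, clock_rate):
        self.clock_rate = clock_rate
        self.ref = None  # (ntp_seconds, rtp_ts) from the latest SR

    def on_sr(self, ntp_seconds, rtp_ts):
        # Unconditionally adopt the newest mapping; no knowledge of the
        # sender's algorithm or of other sessions is consulted.
        self.ref = (ntp_seconds, rtp_ts)

    def wallclock(self, rtp_ts):
        """Map an RTP timestamp to sender wallclock seconds, or None."""
        if self.ref is None:
            return None  # no SR received yet
        ref_ntp, ref_rtp = self.ref
        # Signed 32-bit difference so packets before the SR also work.
        diff = (rtp_ts - ref_rtp) % (1 << 32)
        if diff >= (1 << 31):
            diff -= (1 << 32)
        return ref_ntp + diff / self.clock_rate
```

The proposal under discussion would require on_sr() to sometimes hold
back the new mapping, which is the behavioral change I'm objecting to.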


I'm also concerned that if a sender with a large clock skew is sending
to a session with a large RTCP interval (due to low session bandwidth or
a large sender population), it could be that the timestamp mapping would
need to get changed more often than SRs are sent, so this algorithm
wouldn't work.
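
A back-of-the-envelope sketch of that concern, in Python.  The skew,
interval, and tolerance figures used with it are illustrative
assumptions, not measurements, and the function names are hypothetical.

```python
def drift_between_srs(skew_ppm, rtcp_interval_s):
    """Seconds of drift accumulated between two consecutive SRs."""
    return skew_ppm * 1e-6 * rtcp_interval_s


def max_sr_interval(skew_ppm, tolerance_s):
    """Longest SR interval (seconds) that keeps drift within tolerance."""
    return tolerance_s / (skew_ppm * 1e-6)
```

For example, an assumed 100 ppm skew with SRs 60 seconds apart
accumulates about 6 ms of drift per interval.  If alignment needed to
stay within 1 ms, the same skew would call for an SR at least every
10 seconds, which a low-bandwidth or heavily populated session may not
allow.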


I'm curious, also, how many existing implementations actually get this
right.  Every SR implementation I've found either a) uses
gettimeofday() or equivalent, and is thus subject to the whims of system
clock precision, or b) establishes an NTP/RTP mapping once at startup
and never changes it, making SRs useless for lip sync if there's any
clock skew at all.


> The reason is that you anyway need to do it. You can't handle this in a
> non-layered codec that is synched with something else without these
> mechanism, so what is the difference between synching audio and video
> with synching within a single media between the layers.

The difference is that for lip sync, time accuracy is a
quality-of-implementation issue.  A 30 ms inaccuracy in lip sync, due to
imprecision in the sender's clocks, might produce a degraded user
experience, but in general, the stream is still usable.

However, for data alignment, timestamp accuracy is a correctness issue.
For the layered codecs I know about, if you attempt to decode having
mis-aligned the base and enhancement layers, the decoder will output
unusable gibberish.


> I do understand why if one has a single media that it appears that
> aligning the timestamps solves all the problems. Well that isn't the
> truth, as you anyway need to ensure that you bind the correct SSRC in
> each session through CNAME to the right original source. So if you want
> this to be possible to use you will require also SSRC alignment across
> all sessions.

Yes, as specified in RFC 3550 (section 8.3).

Also, CNAME-based stream association doesn't work if a single
participant (with a single CNAME) is sending several different
(independent) layered streams in the same session group, e.g. from
multiple cameras.  Thomas and I discuss this in our draft.


> Yes, this has been done before, however it doesn't come for free and has
> a latent problem if you have sources that aren't using the layered codec
> and don't think about these restrictions. So by accepting this as a
> general rule will result in restriction to the RTP model.

I don't understand this paragraph at all.  This is only being proposed
for layered codecs.

 
> So I do want to question if we really should do this restriction which
> only provides a benefit in the single media case. As soon you have more
> than one media or streams that needs synchronization you anyway need to
> use RTCP.

No one is suggesting not using RTCP for all the things it's currently
used for -- lip sync, statistics reporting, etc.  We're just saying that
it has enough limitations, at least as implemented in almost all
existing implementations, that it shouldn't be extended to this new
domain.

-- 
Jonathan Lennox
Vidyo, Inc
jonathan at vidyo.com

_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www.ietf.org/mailman/listinfo/avt