Re: [AVT] alignment of layers issue
Jonathan Lennox wrote:
> Magnus Westerlund writes:
>
>> No, a receiver can detect when the reference point has changed by
>> comparing data from the previous SR packet with the present one. So
>> in the multi-layer case the receiver continues to use the old
>> synchronization information until it has received changes on all media
>> streams.
>
> This would mean that as a receiver of RTCP, when I receive an SR with an
> RTP/NTP mapping significantly offset from the previous one, I would need
> to figure out whether a) this is an SR from a sender which implements
> this "change the mapping only when clock skew gets too big" algorithm,
> in which case I need to tell the decoding layer to ignore this SR for
> the time being; or whether b) it's an SR generated by an implementation
> using the standard naive gettimeofday()-based implementation, in which
> case I need to proceed with it and pass it to the presentation layer as
> normal.
>
> This is certainly very different from what any existing RTCP receiver
> would do -- today, when you receive an SR, you update your clock
> association, without any magic knowledge about your peers or
> cross-session correlation.
Yes, but then you aren't really using a multi-layer codec today either.
The added requirement of maintaining synchronization between the
different layers is what makes this somewhat more demanding than
before.
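To make the detection rule quoted above (comparing successive SRs) concrete, here is a minimal Python sketch, assuming SR NTP timestamps have already been converted to seconds. A new SR counts as a change of reference point when its RTP timestamp differs from the value predicted by extrapolating the previous SR by more than a tolerance; the layered receiver keeps the old mappings until every layer has reported a change. The names `SenderReport`, `mapping_changed`, `LayerSyncState` and the one-tick default tolerance are hypothetical. As Jonathan points out, detection alone does not tell a deliberate re-mapping apart from SR jitter in a naive sender.

```python
from dataclasses import dataclass

RTP_MOD = 1 << 32  # RTP timestamps are 32-bit and wrap around


@dataclass(frozen=True)
class SenderReport:
    ntp: float       # SR NTP timestamp converted to seconds (assumed representation)
    rtp: int         # SR RTP timestamp
    clock_rate: int  # RTP clock rate of the stream, e.g. 90000 for video


def mapping_changed(prev: SenderReport, cur: SenderReport, tolerance_ticks: int = 1) -> bool:
    """True if cur's RTP/NTP mapping is not a continuation of prev's mapping."""
    # Predict the RTP timestamp at cur.ntp by extrapolating the previous mapping.
    predicted = (prev.rtp + round((cur.ntp - prev.ntp) * prev.clock_rate)) % RTP_MOD
    # Signed difference modulo 2^32, so timestamp wraparound is not a "change".
    diff = (cur.rtp - predicted) % RTP_MOD
    if diff >= RTP_MOD // 2:
        diff -= RTP_MOD
    return abs(diff) > tolerance_ticks


class LayerSyncState:
    """Keeps the old per-layer mappings until every layer has signalled a new one."""

    def __init__(self, initial: dict[str, SenderReport]) -> None:
        self.active = dict(initial)                 # mappings used for alignment now
        self.pending: dict[str, SenderReport] = {}  # new mappings waiting for other layers

    def on_sr(self, stream: str, sr: SenderReport) -> None:
        if mapping_changed(self.active[stream], sr):
            self.pending[stream] = sr
        elif stream not in self.pending:
            self.active[stream] = sr  # continuation of the current mapping: refresh it
        if set(self.pending) == set(self.active):
            # Every layer now has a new mapping: switch all of them at once.
            self.active = dict(self.pending)
            self.pending.clear()
```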
>
>
> I'm also concerned that if a sender with a large clock skew is sending
> to a session with a large RTCP interval (due to low session bandwidth or
> a large sender population), it could be that the timestamp mapping would
> need to get changed more often than SRs are sent, so this algorithm
> wouldn't work.
>
I agree that there are some corner cases in this. I noticed the issue
when writing the previous email.
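A rough way to see when the RTCP interval becomes the limiting factor: a sender that keeps its RTP/NTP mapping fixed until the accumulated error exceeds some tolerance must re-map every tolerance / (skew * clock rate) seconds, and each new mapping has to reach receivers in an SR. The Python sketch below compares that time with a given RTCP interval; the function names and the example skew and tolerance values are illustrative assumptions. RFC 3550 recommends a 5 s minimum report interval, and the actual interval is randomized and grows with the number of participants and with lower RTCP bandwidth.

```python
RTCP_MIN_INTERVAL_S = 5.0  # RFC 3550 recommended minimum RTCP report interval


def seconds_until_drift(skew_ppm: float, clock_rate: int, tolerance_ticks: float) -> float:
    """Time until a fixed RTP/NTP mapping is off by more than tolerance_ticks."""
    drift_ticks_per_s = skew_ppm * 1e-6 * clock_rate
    return tolerance_ticks / drift_ticks_per_s


def sr_rate_sufficient(skew_ppm: float, clock_rate: int, tolerance_ticks: float,
                       rtcp_interval_s: float) -> bool:
    """Can each re-mapping be announced in an SR before the next one is needed?"""
    return rtcp_interval_s <= seconds_until_drift(skew_ppm, clock_rate, tolerance_ticks)
```

With a 100 ppm skew on a 90 kHz clock, the mapping is one tick off after about 0.11 s, so sample-exact re-mapping cannot be announced at any realistic SR rate, while a half-frame tolerance (1501.5 ticks at 29.97 frames/s) lasts roughly 167 s.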
>
> I'm curious, also, how many existing implementations actually get this
> right. Every implementation I've found of SRs either a) uses
> gettimeofday() or equivalent, and is thus subject to the whims of system
> clock precision, or b) establishes an NTP/RTP mapping once at startup
> and never changes it, making SRs useless for lip sync if there's any
> clock skew at all.
Very good question. I'm not really running any real systems out there.
For most applications B is actually better than A, although it might
affect buffering badly for low-delay applications. I guess most
streaming servers never bother to adjust anything.
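The two strategies Jonathan describes can be sketched in Python as follows (hypothetical function names, SR NTP times as float seconds). Approach (a) pairs a system-clock reading with the current RTP timestamp, so it follows media-clock skew but inherits any error in reading the clock; approach (b) extrapolates a mapping fixed at startup with the nominal clock rate, so skew never becomes visible to receivers.

```python
RTP_MOD = 1 << 32  # RTP timestamps are 32-bit


def sr_from_wallclock(wall_now: float, rtp_now: int) -> tuple[float, int]:
    """Approach (a): pair a system-clock reading with the current RTP timestamp.

    Follows skew between media clock and system clock, but any error in
    reading wall_now (clock precision, scheduling delay) goes into the SR."""
    return wall_now, rtp_now % RTP_MOD


def sr_from_fixed_mapping(ntp0: float, rtp0: int, clock_rate: int,
                          rtp_now: int) -> tuple[float, int]:
    """Approach (b): extrapolate a mapping fixed at startup at the nominal rate.

    Smooth and self-consistent, but media-clock skew never shows up, so the
    reported NTP time drifts away from the real wall clock."""
    elapsed_ticks = (rtp_now - rtp0) % RTP_MOD  # only valid for < 2^32 ticks since startup
    return ntp0 + elapsed_ticks / clock_rate, rtp_now % RTP_MOD


# Example: a 90 kHz media clock running 100 ppm fast, 1000 s (real time) after startup.
rate = 90000
rtp_now = round(1000.0 * rate * (1 + 100e-6))
ntp_a, _ = sr_from_wallclock(1000.0, rtp_now)            # real wall-clock time
ntp_b, _ = sr_from_fixed_mapping(0.0, 0, rate, rtp_now)  # about 100 ms ahead of it
```

In this example approach (b) labels the current sample with an NTP time about 100 ms later than the real wall clock, which a receiver would turn into a lip-sync offset against a correctly mapped stream.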
>
>
>> The reason is that you need to do it anyway. You can't handle this in
>> a non-layered codec that is synched with something else without these
>> mechanisms, so what is the difference between synching audio with
>> video and synching the layers within a single medium?
>
> The difference is that for lip sync, time accuracy is a
> quality-of-implementation issue. A 30 ms inaccuracy in lip sync, due to
> imprecision in the sender's clocks, might produce a degraded user
> experience, but in general, the stream is still usable.
>
> However, for data alignment, timestamp accuracy is a correctness issue.
> For the layered codecs I know about, if you attempt to decode having
> mis-aligned the base and enhancement layers, the decoder will output
> unusable gibberish.
>
Yes, fully agree. The synch data between the different layers needs to
provide sample-exact alignment. That is possible with SR as long as you
use a common synch point.
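One reading of "a common synch point" is that the SRs for all layers carry the same NTP timestamp. Under that assumption the NTP term cancels, and an enhancement-layer timestamp maps onto the base-layer timeline using integer RTP arithmetic only, which is what sample-exact alignment needs; lip sync, by contrast, can tolerate the rounding of a conversion through wall-clock time. A minimal Python sketch with hypothetical function names:

```python
from fractions import Fraction

RTP_MOD = 1 << 32  # RTP timestamps are 32-bit


def signed_delta(a: int, b: int) -> int:
    """a - b for 32-bit RTP timestamps, as a signed value."""
    d = (a - b) % RTP_MOD
    return d - RTP_MOD if d >= RTP_MOD // 2 else d


def enh_to_base_ts(enh_ts: int, enh_sr_rtp: int, enh_rate: int,
                   base_sr_rtp: int, base_rate: int) -> int:
    """Map an enhancement-layer RTP timestamp onto the base-layer timeline.

    Assumes both layers' SRs carry the same NTP timestamp (a common synch
    point), so only the RTP offsets from those SRs matter."""
    ticks = Fraction(signed_delta(enh_ts, enh_sr_rtp) * base_rate, enh_rate)
    if ticks.denominator != 1:
        # Not representable exactly on the base clock: alignment would need rounding.
        raise ValueError("timestamp does not fall on a base-layer tick")
    return (base_sr_rtp + int(ticks)) % RTP_MOD
```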
>
>> I do understand why, if one has a single medium, it appears that
>> aligning the timestamps solves all the problems. That isn't the whole
>> truth, as you still need to ensure that you bind the correct SSRC in
>> each session, through the CNAME, to the right original source. So if
>> you want this to be usable you will also require SSRC alignment across
>> all sessions.
>
> Yes, as specified in RFC 3550 (section 8.3).
>
> Also, CNAME-based stream association doesn't work if a single
> participant (with a single CNAME) is sending several different
> (independent) layered streams in the same session group, e.g. from
> multiple cameras. Thomas and I discuss this in our draft.
>
Okay, I think this is something that needs clarification. However, it
is not obvious that the different cameras shouldn't have different
CNAMEs. I think that depends on whether you can actually provide them in
a synchronized manner or not. In any case, you need a mechanism to
determine which layers belong together.
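The association problem can be sketched in Python under a hypothetical data layout in which each session maps SSRC to the CNAME learned from SDES. Grouping by CNAME pairs layers only when each CNAME has a single SSRC per session; with several cameras behind one CNAME it becomes ambiguous, whereas with SSRC alignment across sessions the layers pair up directly by SSRC. All function names are illustrative.

```python
from collections import defaultdict


def group_layers_by_cname(sessions: dict[str, dict[int, str]]) -> dict[str, dict[str, list[int]]]:
    """Return CNAME -> {session: [SSRCs]} across all sessions of the group."""
    groups: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for session, ssrc_to_cname in sessions.items():
        for ssrc, cname in ssrc_to_cname.items():
            groups[cname][session].append(ssrc)
    return {cname: dict(per_session) for cname, per_session in groups.items()}


def ambiguous_cnames(groups: dict[str, dict[str, list[int]]]) -> set[str]:
    """CNAMEs with several SSRCs in one session cannot be paired by CNAME alone."""
    return {cname for cname, per_session in groups.items()
            if any(len(ssrcs) > 1 for ssrcs in per_session.values())}


def pair_by_ssrc(sessions: dict[str, dict[int, str]]) -> set[int]:
    """With SSRC alignment (same SSRC in every session), layers pair up by SSRC."""
    ssrc_sets = [set(m) for m in sessions.values()]
    return set.intersection(*ssrc_sets) if ssrc_sets else set()
```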
>
>> Yes, this has been done before; however, it doesn't come for free, and
>> it has a latent problem if you have sources that aren't using the
>> layered codec and don't consider these restrictions. So accepting this
>> as a general rule will restrict the RTP model.
>
> I don't understand this paragraph at all. This is only being proposed
> for layered codecs.
If you have payload types of this kind, you force all sources in the RTP
session to take this into consideration, regardless of whether they are
multi-layer or not. An RTP session using RFC 4588 (the RTP
retransmission payload format) in its session-multiplexing mode, which
relies on SSRC alignment, is very unlikely to carry anything other than
retransmission packets. That is not as true for a media codec.
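A minimal Python sketch of what SSRC alignment asks of every sender in the session group (hypothetical helper names): the SSRC has to be free in all sessions, not only in the one the sender participates in. RFC 3550 section 8.3 recommends a single SSRC identifier space across the sessions of a layered encoding; a non-layered source that checks only its own session can still pick an SSRC already used elsewhere in the group.

```python
import secrets


def allocate_ssrc(in_use_by_session: dict[str, set[int]]) -> int:
    """Pick a random 32-bit SSRC that is unused in every session of the group."""
    taken = set().union(*in_use_by_session.values())
    while True:
        candidate = secrets.randbits(32)
        if candidate not in taken:
            return candidate


def sessions_in_conflict(ssrc: int, in_use_by_session: dict[str, set[int]],
                         own_session: str) -> list[str]:
    """Other sessions where an SSRC chosen by looking only at own_session is taken."""
    return sorted(name for name, used in in_use_by_session.items()
                  if name != own_session and ssrc in used)
```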
>
>
>> So I do want to question whether we really should impose this
>> restriction, which only provides a benefit in the single-media case.
>> As soon as you have more than one medium, or streams that need
>> synchronization, you need to use RTCP anyway.
>
> No one is suggesting not using RTCP for all the things it's currently
> used for -- lip sync, statistics reporting, etc. We're just saying that
> it has enough limitations, at least as implemented in almost all
> existing implementations, that it shouldn't be extended to this new
> domain.
>
Okay, but to a large degree this seems to be an implementation issue.
There seem to be some shortcomings in the clock skew adjustments for the
SR, which I think need more analysis. But I think that if we are
producing documents in this area we really should clear up the
implementation issues so that people make it work correctly. If they are
not fixed, then even with timestamp and SSRC alignment across the RTP
sessions, an error in generating the SR packets results in an erroneous
clock skew correction at the receiver, which will also affect quality,
especially for audio.
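To illustrate how an error in generating SRs turns into a wrong skew correction, here is a Python sketch of a receiver estimating the sender's media-clock skew from two SRs, given as hypothetical (NTP seconds, RTP timestamp) pairs. The two-point slope estimator is an assumption for illustration, not taken from any particular implementation.

```python
RTP_MOD = 1 << 32  # RTP timestamps are 32-bit


def signed_delta(a: int, b: int) -> int:
    """a - b for 32-bit RTP timestamps, as a signed value."""
    d = (a - b) % RTP_MOD
    return d - RTP_MOD if d >= RTP_MOD // 2 else d


def estimate_skew_ppm(sr1: tuple[float, int], sr2: tuple[float, int], clock_rate: int) -> float:
    """Media-clock skew implied by two SRs, in parts per million."""
    (ntp1, rtp1), (ntp2, rtp2) = sr1, sr2
    media_elapsed = signed_delta(rtp2, rtp1) / clock_rate  # seconds per the media clock
    wall_elapsed = ntp2 - ntp1                             # seconds per the NTP clock
    return (media_elapsed / wall_elapsed - 1.0) * 1e6
```

With SRs 5 s apart on a 90 kHz clock, exact SRs give 0 ppm, but a 1 ms error in reading the wall clock for the second SR shows up as an apparent skew of about -200 ppm, which a receiver that adapts its playout to the estimate would act on.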
Cheers
Magnus Westerlund
IETF Transport Area Director & TSVWG Chair
----------------------------------------------------------------------
Multimedia Technologies, Ericsson Research EAB/TVM
----------------------------------------------------------------------
Ericsson AB | Phone +46 10 7148287
Färögatan 6 | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden| mailto: magnus.westerlund at ericsson.com
----------------------------------------------------------------------
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www.ietf.org/mailman/listinfo/avt