[AVT] Problems with uRTR: draft-ietf-avt-variable-rate-audio-00.txt
Hi,
I have discussed the issue of RTP timestamps for variable sampling rate
codecs with my colleagues, and also with Colin, to get as good an
understanding of the issues as possible. We also went through the
exercise of working out what using a uRTR (unified RTP timestamp rate)
concept would mean for the RTP payload format for AMR-WB+. This revealed
a number of issues.
Let's start with a brief explanation of the system view for a codec
like AMR-WB+:
1. Input sampling (Input Sampling Rate)
->
2. Encoding into frames using a specific codec internal sampling
frequency (ISF)
->
3. RTP packetization, assigning an RTP timestamp (TS) value to each
frame (RTP TS rate)
->
4. Transmission
->
5. RTP Reception
->
6. Buffering
->
7. Decoding to Output Sampling Rate
->
8. Audio playout (Output Sampling Rate)
In such a codec system, the Input Sampling Frequency need not
necessarily match the Output Sampling Frequency. The bandwidth of the
audio signal depends on the selected ISF; thus both the input and output
sampling frequencies should be higher than the ISF.
The RTP payload and timestamp must provide the receiver with sufficient
information for:
- Recovery of decoding and playout position and order
- Intermedia synchronization
First, a common problem when using an RTP timestamp rate that does not
match the input sampling rate is that a sampling instant may not be
representable by an integer timestamp value; instead, it may fall at a
fractional value. The rounding can then lead to an initial offset error
relative to other media when decoding starts.
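To make the rounding concrete, here is a minimal Python sketch. It
assumes a 44100 Hz input and, purely for illustration, a hypothetical
common RTP timestamp rate of 48000 Hz; neither value is proposed in this
thread, and the function names are made up for the example.

```python
from fractions import Fraction

def sample_instant_in_ticks(sample_index, input_rate, rtp_rate):
    """Exact position of an input sample on the RTP timestamp clock."""
    return Fraction(sample_index * rtp_rate, input_rate)

def initial_offset_seconds(sample_index, input_rate, rtp_rate):
    """Offset (seconds) caused by rounding that position to an integer TS."""
    exact = sample_instant_in_ticks(sample_index, input_rate, rtp_rate)
    rounded = round(exact)  # the integer RTP timestamp actually sent
    return (rounded - exact) / rtp_rate

# Assumed example: the stream starts at input sample 1 of a 44.1 kHz
# signal, carried with a hypothetical 48 kHz RTP timestamp rate.
offset = initial_offset_seconds(1, 44100, 48000)
print(float(offset))
```

Whatever the start sample, the offset is bounded by half an RTP tick,
here 1/96000 s; the point is only that it is generally non-zero.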
When one uses uRTR, or any timestamp rate under which the transport
units (either samples or audio frames) do not last an integer number of
timestamp ticks, one may get in-stream jitter. Because a frame's
duration in the RTP timestamp domain is fractional, the rounding error
becomes an error in placing the data correctly on the timescale.
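A minimal sketch of this in-stream jitter, assuming (hypothetically)
1024-sample frames at a 44100 Hz input, timestamped at a 48000 Hz RTP
rate. Each frame then lasts 491520/441, about 1114.56 ticks, so the
rounded timestamps advance by an irregular 1114 or 1115 ticks.

```python
from fractions import Fraction

FRAME_SAMPLES = 1024   # hypothetical frame length, in input samples
INPUT_RATE = 44100     # Hz
RTP_RATE = 48000       # hypothetical RTP timestamp rate, Hz

# Exact frame duration in RTP ticks: about 1114.56, not an integer.
frame_ticks = Fraction(FRAME_SAMPLES * RTP_RATE, INPUT_RATE)

def frame_timestamps(count):
    """Integer RTP timestamps for consecutive frames (exact start, rounded)."""
    return [round(k * frame_ticks) for k in range(count)]

ts = frame_timestamps(10)
# Spacing between consecutive timestamps: irregular, i.e. jitter.
steps = [b - a for a, b in zip(ts, ts[1:])]
# Placement error of each frame, in ticks.
errors = [float(t - k * frame_ticks) for k, t in enumerate(ts)]
print(steps)
print(errors)
```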
This may not be a serious problem for frame-based codecs as long as all
data arrives, since one can then run a scheme that concatenates the data
to be decoded into a correct stream, so the decoder output should be a
correct, unjittered stream. However, if losses occur, one may need to
insert the data without the help of prior data to determine what the
fractional offset is, potentially introducing jitter in the placement.
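The difference between the lossless and lossy cases can be sketched as
follows, assuming hypothetical 1024-sample frames at a 44100 Hz input
and a 48000 Hz RTP rate. With no loss, the receiver places each frame at
the previous frame's exact end; after a gap it only has the frame's
rounded timestamp to anchor on, so the placement is off by that frame's
rounding error. The receiver logic (`place_frames`) is an assumed
illustration, not a scheme from any specification.

```python
from fractions import Fraction

# Hypothetical fractional frame duration in RTP ticks (~1114.56).
frame_ticks = Fraction(1024 * 48000, 44100)

def true_position(index):
    """Exact start of frame `index` on the RTP clock."""
    return index * frame_ticks

def place_frames(received):
    """received: (frame_index, rtp_ts) pairs in order, possibly with gaps.
    Returns the position (in ticks) where each received frame is placed."""
    positions = []
    prev_index, prev_pos = None, None
    for index, ts in received:
        if prev_index is not None and index == prev_index + 1:
            pos = prev_pos + frame_ticks   # contiguous: concatenate exactly
        else:
            pos = Fraction(ts)             # start or after a loss: use rounded TS
        positions.append(pos)
        prev_index, prev_pos = index, pos
    return positions

# Sender timestamps: exact frame starts rounded to integer ticks.
stream = [(k, round(true_position(k))) for k in range(6)]
lossy = [f for f in stream if f[0] != 3]   # frame 3 is lost
positions = place_frames(lossy)

for (index, _), pos in zip(lossy, positions):
    print(index, float(pos - true_position(index)))  # placement error, ticks
```

Frames 0 to 2 are placed exactly; frame 4 and everything concatenated
after it inherit the rounding error of frame 4's timestamp.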
If I understand things correctly, the inter-media synchronization error
is normally not a problem, as humans are quite tolerant of offset.
However, we are very sensitive to jitter within an audio stream.
The error introduced by fractional frame lengths will also have an
impact on the RTP payload design. When aggregating frames for
frame-based codecs, the normal RTP timestamp recovery scheme is to
calculate a frame's RTP TS as: <packet RTP TS value> + N * <frame
duration in RTP TS ticks>, where N is the number of frames preceding
this frame in the packet. However, if the frame duration cannot be
expressed as an integer number of RTP ticks, then the rounding error is
multiplied by N, so the error can grow to several timestamp ticks. The
alternative is a scheme that provides absolute RTP TS offset values,
which increases the overhead of aggregation.
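The error growth in aggregation can be sketched with a hypothetical
frame of 1024 samples at 44100 Hz carried at a 48000 Hz RTP rate (about
1114.56 ticks). If the payload format implies an integer per-frame
duration (here assumed to be the rounded value, 1115), the receiver's
derived timestamp for the frame preceded by N others drifts linearly
with N. The helper names are made up for the example.

```python
from fractions import Fraction

# Hypothetical exact frame duration (~1114.56 ticks) and the integer
# duration a payload format would have to signal or imply instead.
exact_frame_ticks = Fraction(1024 * 48000, 44100)
nominal_frame_ticks = round(exact_frame_ticks)  # 1115

def derived_ts(packet_ts, n):
    """Receiver-side TS of the frame preceded by n frames in the packet."""
    return packet_ts + n * nominal_frame_ticks

def true_ts(packet_ts, n):
    """Where that frame actually belongs on the RTP clock."""
    return packet_ts + n * exact_frame_ticks

for n in (1, 4, 10):
    err = derived_ts(0, n) - true_ts(0, n)
    print(n, float(err))  # grows as n * (1115 - 1114.56) ticks
```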
For sample-based codecs, where the smallest unit is a sample, the
fractional error may be even harder to handle, due to the need for
greater precision in alignment and potentially less regular borders
between packets.
It also seems very hard to select a uRTR that will work well. First,
audio uses two families of sampling frequencies:
- 8000, 16000, 24000, 32000, 48000, 96000, 192000
- 11025, 22050, 44100, 88200
The frequency span is also quite large due to the different
applications. The higher values of 192 kHz and 88.2 kHz are used in SACD
and DVD-Audio and can be expected to occur. Thus, selecting a common
rate that is practically feasible within RTP (lower than a few MHz, due
to the wrap-around of the 32-bit timestamp) does not seem possible. Any
selected rate would most likely be a compromise leading to bad
conversion factors for one of the two families.
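The incompatibility of the two families can be quantified: the smallest
clock that gives an integer number of ticks per sample for every listed
rate is their least common multiple. This Python sketch computes it and
the resulting wrap-around period of the 32-bit RTP timestamp; it does
not suggest which compromise rate would be chosen.

```python
from functools import reduce
from math import gcd

FAMILY_8K = [8000, 16000, 24000, 32000, 48000, 96000, 192000]
FAMILY_11K = [11025, 22050, 44100, 88200]

def lcm(a, b):
    return a * b // gcd(a, b)

def smallest_common_rate(rates):
    """Smallest clock rate with an integer tick count per sample for all rates."""
    return reduce(lcm, rates)

combined = smallest_common_rate(FAMILY_8K + FAMILY_11K)
# RTP timestamps are 32-bit and wrap modulo 2**32.
wrap_seconds = 2**32 / combined

print(smallest_common_rate(FAMILY_8K))   # 192000
print(smallest_common_rate(FAMILY_11K))  # 88200
print(combined)                          # 28224000 (28.224 MHz)
print(round(wrap_seconds, 1))            # 152.2
```

Each family on its own fits comfortably in RTP, but the combined clock
of 28.224 MHz wraps the timestamp roughly every two and a half minutes.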
Due to these issues, I think AMR-WB+ should keep its 72 kHz RTP
timestamp rate, as it gives the codec the necessary audio frame
placement on a full-resolution clock, without jitter. It also has quite
good clock conversion factors for commonly used output frequencies:
Output rate (Hz)   72 kHz ticks per output sample
8000               9
16000              4.5
24000              3
25600              2.8125
32000              2.25
44100              1.632653061
48000              1.5
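The conversion factors in the table can be reproduced exactly with
rational arithmetic. This sketch also reports the denominator of each
factor, i.e. how many output samples pass before the output clock and
the 72 kHz clock coincide on an integer tick again; for 44100 Hz that is
49 samples (80 ticks). The function name is hypothetical.

```python
from fractions import Fraction

RTP_RATE = 72000  # AMR-WB+ RTP timestamp rate discussed above
OUTPUT_RATES = [8000, 16000, 24000, 25600, 32000, 44100, 48000]

def conversion_factors(rtp_rate, rates):
    """Exact number of RTP ticks per output sample for each output rate."""
    return {rate: Fraction(rtp_rate, rate) for rate in rates}

factors = conversion_factors(RTP_RATE, OUTPUT_RATES)
for rate, ticks in factors.items():
    # The denominator is how many output samples it takes before the
    # two clocks line up again on an integer 72 kHz tick.
    print(rate, ticks, float(ticks),
          "realigns every", ticks.denominator, "samples")
```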
Another problem common to both uRTR and a freer choice of RTP timestamp
rate is the impact on client implementations. Most multimedia clients
are driven by the audio card clock: the client uses the audio clock to
know when it needs to decode, remove data from the receiver buffer, etc.
Thus RTP timestamps will need to be converted to that clock, and errors
may arise here too. Allowing different codecs to use different rates
results in the need to handle conversion for more than one rate, making
codec plug-ins more difficult.
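A client driven by the audio card clock has to map each RTP timestamp
onto that clock. The sketch below, with hypothetical function names,
does this for one (RTP rate, device rate) pair using exact rationals and
32-bit wrap-around handling; the leftover fraction is the part that
cannot be placed on a whole device sample. A client supporting several
codecs would need one such mapping per RTP timestamp rate.

```python
import math
from fractions import Fraction

TS_MODULUS = 2**32  # RTP timestamps are 32-bit unsigned and wrap around

def ts_delta(ts, ref_ts):
    """Signed distance from ref_ts to ts, accounting for 32-bit wrap-around."""
    d = (ts - ref_ts) % TS_MODULUS
    return d - TS_MODULUS if d >= TS_MODULUS // 2 else d

def to_device_samples(ts, ref_ts, rtp_rate, device_rate):
    """Map an RTP timestamp to a sample position on the audio device clock.

    Returns (whole_samples, leftover_fraction), where the fraction is the
    part of a device sample that cannot be represented."""
    exact = Fraction(ts_delta(ts, ref_ts) * device_rate, rtp_rate)
    whole = math.floor(exact)
    return whole, exact - whole

# 80 ticks at 72 kHz is exactly 49 samples at 44.1 kHz, even across a wrap.
print(to_device_samples(80, 0, 72000, 44100))
print(to_device_samples(5, 2**32 - 75, 72000, 44100))
# A single 72 kHz tick is only 49/80 of a 44.1 kHz sample.
print(to_device_samples(1, 0, 72000, 44100))
```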
In conclusion, not using uRTR is more likely to preserve precision about
where the audio data belongs. Client implementations will be slightly
more impacted than they would be with uRTR; however, I think this might
be the price we need to accept.
I would also propose that the issues around RTP timestamp rates for
audio be documented in an informational RFC. This would include the
recommendation to use the input sampling frequency when applicable.
Variable rate codecs do expose limitations in RTP, and these should also
be documented. Further recommendations on how to select rates, and the
point that rates may need to be considered already during codec
development, should also be part of it.
Cheers
Magnus Westerlund
Multimedia Technologies, Ericsson Research EAB/TVA/A
----------------------------------------------------------------------
Ericsson AB | Phone +46 8 4048287
Torshamsgatan 23 | Fax +46 8 7575550
S-164 80 Stockholm, Sweden | mailto: magnus.westerlund at ericsson.com
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt