
[AVT] Problems with uRTR: draft-ietf-avt-variable-rate-audio-00.txt



Hi,

I have discussed the issue of RTP timestamps for variable sampling rate codecs with my colleagues and also with Colin, to get as good an idea of the issues as possible. We also went through the exercise of seeing what the use of a uRTR (unified RTP timestamp rate) concept would mean for the RTP payload format for AMR-WB+. This revealed a number of issues.

Let's start with a brief explanation of the system view for a codec like AMR-WB+:

1. Input sampling (Input Sampling Rate)
->
2. encoding into frames using a specific codec internal sampling frequency (ISF)
->
3. RTP packetization, and assigning an RTP TS value for each frame (RTP TS rate)
->
4. Transmission
->
5. RTP Reception
->
6. Buffering
->
7. Decoding to Output Sampling Rate
->
8. Audio playout (Output Sampling Rate)


In such a codec system, the input sampling frequency need not match the output sampling frequency. The audio signal's bandwidth depends on the selected ISF. Thus the input and output sampling frequencies should be larger than the ISF.

The RTP payload and timestamp must provide the receiver with sufficient information for:
- Recovery of decoding and playout position and order
- Intermedia synchronization


First, a common consequence of using an RTP timestamp rate that does not match the input sampling rate is that the sampling instant may not be representable by an integer timestamp value; it may instead be fractional. Because of the rounding, this can lead to an initial offset error relative to another medium when decoding starts.

When one uses uRTR, or any timestamp rate for which the transport units (samples or audio frames) do not have an integer duration in timestamp ticks, one may get in-stream jitter. Since a frame then has a fractional duration in the RTP timestamp domain, the rounding error becomes an error in placing the data correctly on the timescale.
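
To make the in-stream jitter concrete, here is a minimal sketch in Python. The frame size (1024 samples), sampling rate (44100 Hz) and timestamp rate (90000 Hz) are assumed values chosen only for illustration; they are not taken from the draft or from AMR-WB+.

```python
from fractions import Fraction

def frame_timestamps(samples_per_frame, sample_rate, ts_rate, n_frames):
    """Return (ticks per frame, exact positions, rounded RTP timestamps)."""
    # Exact frame duration in timestamp ticks; fractional if the rates do not match up.
    ticks_per_frame = Fraction(samples_per_frame * ts_rate, sample_rate)
    exact = [i * ticks_per_frame for i in range(n_frames)]
    # An RTP timestamp is an integer, so each position has to be rounded.
    rounded = [round(t) for t in exact]
    return ticks_per_frame, exact, rounded

# Assumed example values (hypothetical, for illustration only).
ticks, exact, rounded = frame_timestamps(1024, 44100, 90000, 5)
deltas = [b - a for a, b in zip(rounded, rounded[1:])]
errors = [float(r - e) for e, r in zip(exact, rounded)]

print("ticks per frame:", ticks)      # a fraction, not an integer
print("timestamp deltas:", deltas)    # not all equal: in-stream jitter
print("placement errors:", errors)    # each within half a tick
```

With these numbers the spacing between consecutive rounded timestamps is not constant, even though every frame has exactly the same duration. A receiver that sees only the integer timestamps cannot tell which part of the spacing is rounding.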

This may not be a serious problem for frame-based codecs as long as all data arrives, since one can then run a scheme that concatenates the data to be decoded into a correct stream. The decoder output should thus be a correct, unjittered stream. However, if losses occur, one may need to insert data without the help of prior data to determine what the fractional offset is, thus potentially introducing jitter in the placement.

If I understand things correctly, the inter-media synchronization error is normally not a problem, as humans are quite tolerant of offset. However, we are very sensitive to jitter within an audio stream.

The error introduced by fractional frame lengths will also have an impact on the RTP payload design. When aggregating frames for frame-based codecs, the normal RTP timestamp recovery scheme is to calculate the RTP TS as: RTP TS value + N * <frame duration in RTP TS ticks>, where N is the number of frames prior to this frame. However, if one can't express the frame duration as an integer number of RTP ticks, then the error is multiplied by N. Thus the error can grow to several timestamp ticks. Alternatively, one uses a scheme that provides absolute RTP TS offset values, which will increase the overhead needed for aggregation.
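
The growth of the error with N can be sketched as follows. The frame duration of 102400/49 ticks (about 2089.8) is an assumed value, corresponding to a hypothetical 1024-sample frame at 44100 Hz on a 90000 Hz timestamp clock; the helper name is likewise made up for this illustration.

```python
from fractions import Fraction

def aggregation_error(ticks_per_frame, n):
    """Error, in ticks, of the recovered timestamp of the frame preceded by n frames.

    The receiver computes base + n * <integer frame duration>, while the true
    position is base + n * <exact frame duration>.
    """
    nominal = round(ticks_per_frame)   # integer duration the payload format must assume
    return n * nominal - n * ticks_per_frame

# Assumed (hypothetical) fractional frame duration in ticks.
fractional = Fraction(102400, 49)
for n in (1, 5, 10):
    print(n, float(aggregation_error(fractional, n)))

# A frame duration that is a whole number of ticks gives no error at all.
print(aggregation_error(Fraction(1440), 10))
```

In this example the per-frame rounding error is 10/49 of a tick, so the frame preceded by ten others is already misplaced by about two ticks.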

For sample-based codecs, where the smallest unit is a sample, the fractional error may be even harder to handle due to the need for greater precision in alignment and potentially less regular borders between packets.

It also seems very hard to select a uRTR that will work well. First, audio has two families of commonly used frequencies:
- 8000, 16000, 24000, 32000, 48000, 96000, 192000
- 11025, 22050, 44100, 88200


The frequency span is also quite large due to the different applications. The higher values of 192k and 88.2k are used in SACD and DVD-Audio and can be expected to occur. Thus selecting a common rate within what is practically feasible in RTP (lower than a few MHz, due to the timestamp wrap-around) does not seem possible. Any selected rate would most likely be a compromise, leading to bad conversion factors for either of the two families.
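
As a rough check of this argument, the sketch below computes the smallest rate that is an integer multiple of every frequency in both families, and how quickly a 32-bit RTP timestamp would wrap at that rate. The variable names are chosen for this illustration only.

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

family_a = [8000, 16000, 24000, 32000, 48000, 96000, 192000]
family_b = [11025, 22050, 44100, 88200]

# Smallest rate in which one sample of every listed frequency is a whole number of ticks.
common_rate = reduce(lcm, family_a + family_b)

# The RTP timestamp field is 32 bits wide, so it wraps after 2**32 ticks.
wrap_seconds = 2**32 / common_rate

print("common rate (Hz):", common_rate)
print("wrap-around after (s):", round(wrap_seconds, 1))
```

The exact common rate comes out at roughly 28 MHz, at which the timestamp wraps in a few minutes, so any rate in the practically feasible range has to give up integer conversion for at least some of the listed frequencies.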

Due to these issues, I think AMR-WB+ should keep its 72 kHz RTP timestamp rate, as it provides the codec with the necessary audio frame location on a full-resolution clock without jitter. It also has quite good clock conversion factors for commonly used output frequencies:

Output frequency (Hz)    72 kHz ticks per output sample
 8000                    9
16000                    4.5
24000                    3
25600                    2.8125
32000                    2.25
44100                    1.632653061
48000                    1.5
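
The factors in the table can be reproduced with exact rational arithmetic, as in this small sketch (the names are illustrative only):

```python
from fractions import Fraction

TS_RATE = 72000  # AMR-WB+ RTP timestamp rate in Hz
output_rates = [8000, 16000, 24000, 25600, 32000, 44100, 48000]

# Number of 72 kHz ticks per sample period of each output frequency, kept exact.
factors = {rate: Fraction(TS_RATE, rate) for rate in output_rates}

for rate, factor in factors.items():
    print(rate, factor, float(factor))
```

All factors except the one for 44100 Hz are short terminating decimals; the 44100 Hz factor is the fraction 80/49.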

Another common problem with uRTR, and with a freer choice of the RTP timestamp rate, is the impact on client implementations. Most multimedia clients are driven by the audio card clock. The client implementation uses the audio clock to know when it needs to decode, remove data from the receiver buffer, etc. Thus RTP timestamp rates will need to be converted to that clock, and errors may arise here as well. Allowing different rates to be used for different codecs will result in the need to handle conversion for more than one rate, thus making codec plug-ins more difficult.

In conclusion, not using uRTR will more likely preserve the precision of where the audio data belongs. The client implementation will be slightly more impacted than it would be with uRTR; however, I think this might be the price we need to accept.

I would also propose that the issues around RTP timestamp rates for audio be documented in an informational RFC. This would include the recommendation to use the input sampling frequency when applicable. Variable rate codecs do expose limitations in RTP, and these should also be documented. Further recommendations on how to select rates, and the point that these may need to be considered already during codec development, should also be part of it.

Cheers

Magnus Westerlund

Multimedia Technologies, Ericsson Research EAB/TVA/A
----------------------------------------------------------------------
Ericsson AB                | Phone +46 8 4048287
Torshamsgatan 23           | Fax   +46 8 7575550
S-164 80 Stockholm, Sweden | mailto: magnus.westerlund at ericsson.com

