RE: TS clarification (was: [rohc] The discussion on slope(s) (red ux))
Hi,
Again, below is my proposal on the clarification of TS encoding.
It's essentially the same proposal I sent out back in April 2004.
I made some updates based on comments/questions from Carsten and
Kristofer (thanks).
I also added an example on AMR SID frames. Pawel mentioned SID
frames also back in April last year, but the message was ignored:
http://www1.ietf.org/mail-archive/web/rohc/current/msg02138.html
The text is a bit lengthy. But I think it's important to make it
as clear as possible, so that people can really get to the
bottom of the issue.
As always, your comments/questions are welcome.
BR, Zhigang
=================================================================
As described clearly in section 4.5.3 and elsewhere in RFC 3095
(4.5.4, 4.5.7, 5.7, 5.7.5, 5.7.5.2), TS_STRIDE is the
factor by which an RTP Timestamp is downscaled before compression.
Specifically, when scaling is used for TS, two steps are
performed by the compressor: 1) downscale of the original
TS to TS_SCALED; and 2) compression of TS_SCALED.
Another way to look at this: instead of compressing a sequence
of original TS, the compressor will compress (i.e. going through
FO/SO states, etc.) a sequence of TS_SCALED as if they were unscaled.
The only difference is that the compressor "pre-processes"
the original TS as specified in Step 2 in section 4.5.3 and
the decompressor derives the original unscaled TS as specified in
Step 3. Of course, both the compressor and decompressor must also
handle the wraparound case as described in Step 4 and 5.
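To make the two steps concrete, here is a minimal sketch of the scaling
and its inverse. It assumes TS_OFFSET = TS modulo TS_STRIDE and leaves out
the wraparound handling of Steps 4 and 5. The function names are mine,
for illustration only; they do not come from RFC 3095.

```python
def downscale_ts(ts, ts_stride):
    """Compressor side: split the original TS into (TS_SCALED, TS_OFFSET).

    Wraparound (Steps 4 and 5 in section 4.5.3) is deliberately not handled.
    """
    if ts_stride <= 0:
        raise ValueError("TS_STRIDE cannot be 0")
    return ts // ts_stride, ts % ts_stride


def upscale_ts(ts_scaled, ts_stride, ts_offset):
    """Decompressor side: TS = TS_SCALED * TS_STRIDE + TS_OFFSET."""
    return ts_scaled * ts_stride + ts_offset


ts_scaled, ts_offset = downscale_ts(200, 160)
print(ts_scaled, ts_offset)                 # 1 40
print(upscale_ts(ts_scaled, 160, ts_offset))  # 200
```

The compressor then compresses the TS_SCALED values exactly as it would
compress an unscaled field.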
It should therefore be clear that TS_STRIDE and slope are two
separate and independent ROHC parameters. In short, TS_STRIDE is
for scaling and slope is for the SO state. To elaborate further:
- TS_STRIDE is a parameter independent of states, i.e., it applies
all the time regardless of FO or SO state. In contrast, slope(TS)
or slope(TS_SCALED) is a parameter only used in the SO state, as part
of the linear function from SN to TS or TS_SCALED.
- TS_STRIDE is meaningful only for the original unscaled TS, while
slope applies to both scaled and unscaled TS.
- TS_STRIDE cannot be 0 (see section 4.5.3) while slope(TS) or
slope(TS_SCALED) can.
- Although in certain scenarios (see Example 1 below) TS_STRIDE
and the slope of unscaled TS may have the same value, this is not
always the case.
- One example is the TS for a video stream, where TS_STRIDE can
be set to 3000 while slope(TS_SCALED) can be set to 0 (see
Example 3 below).
- Another example is the Adaptive Multi-Rate (AMR) speech codec
(see Example 2 below). AMR is a codec mandated by 3GPP for 3G
cellular systems, which are the main driver for and application
target of ROHC. During a silence period, the AMR codec generates
Silence Descriptor (SID) frames every 8*20=160 ms, instead of every
20 ms as for regular speech frames in a talkspurt period. That means
the TS change between two RTP packets with consecutive SN will be
160 (=8000Hz*0.020sec) for talkspurt periods and 1280 (=8000Hz*0.160sec)
for silence periods. In this case, the compressor can set
TS_STRIDE = 160 all the time to allow TS scaling, while setting
slope(TS_SCALED) to 1 during talkspurt periods and 8 during
silence periods. This enables the compressor to enter SO states
during both types of periods without having to establish a new
TS_STRIDE value (which means sending the original unscaled TS together
with the new TS_STRIDE value in Extension 3, see section 4.5.7
and 5.7.5). Note that an AMR audio stream generated in a typical
voice conversation usually alternates between talkspurt and silence
periods since only one party speaks at any time.
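The AMR numbers above can be checked with a few lines of arithmetic.
This is only a sketch: the constant and function names are mine, and it
assumes one frame per RTP packet and an 8000 Hz RTP clock.

```python
CLOCK_HZ = 8000           # RTP clock rate assumed for AMR
FRAME_MS = 20             # one speech frame per packet (assumption)
SID_INTERVAL_FRAMES = 8   # one SID frame every 8 frame periods


def ts_increment(interval_ms, clock_hz=CLOCK_HZ):
    """TS change between consecutive packets sent interval_ms apart."""
    return clock_hz * interval_ms // 1000


TS_STRIDE = ts_increment(FRAME_MS)
talkspurt_delta = ts_increment(FRAME_MS)
silence_delta = ts_increment(FRAME_MS * SID_INTERVAL_FRAMES)

# slope(TS_SCALED) is the per-packet TS change expressed in TS_STRIDE units.
slope_scaled_talkspurt = talkspurt_delta // TS_STRIDE
slope_scaled_silence = silence_delta // TS_STRIDE

print(TS_STRIDE, talkspurt_delta, silence_delta)     # 160 160 1280
print(slope_scaled_talkspurt, slope_scaled_silence)  # 1 8
```

TS_STRIDE stays at 160 in both periods; only slope(TS_SCALED) changes.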
The major source of confusion on this subject is the definition
of TS_STRIDE in section 2, which is inaccurate and contradicts the
text of section 4.5.3. It may be read as saying that TS_STRIDE is
_always_ equal to the increase in TS value between two RTP packets with
consecutive sequence numbers, i.e., that TS_STRIDE is the slope of TS.
This is incorrect and not the intent of the specification, as
evidenced by the description and usage of TS_STRIDE in section 4.5.3
and other sections. Unfortunately, this interpretation went uncaught
both on paper and in past interoperability tests because
TS_STRIDE and slope(TS) do have the same value for certain
RTP packet streams (see above).
Another source of confusion is that although slope is described
in section 5.5.1.2 and 5.7, the text is not as clear as it should
be in terms of implementation. (One reason is that the authors of
RFC 3095 were under pressure to make the document concise, which
unfortunately left out some useful details in retrospect.) To clarify,
slope has its usual mathematical meaning. Namely, slope(X) equals
the change of a non-SN field X between two packets divided by the
change of the SN field between the same two packets. It is one of the
two parameters (the other one being offset) that define a linear
function from SN to a header field X. The function can also be
visualized as line segments, with SN being the horizontal coordinate
and the field X being the vertical coordinate. In the case of RTP
timestamp compression, field X can be either TS or TS_SCALED.
The following is the decompressor logic for calculating
slope(TS). Integer arithmetic is used here.
Note that the exact implementation may be different as long
as it generates the same result. Also note that the same logic
applies to the scaled TS, by simply replacing TS with TS_SCALED.
1) Initialize slope(TS) as specified in section 5.7.
Note: for the case of scaled TS, the default-slope(TS) below
means default-slope(TS_SCALED). It should be understood
that RFC 3095 uses TS to mean either unscaled or scaled
RTP Timestamp as it is clear based on the context.
If value(Tsc) = 1, Scaled RTP Timestamp encoding is used before
compression (see section 4.5.3), and default-slope(TS) = 1.
2a) When receiving more than one update packet that carries TS bits
before receiving Type 0 packets, calculate slope(TS) and offset(TS)
as follows:
    slope(TS) = (TS2 - TS1) / (SN2 - SN1)
    offset(TS) = TS2 - SN2 * slope(TS)
where {SN1, TS1} and {SN2, TS2} are the SN and TS values of the last
two context-update packets the decompressor has received (and ACKed
if in R-mode).
2b) When receiving only one update packet that carries TS before
receiving Type 0 packets, slope(TS) is unchanged. Note that if this
is the first time the decompressor receives a Type 0 packet after
context initialization, this means slope(TS) = default-slope(TS).
3) The decompression of TS for a Type 0 packet N is straightforward as
described in RFC 3095, i.e., TS(N) = SN(N) * slope(TS) + offset(TS).
Or equivalently, TS(N) = (SN(N) - SN(U)) * slope(TS) + TS(U), where
U denotes the last received context-update packet. Note that for the
latter approach, the decompressor does not need to calculate offset(TS)
as shown above.
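Steps 1) to 3) can be sketched as below. This is a simplified
illustration rather than a conformant decompressor: the class and method
names are mine, ACK handling for R-mode is left out, and SN/TS wraparound
is ignored. It uses the second form in step 3, so offset(TS) is never
stored.

```python
class TsContext:
    """Linear function from SN to TS (or TS_SCALED) at the decompressor."""

    def __init__(self, default_slope):
        self.slope = default_slope   # step 1: start from default-slope
        self.ref = None              # (SN, TS) of the last update packet
        self.updates_seen = 0        # update packets since the last Type 0

    def on_update(self, sn, ts):
        """Handle a context-update packet that carries TS bits."""
        if self.updates_seen >= 1 and sn != self.ref[0]:
            # step 2a: second (or later) update before any Type 0 packet.
            # Note: // floors; the two differ from C-style truncation
            # only when the quotient is negative and inexact.
            prev_sn, prev_ts = self.ref
            self.slope = (ts - prev_ts) // (sn - prev_sn)
        # step 2b: with a single update, the slope is left unchanged.
        self.ref = (sn, ts)
        self.updates_seen += 1

    def on_type0(self, sn):
        """Step 3: TS(N) = (SN(N) - SN(U)) * slope + TS(U)."""
        self.updates_seen = 0
        ref_sn, ref_ts = self.ref
        return (sn - ref_sn) * self.slope + ref_ts


ctx = TsContext(default_slope=1)
ctx.on_update(11, 1)
print(ctx.on_type0(12))   # 2
```

Feeding it two updates in a row, e.g. (61, 53) and (62, 61), changes the
slope to 8, so a following Type 0 packet with SN 63 decompresses to 69.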
The compressor simply follows the logic described in RFC 3095.
Namely, it must ensure the decompressor has acquired the function
from SN to TS (or TS_SCALED in case of scaled TS) for the current
string before sending any Type 0 packets.
To help implementers, the following examples illustrate exactly how
TS encoding works. An "x" mark in the CU-PKT row means the
corresponding packet is a context-update packet. All other packets
can be Type 0 packets. Note that for simplicity, the examples assume
no packet loss between the source and ROHC compressor, and between
the ROHC compressor and decompressor. For the same reason, the examples
assume TS_STRIDE has already been synchronized. Also, R-mode is assumed
here but the same examples apply to UO-mode with only minor changes.
Example 1:
(an audio stream with TS_STRIDE = 160 and TS_OFFSET = 40; no packets
transmitted during silence periods. String 1 and String 2 correspond to
two talkspurts. String 1 is the first string for this context.)
            <----------- String 1 ----------->  <-------- String 2 -------->
SN            11    12    13    14   ...    60    61    62    63    64   ...
CU-PKT         x                                   x
TS           200   360   520   680   ...  8040 16040 16200 16360 16520   ...
TS_SCALED      1     2     3     4   ...    50   100   101   102   103   ...
slope       <-------------- 1 --------------->  <----------- 1 ------------>
offset      <------------ (-10) ------------->  <---------- (39) ---------->
Note: only 1 context-update packet is needed for each of String 1
and String 2, because the slope(TS_SCALED) is the same as
default-slope(TS_SCALED).
Note: to give an example, packet 61 can be a UOR-2-TS with
Extension 0, which is 4 bytes in total excluding CID. Its
TS field (in both base and extension) carries 01100100,
i.e., the 8 LSBs of 100.
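The bit pattern quoted for packet 61 can be reproduced as follows. The
helper name is mine and purely illustrative; it only extracts the k
least significant bits of a value and says nothing about how the bits
are split between base header and extension.

```python
def lsb_bits(value, k):
    """Return the k least significant bits of value as a bit string."""
    return format(value & ((1 << k) - 1), "0{}b".format(k))


print(lsb_bits(100, 8))   # 01100100
```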
Example 2:
(an AMR audio stream with TS_STRIDE = 160 and TS_OFFSET = 40; SID
frames transmitted during silence periods. String 1 corresponds
to a talkspurt while String 2 corresponds to a silence period
following the talkspurt. String 1 is the first string for this
context.)
            <----------- String 1 ----------->  <-------- String 2 -------->
SN            11    12    13    14   ...    60    61    62    63    64   ...
CU-PKT         x                                   x     x
TS           200   360   520   680   ...  8040  8520  9800 11080 12360   ...
TS_SCALED      1     2     3     4   ...    50    53    61    69    77   ...
slope       <-------------- 1 --------------->  <----------- 8 ------------>
offset      <------------ (-10) ------------->  <--------- (-435) --------->
Note: two context-update packets are needed for String 2
since the slope(TS_SCALED) changes from 1 to 8. Packets
61 and 62 can be UOR-2-TS with Extension 0.
Example 3:
(a video stream with TS_STRIDE = 3000 and TS_OFFSET = 1000;
String 1 is the first string for this context.)
            <-------- String 1 -------->  <----- String 2 ----->
SN            11    12    13   ...    25    26    27    28   ...
CU-PKT         x     x                       x
TS          4000  4000  4000   ...  4000  7000  7000  7000   ...
TS_SCALED      1     1     1   ...     1     2     2     2   ...
slope       <----------- 0 ------------>  <-------- 0 --------->
offset      <----------- 1 ------------>  <-------- 2 --------->
Note: two context-update packets are needed for String 1
because the slope(TS_SCALED) is different from the
default-slope(TS_SCALED).
Note: packet 26 can be a UOR-2-TS (3 bytes).
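For anyone who wants to double-check the slope and offset rows in the
three examples, the short script below recomputes them from the first two
(SN, TS) pairs of each string. It is a sketch under the same assumptions
as the examples (no loss, TS_STRIDE and TS_OFFSET already known); the
helper names are mine.

```python
def scaled(ts, ts_stride, ts_offset):
    """TS_SCALED for a given original TS."""
    return (ts - ts_offset) // ts_stride


def line_params(p1, p2):
    """slope and offset of the line through two (SN, X) points."""
    (sn1, x1), (sn2, x2) = p1, p2
    slope = (x2 - x1) // (sn2 - sn1)
    offset = x2 - sn2 * slope
    return slope, offset


# (TS_STRIDE, TS_OFFSET, first (SN, TS), second (SN, TS)) per string
STRINGS = {
    "Example 1, String 1": (160, 40, (11, 200), (12, 360)),
    "Example 1, String 2": (160, 40, (61, 16040), (62, 16200)),
    "Example 2, String 1": (160, 40, (11, 200), (12, 360)),
    "Example 2, String 2": (160, 40, (61, 8520), (62, 9800)),
    "Example 3, String 1": (3000, 1000, (11, 4000), (12, 4000)),
    "Example 3, String 2": (3000, 1000, (26, 7000), (27, 7000)),
}

results = {}
for name, (stride, ts_offset, first, second) in STRINGS.items():
    points = [(sn, scaled(ts, stride, ts_offset)) for sn, ts in (first, second)]
    results[name] = line_params(*points)
    print(name, results[name])
```

The printed (slope, offset) pairs are (1, -10), (1, 39), (1, -10),
(8, -435), (0, 1) and (0, 2), matching the tables.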
One concern has been raised about a possible worst case scenario,
in which a loss of synchronization on slope may remain undetected
across multiple strings - essentially a damage propagation. For
instance, if packet 11 in Example 3 contains a residual error
(i.e. error not detected by link layer) and that error is not
detected by ROHC CRC either, it will result in an incorrect SN or
TS or both on the decompressor side. That in turn will likely
(but not necessarily) lead to an incorrect slope when packet 11
is used during the calculation (see decompressor logic above).
If the compressor does not update the slope in subsequent strings
by sending two context-update packets AND the strings are too
short to force the compressor to send R-0-CRC packets, the
decompressor may keep using the incorrect slope in those strings
to decompress R-0 packets and thus cause damage propagation.
At first glance, this is a perfect disaster scenario which
may mean (in theory) that all subsequent R-0 packets in the entire
ROHC session will be damaged. However, a closer look shows
that it is very unlikely to happen, because the error has to survive
a series of lines of defense, as well as the very cause that
started it in the first place:
1) The expected residual error rate is low. It is recommended
that lower layers deploy error detection for ROHC headers
and do not deliver ROHC headers with high residual error
rates. Although no hard limit on the rates was given,
1E-5 was referred to as the ROHC design target.
2) UOR-2 packets carry a stronger 7-bit CRC (instead of the weaker
3-bit CRC). So the chance of a residual error not being detected by
the ROHC CRC is very low (< 1/128).
3) Even if the error escaped the CRC, an incorrect (SN, TS)
does not necessarily lead to an incorrect slope. For example,
if SN of packet 11 in Example 3 is incorrectly decompressed
as 10, the decompressor will still derive slope = 0.
4) The residual error that planted the incorrect slope into the
decompressor context will face 7-bit CRC validation in the
subsequent UOR-2 packets. In the above example, decompression
of packet 12 will use packet 11 as reference. Therefore, the
CRC in packet 12 will likely catch the errors in (SN, TS) of
packet 11. Similarly, packet 26 will check on packet 12, and
so on. Therefore, it is very likely that the decompressor will
detect the context damage (which injected the incorrect slope) and
start a context recovery procedure which will stop damage propagation.
5) Going back to the root, the problem is significant only if
the residual error rate at the link layer is high. However, if
that is the case, a residual error will likely happen again on
a UOR-2 packet after the one that introduced the incorrect slope.
When that happens, it is very likely (probability > 127:1) that
the new error will be caught by ROHC CRC. Again, this will
trigger a context recovery procedure and stop damage propagation.
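To put rough numbers on points 1), 2) and 4), here is a back-of-the-envelope
sketch. It rests on assumptions of mine that go beyond the text: errors
are independent, a 7-bit CRC misses a corrupted header with probability
1/128, and each later UOR-2 check on the damaged context fails to notice
with that same probability. Treat the result as an order-of-magnitude
illustration only.

```python
RESIDUAL_ERROR_RATE = 1e-5   # ROHC design target referred to in point 1
CRC7_MISS = 1 / 128          # assumed miss probability of a 7-bit CRC


def p_undetected(later_crc_checks):
    """Chance that a corrupted update packet passes its own CRC and the
    damage also survives the given number of later 7-bit CRC checks."""
    return RESIDUAL_ERROR_RATE * CRC7_MISS * CRC7_MISS ** later_crc_checks


for checks in range(3):
    print(checks, p_undetected(checks))
```

Each additional UOR-2 packet that validates against the damaged context
divides the remaining probability by 128 under these assumptions.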
The above analysis shows that the probability of damage propagation
across multiple strings caused by an incorrect slope is extremely low.
Nevertheless, there are a few options available if an implementation
is really concerned about any possibility of that happening:
- The compressor sends an R-0-CRC packet during SO state even if
it does not have to.
- The compressor updates the slope(TS) or slope(TS_SCALED) even
if the slope does not change. In fact, an implementation probably
needs to do this anyway, for robustness against packet loss.
For example, the compressor may send 2 UOR-2 packets even though
only 1 UOR-2 packet is needed by the decompressor. The extra
packet introduces redundancy against possible packet loss.
=============================================================