The problem to be solved can be summarised
as follows. The video encoder, or other source of coded video data, produces a
sequence of chunks of data known as NAL units. These are to be transmitted over
two or more RTP sessions. At the receiver, the data is to be put back into a
single sequence with the same order as in the original sequence from the
encoder or data source. This data is then input to the video decoder and is
decoded and output. There are other variants where, for example, the receiver
is not a decoder but some other device such as a MANE, but the core problem of
re-establishing the original order of NAL units is the same.

One of the solutions to this problem, the
CL-DON solution, allocates a monotonic increasing sequence number to each of the
NAL units from the encoder, transports these numbers through the network, and
uses these numbers to re-establish the original order of NAL units. The NAL
units received on the multiple RTP sessions are simply ordered according to
this monotonic increasing sequence of numbers.

The other solution to this problem, the
classical solution, when operated according to the rules in the current version
of the draft, operates as follows. The NAL units from the encoder are grouped
into access units and associated with a non-monotonic number (the timestamp,
representing output (display) order rather than decoding order). Effectively the
NAL units
are being labelled with (almost) arbitrary labels. These labelled NAL units are
then separated into multiple RTP streams, and a monotonic increasing sequence
is applied independently in each RTP stream. Note that both of these steps are
also performed in the CL-DON solution, but there they do not have to be used to
restore decoding order. At the receiver, the independent monotonic increasing
sequence
numbers are used to re-order packets in each RTP stream. These are then grouped
according to label (timestamp) in each RTP stream. Then the NAL units from the
lower layers are "merged" with the NAL units in the highest
enhancement layer, grouping together NAL units with the same label (timestamp).
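
A minimal sketch of the receiver steps described so far, in Python. The tuple
layout (sequence number, timestamp, NAL unit), the function names, and the
base-layer-first order inside each access unit are assumptions made for
illustration and are not taken from the draft. Sequence-number wrap-around and
SEI repositioning are left out.

```python
from itertools import groupby


def group_by_timestamp(packets):
    """Re-order one RTP stream and group it by label (timestamp).

    `packets` is a list of (seq, timestamp, nal) tuples (hypothetical layout).
    Assumes sequence numbers do not wrap within the list.
    """
    ordered = sorted(packets, key=lambda p: p[0])
    # groupby only merges neighbours, which is what we want after sorting
    # by sequence number: packets of one access unit are consecutive.
    return [(ts, [p[2] for p in grp])
            for ts, grp in groupby(ordered, key=lambda p: p[1])]


def merge_layers(streams):
    """Merge lower layers into the highest layer by label (timestamp).

    `streams` holds one packet list per RTP stream, base layer first and
    highest enhancement layer last. Returns (access_units, unplaced), where
    `unplaced` lists lower-layer labels with no group in the highest layer.
    """
    grouped = [group_by_timestamp(s) for s in streams]
    lower = [dict(g) for g in grouped[:-1]]  # label -> NAL units, per layer
    access_units = []
    for ts, top_nals in grouped[-1]:  # the highest layer drives the order
        nals = []
        for layer in lower:  # assumed order: base layer first
            nals.extend(layer.pop(ts, []))
        access_units.append((ts, nals + top_nals))
    unplaced = sorted({ts for layer in lower for ts in layer})
    return access_units, unplaced
```

With a base stream carrying labels 4 and 2 and a top stream in which label 2
was lost, the merge places label 4 and label 1 but reports label 2 as unplaced,
because the highest layer has no group to merge it into.
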
Finally, SEI NAL units must be moved to the start of each group (access unit),
if they were transmitted anywhere other than the base RTP session.

This approach suffers from the need for the highest
layer to have NAL units at every time instant for which there is a NAL unit in
any lower layer. And because this process has to work regardless of how many
of the RTP sessions are received, the same has to apply to any layer with
regard to the layers below it. While this can be overcome by inserting filler
data NAL units, it does seem to have a problem with packet loss, as this
condition cannot be guaranteed after loss. Given that the highest layer may
often be transmitted with the least error protection, this is a major limitation
of this approach.

The classical solution can, however, be operated in a different way at the
receiver to overcome this limitation, at the cost of additional complexity. As
before, at the receiver, the independent monotonic
increasing sequence numbers are used to re-order packets in each RTP stream,
and then these are grouped according to label (timestamp) in each RTP stream.
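
The label sequences produced by these two steps can be compared across
streams. A minimal Python sketch, assuming labels are unique within the window
being examined: each stream contributes the constraint that every label it
carries is decoded before the next label it carries, and a topological sort
returns the decoding order only when those constraints leave exactly one choice
at every step. The function name and the use of None for an undetermined order
are assumptions for illustration.

```python
from collections import defaultdict


def deduce_decoding_order(streams):
    """Return the single label order implied by all streams, or None.

    `streams` is a list of label (timestamp) sequences, one per RTP stream,
    each already in that stream's own sequence-number order.
    """
    successors = defaultdict(set)  # label -> labels known to come later
    labels = set()
    for seq in streams:
        labels.update(seq)
        for earlier, later in zip(seq, seq[1:]):
            successors[earlier].add(later)

    indegree = {label: 0 for label in labels}
    for later_set in successors.values():
        for later in later_set:
            indegree[later] += 1

    ready = [label for label in labels if indegree[label] == 0]
    order = []
    while ready:
        if len(ready) > 1:
            return None  # two labels could come next: order is undetermined
        label = ready.pop()
        order.append(label)
        for later in successors[label]:
            indegree[later] -= 1
            if indegree[later] == 0:
                ready.append(later)
    # A leftover label means the constraints contradicted each other.
    return order if len(order) == len(labels) else None
```

For the sequences 4 1 3 8 6 5 7 12 10, 4 2 1 3 8 5 7 12 10 and 4 2 8 6 12 10
this returns 4 2 1 3 8 6 5 7 12 10. If label 2 is also removed from the second
sequence, it returns None, since nothing then says whether label 2 or label 1
comes first.
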
Then the sequences of labels (timestamps) in each stream can be analysed, and
in many (but not all) cases, the decoding order of the labels (timestamps) can
be deduced, and then used to restore the decoding order.

In the example below, the top two RTP
sessions operate at a given frame rate and the base layer is operating at half
the frame rate. Packet loss has affected one access unit in the top layer and
one access unit in the middle layer. (A dash marks a label for which nothing
was received in that session.)

  Top layer       4   -   1   3   8   6   5   7  12  10
  Middle layer    4   2   1   3   8   -   5   7  12  10
  Base layer      4   2   -   -   8   6   -   -  12  10

However, decoding order can be restored by
noticing from the middle layer that NAL units with label=2 are to be decoded
before those with label=1. Similarly, the top layer tells us that NAL units
with label=6 are to be decoded before those with label=5.

But if both middle and top layers lost
their NAL units with label=2, as shown below, it would be more difficult to
re-establish decoding order, because from the RTP and payload layer alone it is
not possible to determine whether label=2 comes before or after label=1. It may be
possible to determine order by looking into pic_timing SEI messages, if present
(not guaranteed), or a best guess could be made by making assumptions based on
previous GOP structures (and the order of timestamps). Alternatively it may be
better to discard all NAL units with labels 1 and 3 rather than to risk feeding
data to the decoder in the wrong order.

  Top layer       4   -   1   3   8   6   5   7  12  10
  Middle layer    4   -   1   3   8   -   5   7  12  10
  Base layer      4   2   -   -   8   6   -   -  12  10

My conclusion is that while using a
non-monotonic set of numbers (timestamps) to re-establish decoding order is
possible in many but not all cases, it is a fairly complex process,
particularly if it is to make best use of all packets received when some are
lost, as in the second method above. And in practice I feel that the second
method would be implemented because the performance of the first in the case of
packet loss could be unacceptably poor.

The major weakness of the CL-DON method is
that it is not backwards compatible with the single NAL unit mode of RFC 3984.

One way to overcome this would be to use
some backward compatible mechanism to transport the CL-DON information in the
base RTP session operating in single NAL unit mode. The RTP header extension
mechanism is one way that this could be done, but I know that there are
objections to doing this.

However, the single NAL unit mode was
introduced into RFC 3984 primarily "for low-delay applications that are
compatible with systems using ITU-T Recommendation H.241".

Hence, if there is a need for backwards
compatibility with the single NAL unit mode, and this is itself very debatable,
then this need would seem to be restricted to low delay applications, where it
is unlikely that access units would be encoded in a different order to output
(display) order.

Consequently, a solution to the whole
problem of restoring the decoding order of NAL units is to define a class of
receiver that supports the full CL-DON method, and the classical method
restricted to cases where the timestamps are monotonic increasing. This restricted
case of the classical method is much simpler to implement than the general
case, and provides backwards compatibility with the intended applications of
the single NAL unit mode.

Best regards

Mike

Mike Nilsson
Sirius House (B54-MH), Room 92
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
http://www.ietf.org/mailman/listinfo/avt