Re: [AVT] Comments on draft-ietf-avt-rtp-svc-07.txt
Thanks a lot to Mike for another round of careful review and good
comments. Please see my replies inline, each starting with "[YK]".
Note that I have left some comments (on enhancement RTP session
definition and text for "the classical decoding order recovery mode")
for my co-author Thomas.
BR, YK
________________________________
From: ext mike.nilsson at bt.com
[mailto:mike.nilsson at bt.com]
Sent: Friday, February 08, 2008 6:01 PM
To: roni.even at polycom.co.il; Wang Ye-Kui
(Nokia-NRC/Tampere); jonathan at vidyo.com; schierl at hhi.fhg.de;
csp at csperkins.org; Yann.Leprovost at alcatel-lucent.fr; stewe at stewe.org;
tom at vidyo.com; tom.taylor at rogers.com; rjesup at wgate.com
Cc: avt at ietf.org
Subject: Comments on draft-ietf-avt-rtp-svc-07.txt
I have read this draft up to the end of section 8, and
have the following comments. Mostly they are about clarifications and
editorial improvements to the text.
I have not included any comments on the major open issue
of cross layer decoding order dependency, which I hope to be able to
send soon in a separate e-mail.
1) Introduction, page 5
I think the document could be improved by moving (some
of) the information in the scope section, which appears on pages 11 and
12, closer to the front of the document. After the third paragraph of
the introduction, some text could be added (using what is in the scope
section) to state such things as:
this specification allows NAL units to be encapsulated
into one or more RTP sessions;
this specification supports the use of the interleaved
and non-interleaved packetisation modes of RFC3984, but not the single
NAL unit mode;
when NAL units are encapsulated into more than one RTP
session, different packetisation modes can be used in each session.
Then the current fourth paragraph of the introduction
makes more sense, especially if the wording is slightly modified to make
it clear that it applies to the case of more than one session and
interleaved mode not being used on all sessions. Note the first sentence
in this paragraph is not good English. The following would be better.
"This memo includes two processes to recover NAL unit
decoding order when NAL units are transported using multiple RTP
sessions, and interleaved mode is not used in all of those sessions."
[YK] The current fourth paragraph was added only to
let readers know the major open issue, which should be removed after the
issue has been resolved. In this case, do you still think moving the
text from the Scope section is needed?
2) Scope, page 11
"When Session multiplexing is not used, ... When a
subset of the base layer containing the T0 base Layer and one or more
temporal enhancement Layers is transmitted ..."
Should this be zero or more temporal enhancement Layers?
This would make it clear that if all that was transmitted was the T0
base layer, then that should be encapsulated according to RFC 3984.
[YK] Exactly. Agreed.
3) anchor layer representation, page 14
My reading of this definition is that it is possible to
start decoding at the first NAL unit of the anchor layer representation,
and everything will be OK. This would not send NAL units of the lower
Layers of the same access unit to the decoder.
Is this the intention?
[YK] No, the intention is to say that you can start
accessing the specific layer from an anchor layer representation, but
the lower layers that layer depends on are also needed. I will improve the
definition to make the intention clear.
4) Base RTP session, page 14
"The Base RTP session may contain NAL units of NAL unit
type equal to 14 and 15."
Should type 20 be specifically mentioned? Presumably
this is allowed.
[YK] Yes.
5) Enhancement RTP session, page 14
"An RTP session containing the RTP stream which at least
depends on one other RTP session ..."
This is not good English. The following would be better.
"An RTP session containing an RTP stream that depends on
at least one other RTP session ..."
Is it the session that depends on another session, or
the stream that depends on another stream? Even the above improved
wording has confused the two. Another alternative is as follows.
"An RTP session containing an RTP stream that depends on
at least one RTP stream in another RTP session ..."
Depending on how this definition is fixed (if at all),
the definition above for Base RTP session may need to be fixed (to align
with it).
[YK] Left for Thomas. I have one additional comment:
the current definition does not allow for simulcast use cases,
where an independently coded enhancement layer is carried in one of the
RTP sessions - such an RTP session should also be an enhancement RTP
session.
6) Session multiplexing, page 16
"Each RTP session requires a separate signaling and has
a separate Timestamp, Sequence Number, and SSRC space."
I am unclear about what is meant by a separate Timestamp
space. Does this mean that the encoded values are different (for
security purposes) but are based on the same clock, or does it mean that
they are different and based on different clocks?
Presumably the former is the intention. I cannot see
how the non-CL-DON decoding order recovery method could work if the
latter were the intention.
[YK] The intention is that timestamp values are
independent for the RTP sessions. I think different RTP sessions should
use the same clock, but that probably should not be mandated, considering
potential use cases with parallel encoding devices that have
different clocks. Different clocks certainly make it harder for the
non-CL-DON mode to work, but it should still be possible once the
timestamp values of the different sessions are mapped to a common clock.
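The mapping to a common clock mentioned in the reply above could be sketched as follows. This is only an illustration, assuming that each session provides a (wallclock time, RTP timestamp) reference pair such as the one carried in an RTCP sender report, and a 90 kHz clock rate; the names SenderReport and to_wallclock are hypothetical and do not come from the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenderReport:
    """Reference pair for one RTP session (assumed to come from RTCP)."""
    ntp_seconds: float      # wallclock time of the reference instant
    rtp_timestamp: int      # RTP timestamp sampled at the same instant
    clock_rate: int = 90000  # assumed video clock rate in Hz


def to_wallclock(rtp_ts: int, sr: SenderReport) -> float:
    """Map a 32-bit RTP timestamp of one session onto the common clock."""
    # Interpret the difference as a signed 32-bit value so that
    # timestamp wrap-around does not produce a huge offset.
    diff = (rtp_ts - sr.rtp_timestamp) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr.ntp_seconds + diff / sr.clock_rate


# Two sessions with independent timestamp offsets but the same media instant.
base = SenderReport(ntp_seconds=100.0, rtp_timestamp=5000)
enh = SenderReport(ntp_seconds=100.0, rtp_timestamp=900000)
same_instant = to_wallclock(5000 + 90000, base) == to_wallclock(900000 + 90000, enh)
```

Once NAL units from all sessions carry a common-clock time, a non-CL-DON receiver can match them by sampling instant even though the raw timestamp values differ.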
7) SVC NAL unit, page 16
"A NAL unit of NAL unit type 14 or 20 as specified in
Annex G of [SVC]. An SVC NAL unit has a four-byte NAL unit header."
What about type 15? (subset sequence parameter set)
[YK] Subset sequence parameter set NAL units have only
a one-byte NAL unit header. The current definition is sufficient for the
text. Maybe we should add a note stating this to avoid confusing
readers.
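The header-length distinction discussed in this comment can be illustrated with a small sketch. It assumes the usual H.264 layout in which nal_unit_type is the low five bits of the first header byte; the function names are hypothetical.

```python
def nal_unit_type(first_byte: int) -> int:
    """Extract nal_unit_type from the first byte of a NAL unit header."""
    return first_byte & 0x1F


def nal_header_length(first_byte: int) -> int:
    """Return the NAL unit header length in bytes.

    Types 14 (prefix NAL unit) and 20 (coded slice in scalable extension)
    are SVC NAL units with a four-byte header; all other types, including
    type 15 (subset sequence parameter set), have a one-byte header.
    """
    return 4 if nal_unit_type(first_byte) in (14, 20) else 1
```

For example, a first byte of 0x6F has nal_unit_type 15 and therefore a one-byte header, which is why type 15 is not covered by the "SVC NAL unit" definition.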
8) PACSI NAL unit definition, page 21
The definition of a term for "the layer representation
to which the first NAL unit in the aggregation packet after the PACSI
NAL unit belongs" would make some of the subsequent definitions easier
to read.
[YK] Agreed.
My first thought for the name for this term was "target
layer representation" for consistency with the local term "target NAL
units", but it had already been used. Perhaps "associated layer
representation" could be used?
[YK] "associated layer representation" is not bad.
9) X bit, page 22
"The X bit SHOULD be identical for all the PACSI NAL
units involved in all the RTP sessions conveying an SVC bitstream."
I suggest removing "involved", and adding "of": "The X
bit SHOULD be identical for all the PACSI NAL units in all of the RTP
sessions conveying an SVC bitstream.".
[YK] Agreed.
10) T bit, page 22
"...MUST be present and specified as in below.".
I suggest removing "in".
[YK] Agreed.
11) A bit, page 22
"The A bit MUST be set to 1 if all the target NAL units
belong to anchor layer representations. Otherwise, the A bit MUST be
set to 0. The A bit SHOULD be identical for all the PACSI NAL units for
which the target NAL units belong to the same access unit."
I understand the intention of this bit, and the note is
clear, but the actual definition does not seem to specify what is
intended. If some of the NAL units belonging to the anchor layer
representations were in an earlier aggregation packet, but were not
target NAL units in that packet, the A bit being set in the current
aggregation packet would not actually indicate a switching point? The
other issue I have is related to comment 3 above, about whether other
(lower layer) NAL units are needed in addition to those in the anchor
layer representation?
Perhaps the solution is to add some words that say when
the A bit is set to "1", all NAL units necessary to perform switching
shall be transmitted as target NAL units in one or more aggregation
packets, in each of which the A bit is set to "1".
[YK] This was discussed in our earlier email
exchanges on the reflector (on Jan 4th?). I originally wrote the semantics
in the same spirit as you suggested. However, there has been a comment
from Magnus suggesting that the semantics should apply only within a
packet.
12) C bit, page 23
Same comment as for the A bit in comment 11: this also
seems to allow some of the (vital) intra NAL units to be "hidden" in
aggregation packets where they are not target NAL units.
[YK] Same as above for the A bit.
13) S bit and E bit, page 23
These definitions of start and end bits only seem to
make sense if NAL units of a layer representation have to be transmitted in
decoding order. But this constraint does not seem to be specified
anywhere.
What is the intention?
Should "decoding order" in the definition be replaced
with "transmission order"?
These definitions could be much simplified using the
term "associated layer representation" as in comment 8.
[YK] Replacing "decoding order" with "transmission
order" makes more sense.
14) SEI NAL units in PACSI, page 24
"SEI NAL units included in the PACSI NAL unit, if any,
MUST contain a subset of the SEI messages associated with the access
unit of the first NAL unit following the PACSI NAL unit within the
aggregation packet."
The comment I made on 4 January, as below, has not been
addressed.
As a subset could presumably be an empty set, does this
paragraph actually say anything at all? Which SEI messages must be
included? And are any excluded, such as those associated with some other
access unit?
Ye-Kui's response on the same day was:
Here we are trying to say the following:
- Never include SEI messages associated with other
access units than the
one (target access unit) containing the first NAL unit
following the
PACSI in the aggregation packet.
- A subset (zero to all) of the SEI messages associated
with the target
access unit can be included in the PACSI.
This suggests that the following text would be OK.
"The PACSI NAL unit SHALL include a subset (zero to all)
of the SEI NAL units associated with the access unit to which the target
NAL units belong, and SHALL NOT contain SEI NAL units associated with
any other access unit."
[YK] Agreed.
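The agreed rule can be expressed as a simple check. This is a minimal sketch, assuming each SEI NAL unit can be tagged with an identifier of the access unit it is associated with; SeiNalUnit, access_unit_id and pacsi_sei_is_valid are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeiNalUnit:
    access_unit_id: int  # hypothetical tag: the access unit this SEI belongs to


def pacsi_sei_is_valid(pacsi_sei_units, target_access_unit_id: int) -> bool:
    """Check the SEI NAL units carried in one PACSI NAL unit.

    Any subset (zero to all) of the target access unit's SEI NAL units is
    acceptable; an SEI NAL unit of any other access unit is not.
    """
    return all(s.access_unit_id == target_access_unit_id for s in pacsi_sei_units)
```

An empty list passes the check, which reflects the "zero to all" wording: the rule constrains what may not appear rather than what must appear.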
15) SEI NAL units in PACSI, page 24
In the last paragraph before section 7: "An SEI message
SHOULD NOT be included in a PACSI NAL unit and included in one of the
remaining NAL units contained in the same aggregation packet at the same
time.", "at the same time" is not needed.
[YK] Agreed.
Also it is interesting that this is worded in terms of
SEI messages and not SEI NAL units. An SEI NAL unit can contain one or
more SEI messages. I am not sure whether H.264/AVC/SVC has any
restriction on repeating either SEI messages or SEI NAL units within an
access unit. If not, then we are adding a restriction to a bitstream
that does not exist in H.264/AVC/SVC?
[YK] I think such a restriction does not exist in
H.264/AVC/SVC, but it should have been there. Let's see whether we
should propose this as a corrigendum item to H.264/AVC/SVC. Here the
wording should be in terms of SEI messages because the RTP packetizer
may extract an important subset of the SEI messages from one SEI NAL unit
and place it in the PACSI.
I think the intention is clear that the RTP
packetisation process should not repeat SEI messages by putting them in
both the PACSI and the later part of the aggregation packet, but the
current text is more restrictive than this.
[YK] Yes, your understanding of the intention is
correct.
16) Packetization Rules, page 25
"... the single NAL unit packetization mode SHOULD NOT
be used whenever possible ..."
This probably does not have the intended meaning. To me,
when read explicitly as it is written, it is saying that the single NAL
unit packetization mode should not be used at every possible
opportunity, and instead, occasionally, something else should be done.
Instead, "... use of the single NAL unit packetization
mode SHOULD be avoided whenever possible ..."
[YK] I don't really see the difference but I believe
that your wording is better.
17) Packetization Rules, page 25
In the first informative note on this page (relating to
historical ballast), there is no mention of FU-A, which, presumably,
also has to be implemented to conform to non-interleaved mode (of RFC
3984).
[YK] Good point. Should be added.
18) Packetization Rules, page 25
"A prefix NAL unit SHOULD be aggregated to the same
packet as the associated NAL unit following the prefix NAL unit in
decoding order."
The wording of this could be improved, such as in the
following.
"A prefix NAL unit and the NAL unit with which it is
associated, and which follows the prefix NAL unit in decoding order,
SHOULD be included in the same aggregation packet."
[YK] Agreed.
19) Packetization Rules for Layered Multicast, page 25
This section, in both the title and the body, uses the
term "Layered Multicast", which I think we have agreed should be
referred to as "session multiplexing".
[YK] Agreed - I recently noticed this too.
20) Packetization Rules for Layered Multicast, page 26
The editor's note from Thomas could be addressed by
writing the paragraph as below.
"If the CL-DON decoding order recovery mode is in use,
either the non-interleaved packetization mode, restricted to STAP-A
packets only, or the interleaved packetization mode MAY be used, but the
single NAL unit packetization mode MUST NOT be used. ..."
[YK] Agreed.
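The proposed paragraph can be read as the following check. It is a sketch only, assuming the RFC 3984 packetization-mode values (0 single NAL unit, 1 non-interleaved, 2 interleaved) and the STAP-A payload type 24; the function name is hypothetical, and the interleaved branch models no further restriction.

```python
from enum import Enum

STAP_A = 24  # NAL unit type of an STAP-A packet in RFC 3984


class PacketizationMode(Enum):
    SINGLE_NAL_UNIT = 0
    NON_INTERLEAVED = 1
    INTERLEAVED = 2


def allowed_with_cl_don(mode: PacketizationMode, payload_nal_type: int) -> bool:
    """Return True if a packet may be sent while CL-DON mode is in use."""
    if mode is PacketizationMode.SINGLE_NAL_UNIT:
        return False  # MUST NOT be used with CL-DON
    if mode is PacketizationMode.NON_INTERLEAVED:
        return payload_nal_type == STAP_A  # restricted to STAP-A packets only
    return True  # interleaved mode: no extra restriction modeled in this sketch
```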
21) decoding order recovery mode, pages 28 and 30
The sentence at the top of page 28 states that the
method to use to recover decoding order is indicated by the presence or
absence of the parameter sprop-cl-don. I suggest that this is reinforced
by adding a sentence at the start of 8.1.1 and 8.1.2 to state the
condition when the process is invoked: "This process is used when the
parameter sprop-cl-don is (not) present in the session description".
[YK] Agreed.
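The selection between the two processes could look like the sketch below. It assumes the media type parameters arrive as a semicolon-separated string, as in an SDP a=fmtp line; the parameter value shown is made up, since only the presence of sprop-cl-don matters here, and the helper names are hypothetical.

```python
def parse_fmtp(params: str) -> dict:
    """Split 'name=value; name=value' parameters into a dictionary."""
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        result[name.strip()] = value.strip()
    return result


def recovery_mode(params: str) -> str:
    """Pick the decoding order recovery process from the session description."""
    # Presence of sprop-cl-don selects the CL-DON process (8.1.2 in the draft);
    # absence selects the classical RTP process (8.1.1).
    return "CL-DON" if "sprop-cl-don" in parse_fmtp(params) else "classical"


# The value "1" is an invented placeholder, not taken from the draft.
mode = recovery_mode("packetization-mode=1; sprop-cl-don=1")
```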
22) The classical RTP decoding order recovery mode, page
28
The sentence "Within each RTP stream, decoding order
recovery of NAL units SHALL be applied according to the following rules"
could be improved to something like "Within each RTP stream, the
decoding order of NAL units SHALL be recovered according to the
following rules".
Also, this text sounds too strong: surely we should only
be concerned with the end result and not the method, and hence could add
words such as "SHALL be recovered by performing any process
equivalent to the following rules".
[YK] Left for Thomas.
23) The classical RTP decoding order recovery mode, page
28
The sentence "Decoding order recovery between RTP
streams of different RTP sessions to access units SHALL be applied
according to the following rules" could be improved to something like
"The decoding order of NAL units from multiple RTP streams in multiple
RTP sessions shall be recovered into a single sequence of NAL units,
grouped into access units, by performing any process equivalent to
the following rules".
Note also that there are two colons at the end of the
current paragraph.
[YK] Left for Thomas.
24) The classical RTP decoding order recovery mode, page
29
In the editor's note, "NAL units with nal_unit_type
equal to 5 present in any of the RTP streams shall be grouped and
precede directly any NAL units of type 1 , 5, 14, 15 and 20 in the
access unit.", the first instance of "5" should be "6".
[YK] Left for Thomas.
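With the correction to type 6, the editor's note describes a regrouping that could be sketched as below. This is one possible reading of the note, not text from the draft: SEI NAL units keep their relative order and are placed directly before the first NAL unit of type 1, 5, 14, 15 or 20 in the access unit. The function works on a list of nal_unit_type values for simplicity, and its name is hypothetical.

```python
SEI = 6
CODED_TYPES = {1, 5, 14, 15, 20}  # types listed in the editor's note


def group_sei_first(nal_types):
    """Reorder one access unit so that all SEI NAL units are grouped
    directly before the first NAL unit of a type in CODED_TYPES."""
    sei = [t for t in nal_types if t == SEI]
    rest = [t for t in nal_types if t != SEI]
    for i, t in enumerate(rest):
        if t in CODED_TYPES:
            return rest[:i] + sei + rest[i:]
    return rest + sei  # no such NAL unit present: SEI units go last
```

For instance, an access unit assembled from several RTP streams as types 9, 7, 1, 6, 20, 6 becomes 9, 7, 6, 6, 1, 20.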
25) The classical RTP decoding order recovery mode, page
29
Regarding the editor's note and the associated open
issue, number 9, I think that SEI messages should be in the RTP stream
that they are relevant to, so that if not all layers are being received,
only those SEI messages that are relevant need to be received.
If this is agreed, then this paragraph with the editor's
note should be moved above the previous paragraph as the NAL units
within an access unit should be reordered before being passed to the
decoder.
[YK] Left for Thomas.
26) The classical RTP decoding order recovery mode -
informative example, page 29
It should be noted why there is no data for session C at
times 4 and 2. It has earlier been stated that there should be data for
a higher layer at all time instances at which there is data in any of
the lower layers, and this breaks that rule. Presumably the reason is
simply the timing of the receipt of the data, and that these NAL units
have not been received because the receiver was not ready. Or perhaps
due to packet loss?
[YK] Left for Thomas.
27) Typos
Page 7. top line: "know" -> "known".
Page 29, top line, "decdoding oreder" -> "decoding
order"
[YK] Both agreed.
Best regards
Mike
Mike Nilsson
Multimedia Analysis and Coding
BT Group Chief Technology Office
___________________________
Sirius House (B54-MH), Room 92
Adastral Park, Martlesham Heath, Ipswich, IP5 3RE, UK
Tel: +44 1473 645413
Mobile: +44 7917 025433
Fax: +44 1908 862365
Email: mike.nilsson at bt.com
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
http://www.ietf.org/mailman/listinfo/avt