Mike,

About the single NAL unit mode: in 1) you write "this specification supports
the use of the interleaved and non-interleaved packetisation modes of RFC3984,
but not the single NAL unit mode;". The single NAL unit mode is supported in
some cases; you have it in 16), about the packetization rules. I am OK with
your text there, but if you say "whenever possible", there should be some text
stating that it may be used in the base layer for backward interoperability
with systems that support RFC3984.

Roni

From: mike.nilsson at bt.com [mailto:mike.nilsson at bt.com]

I have read this draft up
to the end of section 8, and have the following comments. Mostly they are about
clarifications and editorial improvements to the text. I have not included any
comments on the major open issue of cross layer decoding order dependency,
which I hope to be able to send soon in a separate e-mail.

1) Introduction, page 5

I think the document
could be improved by moving (some of) the information in the scope section,
which appears on pages 11 and 12, closer to the front of the document. After
the third paragraph of the introduction, some text could be added (using what
is in the scope section) to state such things as:

- this specification allows NAL units to be encapsulated into one or more RTP
  sessions;
- this specification supports the use of the interleaved and non-interleaved
  packetisation modes of RFC3984, but not the single NAL unit mode;
- when NAL units are encapsulated into more than one RTP session, different
  packetisation modes can be used in each session.

Then the current fourth
paragraph of the introduction makes more sense, especially if the wording is
slightly modified to make it clear that it applies to the case of more than one
session and interleaved mode not being used on all sessions. Note the first
sentence in this paragraph is not good English. The following would be better.

“This memo includes two processes to recover NAL unit decoding order when NAL
units are transported using multiple RTP sessions, and interleaved mode is not
used in all of those sessions.”

2) Scope, page 11

“When Session multiplexing is not used, … When a subset of the base layer
containing the T0 base Layer and one or more temporal enhancement Layers is
transmitted …”

Should this be zero or
more temporal enhancement Layers? This would make it clear that if all that was
transmitted was the T0 base layer, then that should be encapsulated according
to RFC 3984.

3) anchor layer representation, page 14

My reading of this
definition is that it is possible to start decoding at the first NAL unit of
the anchor layer representation, and everything will be OK. This would not send
NAL units of the lower Layers of the same access unit to the decoder. Is this
the intention?

4) Base RTP session, page 14

“The Base RTP session may contain NAL units of NAL unit type equal to 14 and

Should type 20 be specifically mentioned? Presumably this is allowed.

5) Enhancement RTP session, page 14

“An RTP session containing the RTP stream which at least depends on one other
RTP session …”

This is not good English. The following would be better.

“An RTP session containing an RTP stream that depends on at least one other RTP
session …”

Is it the session that depends on another session, or the stream that depends
on another stream? Even the above improved wording has confused the two.
Another alternative is as follows.

“An RTP session containing an RTP stream that depends on at least one RTP
stream in another RTP session …”

Depending on how this definition is fixed (if at all), the definition above for
Base RTP session may need to be fixed (to align with it).

6) Session multiplexing, page 16

“Each RTP session requires a separate signaling and has a separate Timestamp,
Sequence Number, and SSRC space.”

I am unclear about what
is meant by a separate Timestamp space. Does this mean that the encoded values
are different (for security purposes) but are based on the same clock, or does
it mean that they are different and based on different clocks? Presumably the
former is the intention. I cannot see how the non-CL-DON decoding order
recovery method could work if the latter were the intention.

7) SVC NAL unit, page 16

“A NAL unit of NAL unit type 14 or 20 as specified in Annex G of [SVC]. An SVC
NAL unit has a four-byte NAL unit header.”

What about type 15 (subset sequence parameter set)?

8) PACSI NAL unit definition, page 21

The definition of a term
for “the layer representation to which the first NAL unit in the
aggregation packet after the PACSI NAL unit belongs” would make some of
the subsequent definitions easier to read.

My first thought for the
name for this term was “target layer representation” for
consistency with the local term “target NAL units”, but it had
already been used. Perhaps “associated layer representation” could
be used?

9) X bit, page 22

“The X bit SHOULD be identical for all the PACSI NAL units involved in all the
RTP sessions conveying an SVC bitstream.”

I suggest removing “involved”, and adding “of”: “The X bit SHOULD be identical
for all the PACSI NAL units in all of the RTP sessions conveying an SVC
bitstream.”

10) T bit, page 22

“…MUST be present and specified as in below.”

I suggest removing “in”.

11) A bit, page 22

“The A bit MUST be set to 1 if all the target NAL units belong to anchor layer
representations. Otherwise, the A bit MUST be set to 0. The A bit SHOULD be
identical for all the PACSI NAL units for which the target NAL units belong to
the same access unit.”

I understand the
intention of this bit, and the note is clear, but the actual definition does
not seem to specify what is intended. If some of the NAL units belonging to the
anchor layer representations were in an earlier aggregation packet, but were
not target NAL units in that packet, the A bit being set in the current
aggregation packet would not actually indicate a switching point? The other
issue I have is related to comment 3 above, about whether other (lower layer)
NAL units are needed in addition to those in the anchor layer representation?

Perhaps the solution is to add some words that say when the A bit is set to “

12) C bit, page 23

Same comment as for the A bit in comment 11: this also seems to allow some of
the (vital) intra NAL units to be “hidden” in aggregation packets where they
are not target NAL units.

13) S bit and E bit, page 23

These definitions of start and end bits only seem to make sense if NAL units of
a layer representation have to be transmitted in decoding order. But this
constraint does not seem to be specified anywhere. What is the intention?
Should “decoding order” in the definition be replaced with “transmission
order”?

These definitions could be much simplified using the term “associated layer
representation” as in comment 8.

14) SEI NAL units in PACSI, page 24

“SEI NAL units included in the PACSI NAL unit, if any, MUST contain a subset of
the SEI messages associated with the access unit of the first NAL unit
following the PACSI NAL unit within the aggregation packet.”

The comment I made on 4 January, as below, has not been addressed.

As a subset could
presumably be an empty set, does this paragraph actually say anything at all?
Which SEI messages must be included? And are any excluded, such as those
associated with some other access unit?

Ye-Kui’s response on the same day was:

Here we are trying to say the following:

- Never include SEI messages associated with other access units than the one
  (target access unit) containing the first NAL unit following the PACSI in
  the aggregation packet.
- A subset (zero to all) of the SEI messages associated with the target access
  unit can be included in the PACSI.

This suggests that the following text would be OK.

“The PACSI NAL unit SHALL include a subset (zero to all) of the SEI NAL units
associated with the access unit to which the target NAL units belong, and SHALL
NOT contain SEI NAL units associated with any other access unit.”

15) SEI NAL units in PACSI, page 24

In the last paragraph before section 7: “An SEI message SHOULD NOT be included
in a PACSI NAL unit and included in one of the remaining NAL units contained in
the same aggregation packet at the same time.”, “at the same time” is not
needed.

Also it is interesting
that this is worded in terms of SEI messages and not SEI NAL units. An SEI NAL
unit can contain one or more SEI messages. I am not sure whether H.264/AVC/SVC
has any restriction on repeating either SEI messages or SEI NAL units within an
access unit. If not, then we are adding a restriction to a bitstream that does
not exist in H.264/AVC/SVC?

I think the intention is clear that the RTP packetisation process should not
repeat SEI messages by putting them in both the PACSI and the later part of the
aggregation packet, but the current text is more restrictive than this.

16) Packetization Rules, page 25

“… the single NAL unit packetization mode SHOULD NOT be used whenever possible
…”

This probably does not have the intended meaning. To me, when read literally as
it is written, it is saying that the single NAL unit packetization mode should
not be used at every possible opportunity, and instead, occasionally, something
else should be done. I suggest instead:

“… use of the single NAL unit packetization mode SHOULD be avoided whenever
possible …”

17) Packetization Rules, page 25

In the first informative note on this page (relating to historical ballast),
there is no mention of FU-A, which, presumably, also has to be implemented to
conform to non-interleaved mode (of RFC 3984).

18) Packetization Rules, page 25

“A prefix NAL unit SHOULD be aggregated to the same packet as the associated
NAL unit following the prefix NAL unit in decoding order.”

The wording of this could be improved, such as in the following.

“A prefix NAL unit and the NAL unit with which it is associated, and which
follows the prefix NAL unit in decoding order, SHOULD be included in the same
aggregation packet.”

19) Packetization Rules for Layered Multicast, page 25

This section, in both the title and the body, uses the term “Layered
Multicast”, which I think we have agreed should be referred to as “session
multiplexing”.

20) Packetization Rules for Layered Multicast, page 26

The editor’s note from Thomas could be addressed by writing the paragraph as
below.

“If the CL-DON
decoding order recovery mode is in use, either the non-interleaved
packetization mode, restricted to STAP-A packets only, or the interleaved
packetization mode MAY be used, but the single NAL unit packetization mode MUST
NOT be used. …”

21) decoding order recovery mode, pages 28 and 30

The sentence at the top of page 28 states that the method to use to recover
decoding order is indicated by the presence or absence of the parameter
sprop-cl-don. I suggest that this is reinforced by adding a sentence at the
start of 8.1.1 and 8.1.2 to state the condition when the process is invoked:
“This process is used when the parameter sprop-cl-don is (not) present in the
session description”.

22) The classical RTP decoding order recovery mode, page 28

The sentence
“Within each RTP stream, decoding order recovery of NAL units SHALL be
applied according to the following rules” could be improved to something
like “Within each RTP stream, the decoding order of NAL units SHALL be
recovered according to the following rules”.

Also, this text sounds too strong: surely we should only be concerned with the
end result and not the method, and hence could add words such as “SHALL be
recovered by performing any process equivalent to the following rules”.

23) The classical RTP decoding order recovery mode, page 28

The sentence
“Decoding order recovery between RTP streams of different RTP sessions to
access units SHALL be applied according to the following rules” could be
improved to something like “The decoding order of NAL units from multiple
RTP streams in multiple RTP sessions shall be recovered into a single sequence
of NAL units, grouped into access units, by performing any process
equivalent to the following rules”.

Note also that there are two colons at the end of the current paragraph.

24) The classical RTP decoding order recovery mode, page 29

In the editor’s note, “NAL units with nal_unit_type equal to 5 present in any
of the RTP streams shall be grouped and precede directly any NAL units of type
1, 5, 14, 15 and

25) The classical RTP decoding order recovery mode, page 29

Regarding the
editor’s note and the associated open issue, number 9, I think that SEI
messages should be in the RTP stream that they are relevant to, so that if not
all layers are being received, only those SEI messages that are relevant need
to be received.

If this is agreed, then this paragraph with the editor’s note should be moved
above the previous paragraph as the NAL units within an access unit should be
reordered before being passed to the decoder.

26) The classical RTP decoding order recovery mode – informative example, page 29

It should be noted why there is no data for session C at times 4 and 2. It has
earlier been stated that
there should be data for a higher layer at all time instances at which there is
data in any of the lower layers, and this breaks that rule. Presumably the
reason is simply the timing of the receipt of the data, and that these NAL
units have not been received because the receiver was not ready. Or perhaps due
to packet loss?

27) Typos

Page 7, top line: “know” -> “known”.

Page 29, top line: “decdoding oreder” -> “decoding order”.

Best regards

Mike

Mike Nilsson
Sirius House (B54-MH), Room 92
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
http://www.ietf.org/mailman/listinfo/avt