Roni,

I think we are in agreement here. The text in the current draft that
immediately precedes my proposed changed wording “… use of the
single NAL unit packetization mode SHOULD be avoided whenever possible …”
states “All receivers MUST support the single NAL unit packetization
mode to provide backward compatibility to endpoints supporting only the single
NAL unit mode of RFC 3984.” Concerning the different packetisation
modes that a receiver can support, my bullet point in comment 1 was too brief –
it is clearly misleading. I think the following are the only two types of SVC receiver
that we are allowing:

1) a receiver that supports the non-interleaved packetisation mode AND the
single NAL unit mode

2) a receiver that supports the interleaved packetisation mode AND the
non-interleaved packetisation mode AND the single NAL unit mode

Unlike RFC3984, a receiver cannot support only the single NAL unit mode?
(Unless it IS an RFC3984 receiver, capable of receiving H.264/AVC and not SVC).

Best regards

Mike

From: Even, Roni [mailto:roni.even at polycom.co.il]

Mike,

About the single NAL unit. In 1) "this specification supports the use of the interleaved and
non-interleaved packetisation modes of RFC3984, but not the single NAL unit
mode;"

Single NAL unit is supported in some
cases. You have it in 16) about packetization rules. I am OK with your text but
if you say whenever possible there should be some text stating that they may be
used in the base layer for backward interoperability with systems that support
RFC3984.

Roni

From: mike.nilsson at bt.com [mailto:mike.nilsson at bt.com]

I have read this draft up to the end of
section 8, and have the following comments. Mostly they are about
clarifications and editorial improvements to the text.

I have not included any comments on the
major open issue of cross layer decoding order dependency, which I hope to be
able to send soon in a separate e-mail.

1) Introduction, page 5

I think the document could be improved by
moving (some of) the information in the scope section, which appears on pages
11 and 12, closer to the front of the document. After the third paragraph of
the introduction, some text could be added (using what is in the scope section)
to state such things as:

- this specification allows NAL units to be encapsulated into one or more RTP
sessions;

- this specification supports the use of the interleaved and non-interleaved
packetisation modes of RFC3984, but not the single NAL unit mode;

- when NAL units are encapsulated into more than one RTP session, different
packetisation modes can be used in each session.

Then the current fourth paragraph of the
introduction makes more sense, especially if the wording is slightly modified
to make it clear that it applies to the case of more than one session and
interleaved mode not being used on all sessions. Note the first sentence in
this paragraph is not good English. The following would be better.

“This memo includes two processes to
recover NAL unit decoding order when NAL units are transported using multiple
RTP sessions, and interleaved mode is not used in all of those sessions.”

2) Scope, page 11

“When Session multiplexing is not
used, … When a subset of the base layer containing the T0 base Layer and
one or more temporal enhancement Layers is transmitted …”

Should this be zero or more temporal
enhancement Layers? This would make it clear that if all that was transmitted
was the T0 base layer, then that should be encapsulated according to RFC 3984.

3) anchor layer representation, page 14

My reading of this definition is that it
is possible to start decoding at the first NAL unit of the anchor layer
representation, and everything will be OK. This would not send NAL units of the
lower Layers of the same access unit to the decoder. Is this the intention?

4) Base RTP session, page 14

“The Base RTP session may contain
NAL units of NAL unit type equal to 14 and

Should type 20 be specifically mentioned?
Presumably this is allowed.

5) Enhancement RTP session, page 14

“An RTP session containing the RTP
stream which at least depends on one other RTP session …”

This is not good English. The following
would be better.

“An RTP session containing an RTP
stream that depends on at least one other RTP session …”

Is it the session that depends on another
session, or the stream that depends on another stream? Even the above improved
wording has confused the two. Another alternative is as follows.

“An RTP session containing an RTP
stream that depends on at least one RTP stream in another RTP session
…”

Depending on how this definition is fixed
(if at all), the definition above for Base RTP session may need to be fixed (to
align with it).

6) Session multiplexing, page 16

“Each RTP session requires a
separate signaling and has a separate Timestamp, Sequence Number, and SSRC
space.”

I am unclear about what is meant by a
separate Timestamp space. Does this mean that the encoded values are different
(for security purposes) but are based on the same clock, or does it mean that
they are different and based on different clocks?

Presumably the former is the intention. I
cannot see how the non-CL-DON decoding order recovery method could work if the
latter were the intention.

7) SVC NAL unit, page 16

“A NAL unit of NAL unit type 14 or
20 as specified in Annex G of [SVC]. An SVC NAL unit has a four-byte NAL unit
header.”

What about type 15? (subset sequence
parameter set)

8) PACSI NAL unit definition, page 21

The definition of a term for “the
layer representation to which the first NAL unit in the aggregation packet
after the PACSI NAL unit belongs” would make some of the subsequent
definitions easier to read.

My first thought for the name for this
term was “target layer representation” for consistency with the
local term “target NAL units”, but it had already been used.
Perhaps “associated layer representation” could be used?

9) X bit, page 22

“The X bit SHOULD be identical for
all the PACSI NAL units involved in all the RTP sessions conveying an SVC
bitstream.”

I suggest removing “involved”,
and adding “of”: “The X bit SHOULD be identical for all the
PACSI NAL units in all of the RTP sessions conveying an SVC bitstream.”.

10) T bit, page 22

“…MUST be present and
specified as in below.”.

I suggest removing “in”.

11) A bit, page 22

“The A bit MUST be set to 1 if all
the target NAL units belong to anchor layer representations. Otherwise,
the A bit MUST be set to 0. The A bit SHOULD be identical for all the
PACSI NAL units for which the target NAL units belong to the same access
unit.”

I understand the intention of this bit,
and the note is clear, but the actual definition does not seem to specify what
is intended. If some of the NAL units belonging to the anchor layer
representations were in an earlier aggregation packet, but were not target NAL
units in that packet, the A bit being set in the current aggregation packet
would not actually indicate a switching point? The other issue I have is
related to comment 3 above, about whether other (lower layer) NAL units are
needed in addition to those in the anchor layer representation?

Perhaps the solution is to add some words
that say when the A bit is set to “

12) C bit, page 23

Same comment as for the A bit in comment
11: this also seems to allow some of the (vital) intra NAL units to be
“hidden” in aggregation packets where they are not target NAL
units.

13) S bit and E bit, page 23

These definitions of start and end bits
only seem to make sense if NAL units of a layer representation have to be
transmitted in decoding order. But this constraint does not seem to be
specified anywhere. What is the intention?

Should “decoding order” in the
definition be replaced with “transmission order”?

These definitions could be much simplified
using the term “associated layer representation” as in comment 8.

14) SEI NAL units in PACSI, page 24

“SEI NAL units included in the PACSI
NAL unit, if any, MUST contain a subset of the SEI messages associated with the
access unit of the first NAL unit following the PACSI NAL unit within the
aggregation packet.”

The comment I made on 4 January, as below,
has not been addressed.

As a subset could presumably be an empty
set, does this paragraph actually say anything at all? Which SEI messages must
be included? And are any excluded, such as those associated with some other
access unit?

Ye-Kui’s response on the same day
was:

Here we are trying to say the following:

- Never include SEI messages associated
with other access units than the one (target access unit) containing the
first NAL unit following the PACSI in the aggregation packet.

- A subset (zero to all) of the SEI
messages associated with the target access unit can be included in the PACSI.

This suggests that the following text
would be OK.

“The PACSI NAL unit SHALL include a
subset (zero to all) of the SEI NAL units associated with the access unit to
which the target NAL units belong, and SHALL NOT contain SEI NAL units
associated with any other access unit.”

15) SEI NAL units in PACSI, page 24

In the last paragraph before section 7:
“An SEI message SHOULD NOT be included in a PACSI NAL unit and included
in one of the remaining NAL units contained in the same aggregation packet at
the same time.”, “at the same time” is not needed.

Also it is interesting that this is worded
in terms of SEI messages and not SEI NAL units. An SEI NAL unit can contain one
or more SEI messages. I am not sure whether H.264/AVC/SVC has any restriction
on repeating either SEI messages or SEI NAL units within an access unit. If
not, then we are adding a restriction to a bitstream that does not exist in
H.264/AVC/SVC?

I think the intention is clear that the
RTP packetisation process should not repeat SEI messages by putting them in
both the PACSI and the later part of the aggregation packet, but the current
text is more restrictive than this.

16) Packetization Rules, page 25

“… the single NAL unit
packetization mode SHOULD NOT be used whenever possible …”

This probably does not have the intended
meaning. To me, when read explicitly as it is written, it is saying that the
single NAL unit packetization mode should not be used at every possible
opportunity, and instead, occasionally, something else should be done.

Instead, “… use of the single
NAL unit packetization mode SHOULD be avoided whenever possible …”

17) Packetization Rules, page 25

In the first informative note on this page
(relating to historical ballast), there is no mention of FU-A, which,
presumably, also has to be implemented to conform to non-interleaved mode (of
RFC 3984).

18) Packetization Rules, page 25

“A prefix NAL unit SHOULD be
aggregated to the same packet as the associated NAL unit following the prefix
NAL unit in decoding order.”

The wording of this could be improved,
such as in the following.

“A prefix NAL unit and the NAL unit
with which it is associated, and which follows the prefix NAL unit in decoding
order, SHOULD be included in the same aggregation packet.”

19) Packetization Rules for Layered Multicast, page 25

This section, in both the title and the
body, uses the term “Layered Multicast”, which I think we have
agreed should be referred to as “session multiplexing”.

20) Packetization Rules for Layered Multicast, page 26

The editor’s note from Thomas could
be addressed by writing the paragraph as below.

“If the CL-DON decoding order
recovery mode is in use, either the non-interleaved packetization mode,
restricted to STAP-A packets only, or the interleaved packetization mode MAY be
used, but the single NAL unit packetization mode MUST NOT be used.
…”

21) decoding order recovery mode, pages 28 and 30

The sentence at the top of page 28 states
that the method to use to recover decoding order is indicated by the presence
or absence of the parameter sprop-cl-don. I suggest that this is reinforced by
adding a sentence at the start of 8.1.1 and 8.1.2 to state the condition when
the process is invoked: “This process is used when the parameter
sprop-cl-don is (not) present in the session description”.

22) The classical RTP decoding order recovery mode, page 28

The sentence “Within each RTP
stream, decoding order recovery of NAL units SHALL be applied according to the
following rules” could be improved to something like “Within each
RTP stream, the decoding order of NAL units SHALL be recovered according to the
following rules”.

Also, this text sounds too strong: surely
we should only be concerned with the end result and not the method, and hence
could add words such as “SHALL be recovered by performing any process
equivalent to the following rules”.

23) The classical RTP decoding order recovery mode, page 28

The sentence “Decoding order
recovery between RTP streams of different RTP sessions to access units SHALL be
applied according to the following rules” could be improved to something
like “The decoding order of NAL units from multiple RTP streams in
multiple RTP sessions shall be recovered into a single sequence of NAL units,
grouped into access units, by performing any process equivalent to the
following rules”.

Note also that there are two colons at the
end of the current paragraph.

24) The classical RTP decoding order recovery mode, page 29

In the editor’s note, “NAL
units with nal_unit_type equal to 5 present in any of the RTP streams shall be
grouped and precede directly any NAL units of type 1, 5, 14, 15 and

25) The classical RTP decoding order recovery mode, page 29

Regarding the editor’s note and the
associated open issue, number 9, I think that SEI messages should be in the RTP
stream that they are relevant to, so that if not all layers are being received,
only those SEI messages that are relevant need to be received.

If this is agreed, then this paragraph
with the editor’s note should be moved above the previous paragraph as
the NAL units within an access unit should be reordered before being passed to
the decoder.

26) The classical RTP decoding order recovery mode – informative example, page 29

It should be noted why there is no data
for session C at times 4 and 2. It has earlier been stated that there should
be data for a higher layer at all time instances at which there is data in any
of the lower layers, and this breaks that rule. Presumably the reason is simply
the timing of the receipt of the data, and that these NAL units have not been
received because the receiver was not ready. Or perhaps due to packet loss?

27) Typos

Page 7, top line: “know” ->
“known”.

Page 29, top line, “decdoding
oreder” -> “decoding order”

Best regards

Mike

Mike Nilsson
Sirius House (B54-MH), Room 92
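
As a rough illustration of the packetisation-mode constraints discussed in this thread, the sketch below encodes the two SVC receiver types listed in the reply at the top, together with the part of the comment 20 wording that forbids the single NAL unit mode when CL-DON is in use. The numeric values follow the packetization-mode parameter of RFC 3984 (0, 1, 2). The function and constant names are hypothetical and chosen only for this example. The sketch does not model the STAP-A-only restriction on the non-interleaved mode, and nothing here is taken from the draft itself.

```python
from enum import IntEnum


class PacketizationMode(IntEnum):
    """packetization-mode values as defined in RFC 3984."""
    SINGLE_NAL = 0
    NON_INTERLEAVED = 1
    INTERLEAVED = 2


# The two receiver types from the reply at the top of the thread
# (hypothetical constant name, for illustration only).
ALLOWED_SVC_RECEIVER_SETS = (
    # 1) non-interleaved AND single NAL unit
    frozenset({PacketizationMode.SINGLE_NAL,
               PacketizationMode.NON_INTERLEAVED}),
    # 2) interleaved AND non-interleaved AND single NAL unit
    frozenset(PacketizationMode),
)


def is_allowed_svc_receiver(supported):
    """Return True if the supported mode set matches receiver type 1 or 2."""
    return frozenset(supported) in ALLOWED_SVC_RECEIVER_SETS


def sender_mode_allowed(mode, cl_don_in_use):
    """Partial check of the comment 20 wording: with CL-DON in use, the
    single NAL unit mode MUST NOT be used. The STAP-A-only restriction on
    the non-interleaved mode is not checked here."""
    if cl_don_in_use and mode == PacketizationMode.SINGLE_NAL:
        return False
    return True
```

Under these assumptions, a receiver that advertises only the single NAL unit mode is rejected as an SVC receiver, which matches the point made above that only an RFC 3984 receiver may be limited to that mode.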
_______________________________________________ Audio/Video Transport Working Group avt at ietf.org http://www.ietf.org/mailman/listinfo/avt