Re: [AVT] Comments on draft-ietf-avt-rtp-svc-07.txt



Mike,

About the single NAL unit.

In 1) "this specification supports the use of the interleaved and non-interleaved packetisation modes of RFC3984, but not the single NAL unit mode;"

The single NAL unit mode is supported in some cases; you cover it in 16) on the packetization rules. I am OK with your text, but if you say "whenever possible", there should be some text stating that the single NAL unit mode may be used in the base layer for backward interoperability with systems that support RFC3984.

Roni

 

 

 


From: mike.nilsson at bt.com [mailto:mike.nilsson at bt.com]
Sent: Friday, February 08, 2008 6:01 PM
To: Even, Roni; ye-kui.wang at nokia.com; jonathan at vidyo.com; schierl at hhi.fhg.de; csp at csperkins.org; Yann.Leprovost at alcatel-lucent.fr; stewe at stewe.org; tom at vidyo.com; tom.taylor at rogers.com; rjesup at wgate.com
Cc: avt at ietf.org
Subject: Comments on draft-ietf-avt-rtp-svc-07.txt

 

I have read this draft up to the end of section 8, and have the following comments. Mostly they are about clarifications and editorial improvements to the text.

 

I have not included any comments on the major open issue of cross layer decoding order dependency, which I hope to be able to send soon in a separate e-mail.

 

1) Introduction, page 5

I think the document could be improved by moving (some of) the information in the scope section, which appears on pages 11 and 12, closer to the front of the document. After the third paragraph of the introduction, some text could be added (using what is in the scope section) to state such things as:

 

this specification allows NAL units to be encapsulated into one or more RTP sessions;

this specification supports the use of the interleaved and non-interleaved packetisation modes of RFC3984, but not the single NAL unit mode;

when NAL units are encapsulated into more than one RTP session, different packetisation modes can be used in each session.

 

Then the current fourth paragraph of the introduction makes more sense, especially if the wording is slightly modified to make it clear that it applies to the case of more than one session and interleaved mode not being used on all sessions. Note the first sentence in this paragraph is not good English. The following would be better.

 

“This memo includes two processes to recover NAL unit decoding order when NAL units are transported using multiple RTP sessions, and interleaved mode is not used in all of those sessions.”

 

2) Scope, page 11

“When Session multiplexing is not used, … When a subset of the base layer containing the T0 base Layer and one or more temporal enhancement Layers is transmitted …”

 

Should this be zero or more temporal enhancement Layers? This would make it clear that if all that was transmitted was the T0 base layer, then that should be encapsulated according to RFC 3984.

 

3) anchor layer representation, page 14

 

My reading of this definition is that it is possible to start decoding at the first NAL unit of the anchor layer representation, and everything will be OK. This would not send NAL units of the lower Layers of the same access unit to the decoder.

 

Is this the intention?

 

4) Base RTP session, page 14

 

“The Base RTP session may contain NAL units of NAL unit type equal to 14 and 15.”

 

Should type 20 be specifically mentioned? Presumably this is allowed.

 

5) Enhancement RTP session, page 14

 

“An RTP session containing the RTP stream which at least depends on one other RTP session …”

 

This is not good English. The following would be better.

 

“An RTP session containing an RTP stream that depends on at least one other RTP session …”

 

Is it the session that depends on another session, or the stream that depends on another stream? Even the above improved wording has confused the two. Another alternative is as follows.

 

“An RTP session containing an RTP stream that depends on at least one RTP stream in another RTP session …”

 

Depending on how this definition is fixed (if at all), the definition above for Base RTP session may need to be fixed (to align with it).

 

6) Session multiplexing, page 16

 

“Each RTP session requires a separate signaling and has a separate Timestamp, Sequence Number, and SSRC space.”

 

I am unclear about what is meant by a separate Timestamp space. Does this mean that the encoded values are different (for security purposes) but are based on the same clock, or does it mean that they are different and based on different clocks?

 

Presumably the former is the intention. I can not see how the non-CL-DON decoding order recovery method could work if the latter were the intention.
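
To illustrate the former interpretation, here is a minimal Python sketch. It assumes the 90 kHz clock of RFC 3984 and two hypothetical per-session random offsets; the function names are mine and do not come from the draft. The encoded timestamps differ between the sessions, but removing each session's offset recovers the same instant on the shared clock.

```python
CLOCK_RATE = 90_000  # 90 kHz video clock, as used by RFC 3984


def rtp_timestamp(media_time_s, random_offset):
    """Map a media time on the shared clock to a 32-bit RTP timestamp."""
    return (random_offset + round(media_time_s * CLOCK_RATE)) % 2**32


def media_ticks(rtp_ts, random_offset):
    """Undo the per-session offset to get back ticks of the shared clock."""
    return (rtp_ts - random_offset) % 2**32


# Hypothetical per-session random offsets. They are assumed known here; a
# real receiver would have to learn the mapping, for example from RTCP
# sender reports.
offset_base, offset_enh = 0x1234ABCD, 0xFEDC0001

t = 0.5  # the same sampling instant, sent in both sessions
ts_base = rtp_timestamp(t, offset_base)
ts_enh = rtp_timestamp(t, offset_enh)

assert ts_base != ts_enh  # encoded values differ
assert media_ticks(ts_base, offset_base) == media_ticks(ts_enh, offset_enh)
```

If the sessions used different clocks, no such per-session offset would align them, which is why I assume the former.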

 

7) SVC NAL unit, page 16

“A NAL unit of NAL unit type 14 or 20 as specified in Annex G of [SVC]. An SVC NAL unit has a four-byte NAL unit header.”

 

What about type 15? (subset sequence parameter set)
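
For reference, a small Python sketch of how I read the header syntax in Annex G: only types 14 and 20 carry the three-byte SVC extension after the one-byte H.264/AVC header, while type 15 keeps the one-byte header. The function names and the sample header bytes are hypothetical.

```python
SVC_EXTENSION_TYPES = {14, 20}  # prefix NAL unit, coded slice in scalable extension


def nal_unit_type(first_byte: int) -> int:
    """nal_unit_type is the low five bits of the first NAL unit header byte."""
    return first_byte & 0x1F


def nal_header_length(first_byte: int) -> int:
    """One byte for H.264/AVC types; three extra bytes for types 14 and 20."""
    return 4 if nal_unit_type(first_byte) in SVC_EXTENSION_TYPES else 1


# Hypothetical first header bytes for NAL unit types 14, 15 and 20.
for first_byte in (0x6E, 0x6F, 0x74):
    print(nal_unit_type(first_byte), nal_header_length(first_byte))
```

So if type 15 is to be called an SVC NAL unit, the sentence about the four-byte header would need to be qualified.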

 

8) PACSI NAL unit definition, page 21

The definition of a term for “the layer representation to which the first NAL unit in the aggregation packet after the PACSI NAL unit belongs” would make some of the subsequent definitions easier to read.

 

My first thought for the name for this term was “target layer representation” for consistency with the local term “target NAL units”, but it had already been used. Perhaps “associated layer representation” could be used?

 

9) X bit, page 22

“The X bit SHOULD be identical for all the PACSI NAL units involved in all the RTP sessions conveying an SVC bitstream.”

 

I suggest removing “involved”, and adding “of”: “The X bit SHOULD be identical for all the PACSI NAL units in all of the RTP sessions conveying an SVC bitstream.”.

 

10) T bit, page 22

“…MUST be present and specified as in below.”.

 

I suggest removing “in”.

 

11) A bit, page 22

“The A bit MUST be set to 1 if all the target NAL units belong to anchor layer representations.  Otherwise, the A bit MUST be set to 0.  The A bit SHOULD be identical for all the PACSI NAL units for which the target NAL units belong to the same access unit.”

 

I understand the intention of this bit, and the note is clear, but the actual definition does not seem to specify what is intended. If some of the NAL units belonging to the anchor layer representations were in an earlier aggregation packet, but were not target NAL units in that packet, the A bit being set in the current aggregation packet would not actually indicate a switching point? The other issue I have is related to comment 3 above, about whether other (lower layer) NAL units are needed in addition to those in the anchor layer representation?

 

Perhaps the solution is to add some words that say when the A bit is set to “1”, all NAL units necessary to perform switching shall be transmitted as target NAL units in one or more aggregation packets, in each of which the A bit is set to “1”.

 

12) C bit, page 23

Same comment as for the A bit in comment 11: this also seems to allow some of the (vital) intra NAL units to be “hidden” in aggregation packets where they are not target NAL units.

 

13) S bit and E bit, page 23

These definitions of start and end bits only seem to make sense if NAL units of a layer representation have to be transmitted in decoding order. But this constraint does not seem to be specified anywhere.

 

What is the intention?

 

Should “decoding order” in the definition be replaced with “transmission order”?

 

These definitions could be much simplified using the term “associated layer representation” as in comment 8.

 

14) SEI NAL units in PACSI, page 24

“SEI NAL units included in the PACSI NAL unit, if any, MUST contain a subset of the SEI messages associated with the access unit of the first NAL unit following the PACSI NAL unit within the aggregation packet.”

 

The comment I made on 4 January, as below, has not been addressed.

 

As a subset could presumably be an empty set, does this paragraph actually say anything at all? Which SEI messages must be included? And are any excluded, such as those associated with some other access unit?

 

Ye-Kui’s response on the same day was:

 

Here we are trying to say the following:

- Never include SEI messages associated with other access units than the one (target access unit) containing the first NAL unit following the PACSI in the aggregation packet.

- A subset (zero to all) of the SEI messages associated with the target access unit can be included in the PACSI.

 

This suggests that the following text would be OK.

 

“The PACSI NAL unit SHALL include a subset (zero to all) of the SEI NAL units associated with the access unit to which the target NAL units belong, and SHALL NOT contain SEI NAL units associated with any other access unit.”

 

15) SEI NAL units in PACSI, page 24

In the last paragraph before section 7, “An SEI message SHOULD NOT be included in a PACSI NAL unit and included in one of the remaining NAL units contained in the same aggregation packet at the same time.”, the phrase “at the same time” is not needed.

 

Also it is interesting that this is worded in terms of SEI messages and not SEI NAL units. An SEI NAL unit can contain one or more SEI messages. I am not sure whether H.264/AVC/SVC has any restriction on repeating either SEI messages or SEI NAL units within an access unit. If not, then we are adding a restriction to a bitstream that does not exist in H.264/AVC/SVC?

 

I think the intention is clear that the RTP packetisation process should not repeat SEI messages by putting them in both the PACSI and the later part of the aggregation packet, but the current text is more restricting than this.

 

16) Packetization Rules, page 25

“… the single NAL unit packetization mode SHOULD NOT be used whenever possible …”

 

This probably does not have the intended meaning. To me, when read literally, it is saying that the single NAL unit packetization mode should not be used at every possible opportunity, and instead, occasionally, something else should be done.

 

Instead, “… use of the single NAL unit packetization mode SHOULD be avoided whenever possible …”

 

17) Packetization Rules, page 25

In the first informative note on this page (relating to historical ballast), there is no mention of FU-A, which, presumably, also has to be implemented to conform to non-interleaved mode (of RFC 3984).

 

18) Packetization Rules, page 25

“A prefix NAL unit SHOULD be aggregated to the same packet as the associated NAL unit following the prefix NAL unit in decoding order.”

 

The wording of this could be improved, such as in the following.

 

“A prefix NAL unit and the NAL unit with which it is associated, and which follows the prefix NAL unit in decoding order, SHOULD be included in the same aggregation packet.”

 

19) Packetization Rules for Layered Multicast, page 25

This section, in both the title and the body, uses the term “Layered Multicast”, which I think we have agreed should be referred to as “session multiplexing”.

 

20) Packetization Rules for Layered Multicast, page 26

The editor’s note from Thomas could be addressed by writing the paragraph as below.

 

“If the CL-DON decoding order recovery mode is in use, either the non-interleaved packetization mode, restricted to STAP-A packets only, or the interleaved packetization mode MAY be used, but the single NAL unit packetization mode MUST NOT be used. …”
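
The proposed rule can be sketched as a simple check. This is only an illustration in Python, assuming the packetization-mode values of RFC 3984 (0 for single NAL unit mode, 1 for non-interleaved mode, 2 for interleaved mode) and payload type 24 for STAP-A; the function name is hypothetical.

```python
STAP_A = 24  # RTP payload NAL unit type for STAP-A in RFC 3984


def allowed_with_cl_don(packetization_mode: int, packet_types: set) -> bool:
    """Check a session against the proposed CL-DON packetization rule.

    packet_types is the set of RTP payload NAL unit types the sender
    intends to use in that session.
    """
    if packetization_mode == 0:      # single NAL unit mode: MUST NOT be used
        return False
    if packetization_mode == 1:      # non-interleaved mode: STAP-A packets only
        return packet_types <= {STAP_A}
    return packetization_mode == 2   # interleaved mode: MAY be used


assert not allowed_with_cl_don(0, {1})
assert allowed_with_cl_don(1, {STAP_A})
assert not allowed_with_cl_don(1, {STAP_A, 28})  # FU-A excluded by the restriction
assert allowed_with_cl_don(2, {25, 28})
```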

 

21) decoding order recovery mode, pages 28 and 30

The sentence at the top of page 28 states that the method to use to recover decoding order is indicated by the presence or absence of the parameter sprop-cl-don. I suggest that this is reinforced by adding a sentence at the start of each of 8.1.1 and 8.1.2 to state the condition under which the process is invoked: “This process is used when the parameter sprop-cl-don is (not) present in the session description”.
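
As a sketch of the selection this implies at a receiver, the following Python fragment picks the recovery process from an SDP fmtp line. The parsing helper, the mode labels and the example parameter value are my own assumptions; I have not checked the value syntax of sprop-cl-don, and only its presence matters here.

```python
def fmtp_parameters(fmtp_line: str) -> dict:
    """Parse 'a=fmtp:<pt> name=value; name=value' into a dict."""
    _, _, params = fmtp_line.partition(" ")
    result = {}
    for item in params.split(";"):
        name, _, value = item.strip().partition("=")
        if name:
            result[name] = value
    return result


def recovery_mode(fmtp_line: str) -> str:
    """Choose the decoding order recovery process by presence of sprop-cl-don."""
    return "CL-DON" if "sprop-cl-don" in fmtp_parameters(fmtp_line) else "classical"


# Hypothetical session description lines; the sprop-cl-don value is a placeholder.
print(recovery_mode("a=fmtp:96 packetization-mode=1"))
print(recovery_mode("a=fmtp:97 packetization-mode=2; sprop-cl-don=1"))
```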

 

22) The classical RTP decoding order recovery mode, page 28

The sentence “Within each RTP stream, decoding order recovery of NAL units SHALL be applied according to the following rules” could be improved to something like “Within each RTP stream, the decoding order of NAL units SHALL be recovered according to the following rules”.

 

Also, this text sounds too strong: surely we should only be concerned with the end result and not the method, and hence could add words such as “SHALL be recovered by performing any process equivalent to the following rules”.

 

23) The classical RTP decoding order recovery mode, page 28

The sentence “Decoding order recovery between RTP streams of different RTP sessions to access units SHALL be applied according to the following rules” could be improved to something like “The decoding order of NAL units from multiple RTP streams in multiple RTP sessions SHALL be recovered into a single sequence of NAL units, grouped into access units, by performing any process equivalent to the following rules”.

 

Note also that there are two colons at the end of the current paragraph.

 

24) The classical RTP decoding order recovery mode, page 29

In the editor’s note, “NAL units with nal_unit_type equal to 5 present in any of the RTP streams shall be grouped and precede directly any NAL units of type 1 , 5, 14, 15 and 20 in the access unit.”, the first instance of “5” should be “6”.
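
With that correction, the rule in the note could be read as in the following Python sketch. It is only my interpretation: the tuple representation of NAL units and the function name are hypothetical, and the relative order of the non-SEI NAL units is assumed to have been established already.

```python
SEI = 6
FOLLOWING_TYPES = {1, 5, 14, 15, 20}


def group_sei(access_unit):
    """Move all SEI NAL units directly before the first NAL unit of
    type 1, 5, 14, 15 or 20.

    access_unit is a list of (nal_unit_type, label) tuples collected
    from all RTP streams for one access unit.
    """
    sei = [n for n in access_unit if n[0] == SEI]
    rest = [n for n in access_unit if n[0] != SEI]
    # Index of the first NAL unit that the SEI group must directly precede.
    idx = next((i for i, n in enumerate(rest) if n[0] in FOLLOWING_TYPES), len(rest))
    return rest[:idx] + sei + rest[idx:]


units = [(7, "sps"), (14, "prefix"), (6, "sei-a"), (5, "idr"), (6, "sei-b"), (20, "enh")]
assert group_sei(units) == [
    (7, "sps"), (6, "sei-a"), (6, "sei-b"), (14, "prefix"), (5, "idr"), (20, "enh"),
]
```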

 

25) The classical RTP decoding order recovery mode, page 29

Regarding the editor’s note and the associated open issue, number 9, I think that SEI messages should be in the RTP stream that they are relevant to, so that if not all layers are being received, only those SEI messages that are relevant need to be received.

 

If this is agreed, then this paragraph with the editor’s note should be moved above the previous paragraph as the NAL units within an access unit should be reordered before being passed to the decoder.

 

26) The classical RTP decoding order recovery mode – informative example, page 29

It should be noted why there is no data for session C at times 4 and 2. It has earlier been stated that there should be data for a higher layer at all time instances at which there is data in any of the lower layers, and this breaks that rule. Presumably the reason is simply the timing of the receipt of the data, and that these NAL units have not been received because the receiver was not ready. Or perhaps due to packet loss?
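
The rule I am referring to can be checked mechanically, as in this Python sketch. The timestamp sets are hypothetical and are not copied from the example in the draft; the function name is mine.

```python
def missing_instants(sessions):
    """Report time instances present in a lower session but absent higher up.

    sessions is a list of timestamp sets ordered from the lowest to the
    highest layer. The result maps a session index to the sorted
    timestamps it lacks.
    """
    missing = {}
    seen_lower = set()
    for i, stamps in enumerate(sessions):
        gap = seen_lower - stamps
        if gap:
            missing[i] = sorted(gap)
        seen_lower |= stamps
    return missing


# Hypothetical sessions A, B and C, where C lacks data at times 2 and 4.
session_a = {1, 2, 3, 4}
session_b = {1, 2, 3, 4}
session_c = {1, 3}
assert missing_instants([session_a, session_b, session_c]) == {2: [2, 4]}
```

A receiver that finds such gaps cannot tell from the timestamps alone whether they come from late joining or from packet loss, which is why the example should say which is intended.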

 

27) Typos

 

Page 7, top line: “know” -> “known”.

Page 29, top line: “decdoding oreder” -> “decoding order”.

 

 

 

Best regards

 

Mike

 

Mike Nilsson
Multimedia Analysis and Coding
BT Group Chief Technology Office

___________________________

Sirius House (B54-MH), Room 92
Adastral Park, Martlesham Heath, Ipswich, IP5 3RE, UK
Tel:    +44 1473 645413
Mobile: +44 7917 025433
Fax:    +44 1908 862365
Email:  mike.nilsson at bt.com
