
Re: [AVT] Post WG last call edits on the H.264 payload format: Technical change



Hi,
[as an author, not WG chair]

I think I will need to give some further considerations on this issue.

First of all it is important to consider that several NALUs can have the same DON. Thus I was in error stating that max-don-diff can be used to determine how many NALUs the buffer may contain. It is not limited by that parameter.
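A small sketch of the point above, with made-up values: the DON spread in a buffer says nothing about how many NALUs it holds when DONs repeat. The tuple layout and the helper name `don_spread` are assumptions for illustration only, and DON wrap-around is ignored.

```python
# Hypothetical buffer contents: one (don, size_in_bytes) tuple per NALU.
# All values are invented for illustration.
def don_spread(nalus):
    """Difference between the highest and lowest DON (wrap-around ignored)."""
    dons = [don for don, _size in nalus]
    return max(dons) - min(dons)

buffered = [(10, 400), (10, 380), (10, 420), (11, 900), (11, 850), (12, 300)]

# The spread stays small while the NALU count is free to grow.
print("DON spread:", don_spread(buffered))
print("NALUs held:", len(buffered))
```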

Thus one can ask the questions:
1. Is it necessary to know what memory or space requirements are put on the de-interleaving buffer?


If the answer is no, then we only need a parameter that tells the receiver how to move NALUs out of the de-interleaving buffer. However, it seems likely that one does want to indicate the amount of memory that will be used at the receiver, at least as a ballpark figure.

2. Is the most appropriate way to express the buffer size in number of bytes or NALUs?

This second question, and how one answers it, gives good insight into how to proceed. If it is required to express the maximum number of NALUs, then the interleaving-depth parameter is needed. If it is in bytes, then we need sprop-deint-buf-req. Because it is hard to determine the number of bytes the average NALU will have, it is difficult to know approximately how much memory a given number of NALUs will require. This also works the other way: from a memory figure, the number of NALUs is hard to determine. One reason I can see for needing to know the number of NALUs is if one implements the buffer in a way that works best when a number of NALU slots are pre-allocated. Personally I don't see a reason for this.
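To make the units question concrete, here is a minimal sketch with invented NALU sizes: the same NALU count can correspond to very different byte totals, so neither figure can be derived from the other without knowing the sizes. The function name `occupancy` and all numbers are assumptions.

```python
def occupancy(nalu_sizes):
    """Return (number of NALUs, total bytes) for a list of NALU sizes."""
    return len(nalu_sizes), sum(nalu_sizes)

# Two hypothetical buffers holding the same number of NALUs.
small_nalus = [120, 95, 140, 110]
large_nalus = [1400, 1350, 1420, 1380]

for label, sizes in (("small", small_nalus), ("large", large_nalus)):
    count, total = occupancy(sizes)
    print(label, "NALUs:", count, "bytes:", total)
```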

Also, I don't think using interleaving-depth for removing NALUs from the buffer is appropriate when we have a more flexible and powerful attribute in max-don-diff that covers the needs and is equally easy to implement at the receiver side.

Thus my single proposal for how to edit the H.264 RTP payload format specification is:

We use the following three de-interleaving related parameters:
- sprop-deint-buf-req
- deint-buf-cap
- sprop-max-don-diff
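
As a rough illustration of how a receiver might use the three parameters listed above, the sketch below parses two parameter strings and checks the sender's stated byte requirement against the receiver's capability. The semicolon-separated `name=value` layout, the helper names, and every numeric value are assumptions for illustration, not text from the specification.

```python
def parse_params(params):
    """Parse 'name=value; name=value' into a dict of ints (assumed format)."""
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _sep, value = item.partition("=")
        result[name.strip()] = int(value)
    return result

def stream_fits(sender, receiver):
    """True if the sender's buffer requirement is within the receiver's capability."""
    return sender["sprop-deint-buf-req"] <= receiver["deint-buf-cap"]

# Hypothetical values, in bytes for the two buffer parameters.
sender = parse_params("sprop-deint-buf-req=65536; sprop-max-don-diff=8")
receiver = parse_params("deint-buf-cap=131072")

print("stream fits receiver:", stream_fits(sender, receiver))
```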

Please give your view as soon as possible. I would like to avoid incurring too much delay.

Cheers

Magnus


Magnus Westerlund wrote:

Hi AVTers and H.264 Authors,

In addition to my previous email on some editorial changes to the RTP payload format for H.264, some of us authors and Colin have had a discussion about the large set of parameters for deinterleaving. However, we have so far not agreed on whether something should be done about this. To save time and keep the process open, the discussion will be held on the AVT reflector to allow all to give their view.

Currently we have the following interleaving related parameters:
- sprop-interleaving-depth
- sprop-deint-buf-req
- deint-buf-cap
- sprop-init-buf-time
- sprop-max-don-diff

Analyzing them, we seem to have a significant overlap between some of the parameters. To simplify things I personally would suggest that we remove some of them, to avoid uncertainty about which to use and to make implementation simpler.

The sprop-init-buf-time (specifying the initial buffering time in milliseconds) provides a subset of the functionality of sprop-max-don-diff. The max-don-diff parameter can be used both initially and during the session to determine the number of NALUs to hold before releasing them from the deinterleaving buffer. Thus it can provide the initial buffering time, by having a receiver do initial buffering until the first packet pops out of the buffer. The max-don-diff is also more robust and can handle slow-start-like behavior that makes the time before playout start a bit longer than anticipated.
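
A minimal sketch of that idea, under assumptions: the receiver buffers until the DON spread first exceeds max-don-diff, so the initial buffering time falls out of the actual arrivals instead of being fixed in advance. The function name, the `(arrival_time_ms, don)` layout, the release condition, and the timings are all hypothetical, and DON wrap-around is ignored.

```python
def initial_buffering_ms(arrivals, max_don_diff):
    """Time from the first arrival until the first NALU would be released.

    arrivals: list of (arrival_time_ms, don) in arrival order.
    Returns None if the release condition is never reached.
    """
    lowest = highest = None
    for time_ms, don in arrivals:
        lowest = don if lowest is None else min(lowest, don)
        highest = don if highest is None else max(highest, don)
        if highest - lowest > max_don_diff:
            return time_ms - arrivals[0][0]
    return None

# Same DON pattern, once at a steady pace and once with slow-start-like delays.
steady = [(0, 0), (20, 3), (40, 1), (60, 4), (80, 2)]
slow_start = [(0, 0), (50, 3), (120, 1), (200, 4), (220, 2)]

print("steady:", initial_buffering_ms(steady, 3), "ms")
print("slow start:", initial_buffering_ms(slow_start, 3), "ms")
```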

The sprop-interleaving-depth and sprop-deint-buf-req also overlap, as both specify the smallest buffer space required to guarantee that deinterleaving can be done. However, they are expressed in different units: int-depth is in number of NALUs and buf-req is in bytes. The buf-req parameter is only intended to provide the size in bytes, while int-depth also gives the receiver another measure of when a NALU can be released from the buffer.

The sprop-interleaving-depth and sprop-max-don-diff both provide values that allow a receiver to determine when a NALU can be released from the deinterleaving buffer. The interleaving-depth parameter is rather inflexible: the buffer must always hold the same number of NALUs before releasing them. This does not allow a sender to reduce the amount of buffering if fewer NALUs than the depth are used in the interleaving pattern. The max-don-diff mechanism does provide for that: since the DONs can be assigned sparsely, the number of NALUs within the interleaving buffer for a max-don-diff value of N is anything between 1 and N, depending on how the DONs are assigned.
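
The effect of sparse DON assignment can be sketched as follows. This is only one possible release rule (release the lowest DON while the spread to the highest DON seen exceeds max-don-diff), not necessarily the rule in the specification. The function name and data are hypothetical, DON wrap-around is ignored, and occupancy is counted right after each arrival, before any release.

```python
import heapq

def deinterleave(arrivals, max_don_diff):
    """Return (NALUs in DON order, peak buffer occupancy in NALUs).

    arrivals: list of (don, payload) in arrival order.
    """
    heap = []        # min-heap of (don, arrival_index, payload)
    highest = None   # highest DON seen so far
    peak = 0
    released = []
    for index, (don, payload) in enumerate(arrivals):
        heapq.heappush(heap, (don, index, payload))
        highest = don if highest is None else max(highest, don)
        peak = max(peak, len(heap))
        # Release while the spread exceeds the threshold.
        while heap and highest - heap[0][0] > max_don_diff:
            d, _i, p = heapq.heappop(heap)
            released.append((d, p))
    while heap:      # flush whatever is left at end of stream
        d, _i, p = heapq.heappop(heap)
        released.append((d, p))
    return released, peak

# Same interleaving pattern, once with dense DONs and once with doubled DONs.
dense = [(0, "a"), (3, "d"), (1, "b"), (4, "e"), (2, "c"), (5, "f")]
sparse = [(2 * don, payload) for don, payload in dense]

out_dense, peak_dense = deinterleave(dense, 3)
out_sparse, peak_sparse = deinterleave(sparse, 3)
print("dense peak:", peak_dense, "sparse peak:", peak_sparse)
```

With the same max-don-diff, the sparse assignment releases NALUs earlier, so the buffer holds fewer of them at its peak.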

With all this overlap between the parameters, one might consider what simplifications can be made. In my opinion we can remove sprop-init-buf-time as being totally redundant and inferior to sprop-max-don-diff.

Whether we should remove any of the other parameters is a more difficult question. Here I see a couple of different alternatives:

A. We remove sprop-interleaving-depth and sprop-deint-buf-req and rely only on max-don-diff to determine the maximum number of NALUs that can be present in the deinterleaving buffer. The actual memory requirement for this buffer will be a bit vague, and the client can only make an estimate based on the likely maximum size of the NALUs. The deint-buf-cap is kept to allow an offer or answer to state its memory capabilities for de-interleaving in bytes, with the sender ensuring that these are met.

B. We remove sprop-interleaving-depth. The removal of NALUs from the buffer is completely controlled by max-don-diff. Whether the receiver has sufficient resources is checked against sprop-deint-buf-req, and is thus expressed in bytes. The receiver will then also have the maximum number of NALUs through the max-don-diff parameter.

These changes will affect the MIME type, the Offer-Answer section and the informative chapter describing proposed methods for handling the de-interleaving buffer.

I think we should do some reduction of parameters despite the fact that it will delay the request to the IESG by at least one and a half weeks, assuming that a 1 week WG last call is sufficient.

Please provide comments on this, preferably stating whether alternative A, B, or neither is your preference.


Cheers

Magnus Westerlund

Multimedia Technologies, Ericsson Research EAB/TVA/A
----------------------------------------------------------------------
Ericsson AB                | Phone +46 8 4048287
Torshamsgatan 23           | Fax   +46 8 7575550
S-164 80 Stockholm, Sweden | mailto: magnus.westerlund at ericsson.com



_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt


