Re: [AVT] Review draft-ietf-avt-rtp-h264-09.txt
Hi Joerg and Colin,
thanks, Joerg, for your detailed review. Also thanks to Colin for his
additional comments and reminders communicated to the authors in
private. As I'm working on a modified draft, I am collecting my comments on
Joerg's review in this email.
Most of your points are well taken and will be factored into the next release
of the draft. However, I'm not sure I'm addressing the following three points
adequately, and I would like to ask for additional input:
a) network elements: I understand that most of today's routers do not look
into RTP payload specifics, nor should they. What we have in mind, however, are
indeed media-aware network elements. I could envision routers and gateways
in (private) IP networks which carry a lot of H.264 that take advantage of
the H.264 NAL unit prioritization scheme. A typical example would be the
border gateway between wireline and wireless worlds. MCUs are another
example of a "network element" that can make good use of NRI. I think that
AVT will have to acknowledge in the future that media-based prioritization
is becoming a reality (there are other examples in the audio and video
compression field that would make schemes similar to NRI useful, e.g.
G.723.1 Annex C, or H.263 Annex V).
I guess your problem with "network elements" is that the name is too
generic. How about calling them "media-aware network elements", and
explaining the term in the definition section as follows:
Media-aware network element: A network element, such as a router,
gateway, or MCU, that is capable of parsing certain aspects of the
RTP payload headers or the RTP payload, and reacting to the
contents.
Informative note: The concept of a media-aware network
element seems not always to be compatible with some RTP based
technologies, especially with SRTP.
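To illustrate what such an element would have to inspect: the first octet of an H.264 NAL unit carries a 1-bit forbidden_zero_bit, the 2-bit nal_ref_idc (NRI), and a 5-bit nal_unit_type. The following Python fragment is only a sketch. The `should_forward` policy and its `min_nri` threshold are hypothetical and are not something the draft specifies.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit
    nri: int                 # 2 bits (nal_ref_idc), 0 = lowest priority
    nal_unit_type: int       # 5 bits


def parse_nal_header(first_octet: int) -> NalHeader:
    """Split the one-octet NAL unit header into its three fields."""
    if not 0 <= first_octet <= 0xFF:
        raise ValueError("NAL header must be a single octet")
    return NalHeader(
        forbidden_zero_bit=first_octet >> 7,
        nri=(first_octet >> 5) & 0x03,
        nal_unit_type=first_octet & 0x1F,
    )


def should_forward(header: NalHeader, min_nri: int) -> bool:
    """Hypothetical drop policy: forward only NAL units whose NRI is at
    least min_nri (e.g. raised by the element when it is congested)."""
    return header.forbidden_zero_bit == 0 and header.nri >= min_nri


if __name__ == "__main__":
    for octet in (0x65, 0x41, 0x01):
        h = parse_nal_header(octet)
        print(hex(octet), h, "forward" if should_forward(h, 2) else "drop")
```

Note that a check like this only works if the element can read the payload in the clear, which is the SRTP concern raised in the informative note.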
b)
Sect. 8.1, max-cbp and max-dbp:
There is likely to be a good reason in H.264, nevertheless the
units for these parameters appear confusing. Both refer to
a size, but while max-cbp is specified in 1000bit units, max-dbp
uses 1024 byte units.
These units translate directly to similarly named syntax elements in the
annexes of H.264, and I'm very reluctant to change them.
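For implementers who trip over the mixed units, a small conversion sketch may help. It assumes only the units quoted above (1000 bits per max-cbp unit, 1024 bytes per max-dbp unit). The function names and sample values are made up for illustration and are not taken from the draft.

```python
def max_cbp_to_bits(max_cbp: int) -> int:
    """max-cbp is expressed in units of 1000 bits."""
    return max_cbp * 1000


def max_dbp_to_bytes(max_dbp: int) -> int:
    """max-dbp is expressed in units of 1024 bytes."""
    return max_dbp * 1024


if __name__ == "__main__":
    # Illustrative values only.
    cbp_bits = max_cbp_to_bits(2000)
    dbp_bytes = max_dbp_to_bytes(891)
    print(f"max-cbp=2000 -> {cbp_bits} bits ({cbp_bits // 8} bytes)")
    print(f"max-dbp=891  -> {dbp_bytes} bytes ({dbp_bytes * 8} bits)")
```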
c)
Sect. 5.1, last para:
The informative note on the jitter estimation is surely a good idea.
But it is a bad thing that the packet format modifies generic
RTP assumptions and counters the use of general purpose RTP analysis
tools (or libraries). It appears to be worthwhile to report how an
H.264-aware receiver would perform these calculations properly.
This is a well-known problem of all video codecs in which transmission order
is (somewhat) decoupled from display order (e.g. everything that uses B
frames). With old-fashioned B frames of the MPEG flavor, you have a chance
to define a jitter calculation scheme, at least when you know the frame
structures (e.g. IBBP). Having this knowledge already restricts the
freedom of the video encoder and may lead to a somewhat lower compression,
which is why the compression people don't like such restrictions. But
since most encoders use a fixed frame structure, a jitter calculation
scheme could be defined. I'm unaware of any IETF document that discusses
such a scheme, though.
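To make the effect concrete, here is a sketch of the RFC 3550 interarrival jitter estimator, J = J + (|D| - J)/16, applied to a perfectly paced IBBP stream. It assumes that RTP timestamps follow presentation time while packets are sent in decoding order, a 90 kHz clock, one packet per frame, and 30 frames per second. The function name and the sample values are illustrative only.

```python
from typing import Sequence


def interarrival_jitter(arrivals: Sequence[int],
                        timestamps: Sequence[int]) -> float:
    """RFC 3550 style estimate. Both inputs are in RTP timestamp units.

    D = (R_j - R_i) - (S_j - S_i) for consecutive packets i, j in order
    of arrival, and J is updated as J += (|D| - J) / 16.
    """
    jitter = 0.0
    for i in range(1, len(arrivals)):
        d = (arrivals[i] - arrivals[i - 1]) - (timestamps[i] - timestamps[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


FRAME_TICKS = 3000  # 90 kHz clock / 30 fps (assumed)

# Packets arrive evenly spaced: the network adds no jitter at all.
arrivals = [n * FRAME_TICKS for n in range(7)]

# Display order I0 B1 B2 P3 B4 B5 P6: timestamps increase monotonically.
in_order = [n * FRAME_TICKS for n in range(7)]

# Transmission (decoding) order I0 P3 B1 B2 P6 B4 B5.
reordered = [n * FRAME_TICKS for n in (0, 3, 1, 2, 6, 4, 5)]

if __name__ == "__main__":
    print("in-order jitter: ", interarrival_jitter(arrivals, in_order))
    print("reordered jitter:", interarrival_jitter(arrivals, reordered))
```

The second figure is non-zero although the arrival spacing is constant, so a generic RTP tool would report media-induced "jitter" as network jitter. A receiver that knows the fixed frame structure could compensate by comparing packets in timestamp order, which is the kind of scheme meant above.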
With H.264's flexibility, you don't have a chance. The syntax allows for
"jitter" of many seconds (although this can be reduced using H.264's HRD
SEI messages). More importantly, while in old MPEG the media-induced
"jitter" occurs regularly (through a fixed frame structure IBBP scheme), in
H.264 we don't know yet how encoder algorithms will employ the flexibility
of the syntax. I'm very sorry, but I fear the only thing we can reasonably
do in this draft is to advise people that the problem exists.
Stephan
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt