
Re: [AVT] Review draft-ietf-avt-rtp-h264-09.txt



Hi Joerg and Colin,

Thanks, Joerg, for your detailed review, and thanks to Colin for his additional comments and reminders sent privately to the authors. As I am working on a revised draft, I am collecting my responses to Joerg's review in this e-mail.

Most of your points are well taken and have been factored into the next release of the draft. However, I'm not sure I'm addressing the following three points adequately, and I would like to ask for additional input:

a) network elements: I understand that most of today's routers do not look into RTP payload specifics, nor should they. What we have in mind, however, are indeed media-aware network elements. I can envision routers and gateways in (private) IP networks carrying a lot of H.264 traffic that take advantage of the H.264 NAL unit prioritization scheme. A typical example would be the border gateway between the wireline and wireless worlds. MCUs are another example of a "network element" that can make good use of NRI. I think AVT will have to acknowledge in the future that media-based prioritization is becoming a reality (there are other examples in the audio and video compression field where schemes similar to NRI would be useful, e.g. G.723.1 Annex C or H.263 Annex V).
I guess your problem with "network elements" is that the name is too generic. How about calling them "media-aware network elements" and explaining the term in the definition section as follows:


   Media-aware network element: A network element, such as a router,
   gateway, or MCU, that is capable of parsing certain aspects of the
   RTP payload headers or the RTP payload, and of reacting to the
   contents.

      Informative note: The concept of a media-aware network
      element may not be compatible with some RTP-based
      technologies, in particular SRTP.
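To make the NRI use case concrete: in the payload format as drafted, the first byte of every H.264 RTP payload has the F|NRI|Type layout (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit type), so a media-aware network element only needs to inspect that one byte. The Python sketch below is illustrative only; `parse_nal_header` and the drop-on-congestion policy in `should_forward` are hypothetical and not something the draft mandates.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # F: 1 bit, 0 in a conforming stream
    nri: int                 # nal_ref_idc: 2 bits, 0 = not used for reference
    nal_unit_type: int       # 5 bits: NAL unit type or payload-format packet type


def parse_nal_header(payload: bytes) -> NalHeader:
    """Decode the one-byte F|NRI|Type header at the start of an RTP payload."""
    if not payload:
        raise ValueError("empty RTP payload")
    b = payload[0]
    return NalHeader(
        forbidden_zero_bit=(b >> 7) & 0x01,
        nri=(b >> 5) & 0x03,
        nal_unit_type=b & 0x1F,
    )


def should_forward(payload: bytes, congested: bool) -> bool:
    """Hypothetical MANE policy: under congestion, drop packets with NRI == 0."""
    hdr = parse_nal_header(payload)
    if congested and hdr.nri == 0:
        return False  # non-reference data: the decoder can continue without it
    return True


# 0x65 -> F=0, NRI=3, type=5 (IDR slice); 0x01 -> NRI=0, type=1 (non-reference slice)
print(parse_nal_header(b"\x65"))
print(should_forward(b"\x01\x9a", congested=True))
```

The point of the sketch is how little parsing is needed: one shift and mask on the first payload byte, with no knowledge of the slice data.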

b)
Sect. 8.1, max-cbp and max-dbp:
  There is likely to be a good reason in H.264; nevertheless, the
  units for these parameters appear confusing.  Both refer to
  a size, but while max-cbp is specified in 1000bit units, max-dbp
  uses 1024 byte units.
These units translate directly to similarly named syntax elements in the annexes of H.264, and I'm very reluctant to change them.
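The practical consequence of the differing units is that a receiver has to normalize both values before comparing them: one max-cbp unit is 1000 bits, while one max-dbp unit is 1024 bytes, i.e. 8192 bits. The Python sketch below simply applies the unit sizes quoted in the review; the helper names are hypothetical.

```python
# Unit sizes as quoted in the review (assumption: no other scaling applies).
MAX_CBP_UNIT_BITS = 1000      # max-cbp: 1000-bit units
MAX_DBP_UNIT_BITS = 1024 * 8  # max-dbp: 1024-byte units = 8192 bits


def max_cbp_to_bits(value: int) -> int:
    """Convert a max-cbp parameter value to a size in bits."""
    return value * MAX_CBP_UNIT_BITS


def max_dbp_to_bits(value: int) -> int:
    """Convert a max-dbp parameter value to a size in bits."""
    return value * MAX_DBP_UNIT_BITS


print(max_cbp_to_bits(2))  # 2000
print(max_dbp_to_bits(2))  # 16384
```

Keeping the H.264 units in the MIME parameters means this conversion lives in the receiver, but it also keeps the parameters a direct copy of the H.264 syntax elements.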

c)
Sect. 5.1, last para:
  The informative note on the jitter estimation is surely a good idea.
  But it is a bad thing that the packet format modifies generic
  RTP assumptions and counters the use of general purpose RTP analysis
  tools (or libraries).  It appears to be worthwhile to describe how an
  H.264-aware receiver would perform these calculations properly.
This is a well-known problem of all video codecs in which transmission order is (somewhat) decoupled from display order (e.g. everything that uses B frames). With old-fashioned MPEG-style B frames, you have a chance to define a jitter calculation scheme, at least when you know the frame structure (e.g. IBBP). Requiring this knowledge already restricts the freedom of the video encoder and may lead to somewhat lower compression efficiency, which is why the compression people don't like such restrictions. But since most encoders use a fixed frame structure, a jitter calculation scheme could be defined. I'm unaware of any IETF document that discusses such a scheme, though.
With H.264's flexibility, you don't have a chance. The syntax allows for "jitter" of many seconds (although this can be reduced using H.264's HRD SEI messages). More importantly, while in old MPEG the media-induced "jitter" occurs regularly (through a fixed IBBP frame structure), in H.264 we don't know yet how encoder algorithms will employ the flexibility of the syntax. I'm very sorry, but I fear the only thing we can reasonably do in this draft is to advise people that the problem exists.
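To illustrate the IBBP case, the sketch below runs the interarrival jitter estimator of RFC 3550 (Section 6.4.1) over a stream whose packets arrive exactly one frame period apart, so the network itself adds no delay variation. It assumes one packet per frame, a 90 kHz clock at 25 frames/s, RTP timestamps that follow display order, and packets sent in decoding order; the helper names are hypothetical, and floating point is used instead of the integer arithmetic of the RFC's sample code.

```python
FRAME_TICKS = 3600  # assumed: 90 kHz RTP clock / 25 frames per second


def rfc3550_jitter(arrivals, rtp_timestamps):
    """Running interarrival jitter estimate in RTP timestamp units.

    Both lists are in arrival order; arrival times are already in RTP clock units.
    """
    jitter = 0.0
    for i in range(1, len(arrivals)):
        # D(i-1, i) = (R_i - R_{i-1}) - (S_i - S_{i-1})
        d = (arrivals[i] - arrivals[i - 1]) - (rtp_timestamps[i] - rtp_timestamps[i - 1])
        # J = J + (|D| - J) / 16
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def ibbp_decode_order(num_p_frames):
    """Display indices of I B B P B B P ... listed in decoding (transmission) order."""
    order = [0]
    for k in range(1, num_p_frames + 1):
        anchor = 3 * k
        order += [anchor, anchor - 2, anchor - 1]  # P first, then the two B frames
    return order


def regular_arrival_jitter(display_indices):
    """Jitter seen when frames arrive exactly one frame period apart."""
    rtp_ts = [idx * FRAME_TICKS for idx in display_indices]
    arrivals = [n * FRAME_TICKS for n in range(len(display_indices))]
    return rfc3550_jitter(arrivals, rtp_ts)


print(regular_arrival_jitter(list(range(31))))      # display order == transmission order
print(regular_arrival_jitter(ibbp_decode_order(10)))  # same 31 frames, IBBP reordering
```

With timestamps in transmission order the estimate stays at 0.0; with the IBBP decoding order it reports a large non-zero value even though the network is perfectly regular. A receiver could only compensate if it knew the frame structure in advance, which H.264 does not guarantee.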


Stephan


_______________________________________________ Audio/Video Transport Working Group avt at ietf.org https://www1.ietf.org/mailman/listinfo/avt