  Internet Engineering Task Force                       M. Reha Civanlar
  INTERNET-DRAFT                                           Glenn L. Cash
  File: draft-civanlar-bmpeg-02.txt                     Barry G. Haskell

                                                      AT&T Labs-Research

                                                           November 1997

                   RTP Payload Format for Bundled MPEG

                           Status of this Memo

  This document is an Internet-Draft.  Internet-Drafts are working
  documents of the Internet Engineering Task Force (IETF), its areas,
  and its working groups.  Note that other groups may also distribute
  working documents as Internet-Drafts.

  Internet-Drafts are draft documents valid for a maximum of six months
  and may be updated, replaced, or obsoleted by other documents at any
  time.  It is inappropriate to use Internet-Drafts as reference
  material or to cite them other than as ``work in progress.''

  To learn the current status of any Internet-Draft, please check the
  ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
  Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
  munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
  ftp.isi.edu (US West Coast).

  Distribution of this memo is unlimited.

                                 Abstract

  This document describes a payload type for bundled, MPEG-2 encoded
  video and audio data that may be used with RTP, version 2. Bundling
  has some advantages for this payload type, particularly when it is
  used for video-on-demand applications. This payload type may be used
  when its advantages are important enough to sacrifice the modularity
  of having separate audio and video streams.

  A technique to improve packet loss resilience based on "out-of-band"
  transmission of vital, MPEG-2 specific information is described in
  an appendix.

  A section on security considerations for this payload type has been
  added.

  1. Introduction

  This document describes a bundled packetization scheme for MPEG-2
  encoded audio and video streams using the Real-time Transport Protocol
  (RTP), version 2 [1].

  The MPEG-2 International Standard consists of three layers: audio,
  video and systems [2]. The audio and the video layers define the
  syntax and semantics of the corresponding "elementary streams." The
  systems layer supports synchronization and interleaving of multiple
  compressed streams, buffer initialization and management, and time
  identification.  RFC 2038 [3] describes packetization techniques for
  transporting individual audio and video elementary streams, as well as
  the transport stream defined at the systems layer, over RTP.

  The bundled packetization scheme is needed because it has several
  advantages over other schemes for some important applications,
  including video-on-demand (VOD), where audio and video are always used
  together.  Its advantages over independent packetization of audio and
  video are:

    1. It uses a single port per "program" (i.e., bundled A/V).
    This may increase the number of streams that can be served,
    e.g., from a VOD server. It also eliminates the performance
    penalty incurred on the client side when two ports are used
    for the separate audio and video messages.

    2. It provides implicit synchronization of audio and video.
    This is particularly convenient when the A/V data is stored
    in an interleaved format at the server.

    3. It reduces the header overhead. Since large packets
    increase the effects of losses and delay, audio-only
    packets need to be smaller, which increases their relative
    overhead. An A/V bundled format can reduce the overall
    overhead by about 1%. Considering the high bit rates used
    for MPEG-2 encoded material, e.g., 4 Mbps, the bits saved,
    e.g., 40 kbps, may provide a noticeable audio or video
    quality improvement.

    4. It may reduce the overall receiver buffer size. Audio and
    video streams may experience different delays when
    transmitted separately, and the receiver buffers need to be
    designed for the longest of these delays. For example,
    assume that two buffers, each of size B, are each
    sufficient with probability P when each stream is
    transmitted individually. The probability that the same
    buffer size will be sufficient when both streams need to
    be received is P times the conditional probability of B
    being sufficient for the second stream given that it was
    sufficient for the first one. This conditional probability
    is generally less than one, so a larger buffer size is
    required to achieve the same probability level.

    5. It may help control the overall bandwidth used by an A/V
    program.
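  The argument in item 4 can be checked numerically. The Python sketch
  below is illustrative only: it ASSUMES, which this document does not,
  that each stream's buffer requirement is exponentially distributed
  with a common rate and that the audio and video requirements are
  independent. Under independence the conditional probability in item 4
  equals P, so two buffers of size B both suffice with probability P
  squared. The function names and the rate parameter are hypothetical.

```python
import math

# ASSUMPTION (not from this document): each stream's buffer requirement D
# is exponential with the given rate, so Pr(D <= B) = 1 - exp(-rate * B),
# and the audio and video requirements are independent.

def single_stream_buffer(p: float, rate: float) -> float:
    """Smallest B such that one stream fits with probability p."""
    return -math.log(1.0 - p) / rate

def joint_probability(b: float, rate: float) -> float:
    """Probability that both independent streams fit in buffers of size b:
    P times the conditional probability, which here is P again."""
    single = 1.0 - math.exp(-rate * b)
    return single * single

def two_stream_buffer(p: float, rate: float) -> float:
    """Smallest per-stream B such that both streams fit with probability p,
    i.e. (1 - exp(-rate * B)) ** 2 >= p."""
    return -math.log(1.0 - math.sqrt(p)) / rate

if __name__ == "__main__":
    p, rate = 0.99, 1.0  # hypothetical target probability and rate
    b1 = single_stream_buffer(p, rate)
    print(f"B for one stream:          {b1:.3f}")
    print(f"joint probability at B:    {joint_probability(b1, rate):.4f}")
    print(f"B needed for both streams: {two_stream_buffer(p, rate):.3f}")
```

  With P = 0.99 and a rate of 1.0, the buffer sized for one stream is
  sufficient for both only with probability 0.9801, and the per-stream
  buffer must grow from about 4.61 to about 5.30 (in units of 1/rate)
  to restore P = 0.99.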

  Its advantages over packetization of MPEG-2 transport streams are:

    1. Reduced overhead. It does not contain systems layer
    information, which is redundant with RTP (essentially,
    both address similar issues).

    2. Easier error recovery. Because the packetization is
    structured consistently with the application level
    framing (ALF) principle, loss concealment and error
    recovery can be made simpler and more effective.

2. Encapsulation of Bundled MPEG Video and Audio

Video encapsulation follows rules similar to the ones described in [3]
for encapsulation of MPEG elementary streams. Specifically,

  1. The MPEG Video_Sequence_Header, when present, will always
     be at the beginning of an RTP payload.
  2. An MPEG GOP_header, when present, will always be at the
     beginning of an RTP payload, or will follow a
     Video_Sequence_Header.
  3. An MPEG Picture_Header, when present, will always be at the
     beginning of an RTP payload, or will follow a GOP_header.

In addition to these, it is required that:

  4. Each packet must contain an integral number of video slices.
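The placement rules above can be expressed as a check on the video
portion of a received payload, i.e., the bytes between the BMPEG
specific header and the trailing audio data. The following Python
sketch is an assumption-laden illustration, not a conformance test: it
reads rules 1 through 3 literally (so a picture header directly after a
sequence header, without a GOP header, is rejected), treats extension
and user data start codes as belonging to the preceding header, and
rejects any other start code, such as sequence_end_code. It can only
verify that the payload begins on a start code; whether the last slice
is complete (rule 4) cannot be decided from one packet. The start code
values are those of the MPEG-2 video syntax; the function names are
hypothetical.

```python
# MPEG-2 video start code values (the byte following 0x00 0x00 0x01).
PICTURE_START = 0x00
SLICE_FIRST, SLICE_LAST = 0x01, 0xAF
USER_DATA = 0xB2
SEQUENCE_HEADER = 0xB3
EXTENSION_START = 0xB5
GROUP_START = 0xB8

# Which header kinds may precede each kind inside one payload
# (None means "it is the first start code in the payload").
ALLOWED_PREDECESSORS = {
    "sequence": {None},                   # rule 1
    "gop": {None, "sequence"},            # rule 2
    "picture": {None, "gop"},             # rule 3
    "slice": {None, "picture", "slice"},
}

def start_codes(video: bytes):
    """Yield (offset, code) for every 0x000001xx start code in video."""
    i = video.find(b"\x00\x00\x01")
    while i != -1 and i + 3 < len(video):
        yield i, video[i + 3]
        i = video.find(b"\x00\x00\x01", i + 4)

def classify(code: int):
    if code == SEQUENCE_HEADER:
        return "sequence"
    if code == GROUP_START:
        return "gop"
    if code == PICTURE_START:
        return "picture"
    if SLICE_FIRST <= code <= SLICE_LAST:
        return "slice"
    return None  # anything else is outside this simplified check

def follows_bmpeg_video_rules(video: bytes) -> bool:
    """Check header placement (rules 1-3) and that the data starts on a
    start code (necessary, but not sufficient, for rule 4)."""
    previous = None
    seen_any = False
    for offset, code in start_codes(video):
        if not seen_any:
            if offset != 0 or code in (EXTENSION_START, USER_DATA):
                return False
            seen_any = True
        if code in (EXTENSION_START, USER_DATA):
            continue  # attaches to the preceding header
        kind = classify(code)
        if kind is None or previous not in ALLOWED_PREDECESSORS[kind]:
            return False
        previous = kind
    return previous is not None
```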

It is the application's responsibility to adjust the slice sizes and the
number of slices put in each RTP packet so that lower-level
fragmentation does not occur. This approach simplifies the receivers
while somewhat increasing the complexity of the transmitter's
packetizer. Since a slice can be as small as a single macroblock,
fragmentation can be prevented in most cases. If a packet size exceeds
the path maximum transmission unit (path-MTU) [4], this payload type
depends on the lower protocol layers for fragmentation, which may cause
problems with packet classification for integrated services (e.g.,
with RSVP).
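A transmitter can meet this responsibility by filling each packet with
whole slices up to a byte budget derived from the path-MTU. The sketch
below uses greedy packing, which is one simple choice and not something
this document mandates. The 40-byte IPv4/UDP/RTP overhead (no IP
options, no CSRCs), the 1500-byte default path-MTU, and the
audio_reserve parameter are assumptions; any sequence, GOP, or picture
header is assumed to be counted in the size of the slice that follows
it.

```python
# ASSUMED per-packet overhead in bytes: IPv4 (20, no options) + UDP (8)
# + RTP fixed header (12, no CSRCs) + BMPEG specific header (4).
PACKET_OVERHEAD = 20 + 8 + 12 + 4

def pack_slices(slice_sizes, path_mtu=1500, audio_reserve=0):
    """Greedily group consecutive slices (sizes in bytes) into packets
    that fit in path_mtu while leaving audio_reserve bytes for audio.
    Returns one list of slice indices per packet."""
    budget = path_mtu - PACKET_OVERHEAD - audio_reserve
    if budget <= 0:
        raise ValueError("path-MTU leaves no room for video")
    packets, current, used = [], [], 0
    for index, size in enumerate(slice_sizes):
        if size > budget:
            # The encoder should have produced smaller slices instead.
            raise ValueError(f"slice {index} ({size} bytes) would be fragmented")
        if used + size > budget:
            packets.append(current)  # close the packet before it overflows
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        packets.append(current)
    return packets
```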

The video data is followed by enough integral audio frames to cover the
duration of the video segment included in the packet. For example, if
the first packet contains three slices, each 1/900 second long, and
Layer I audio coding is used at a 44.1 kHz sampling rate, only one audio
frame covering 384/44100 seconds of audio needs to be included in this
packet. Since this audio frame (8.71 msec) is longer than the video
segment contained in the packet (3.33 msec), the next few packets may
not contain any audio frames, until the packet in which the covered
video time extends beyond the end of the previously transmitted audio
frames. Alternatively, this proposal allows the latest audio frame to
be repeated in "no-audio" packets for packet loss resilience. Again, it
is the application's responsibility to adjust the bundled packet size
according to the minimum MTU size to prevent fragmentation.
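The rule for how much audio accompanies each packet amounts to keeping
the transmitted audio time at or ahead of the transmitted video time.
The Python sketch below reproduces the example above with exact
fractions. It does not implement the optional repetition of the latest
audio frame, and the function name is hypothetical.

```python
from fractions import Fraction

def audio_frames_per_packet(video_durations, samples_per_frame, sample_rate):
    """For each packet (given its video duration in seconds), return how
    many whole audio frames to append so that the audio sent so far
    covers the video sent so far."""
    frame_duration = Fraction(samples_per_frame, sample_rate)
    video_end = Fraction(0)
    audio_end = Fraction(0)
    counts = []
    for duration in video_durations:
        video_end += Fraction(duration)
        frames = 0
        while audio_end < video_end:  # add frames until audio catches up
            audio_end += frame_duration
            frames += 1
        counts.append(frames)
    return counts

if __name__ == "__main__":
    # Three 1/900 s slices per packet; Layer I (384 samples) at 44.1 kHz.
    print(audio_frames_per_packet([Fraction(3, 900)] * 8, 384, 44100))
    # [1, 0, 1, 0, 0, 1, 0, 1]
```

The first packet carries one frame (8.71 msec of audio for 3.33 msec of
video); the next audio frame is not needed until the third packet, when
the video reaches 10 msec.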

2.1. RTP Fixed Header for BMPEG Encapsulation

The following RTP header fields are used:

  Payload Type: A distinct payload type number, which may be dynamic,
  should be assigned to BMPEG.

  M Bit: Set for packets containing the end of a picture.

  Timestamp: 32-bit 90 kHz timestamp representing the sampling time of
  the MPEG picture. It may not increase monotonically if B pictures are
  present. It is the same for all packets belonging to the same picture.
  For packets that contain only a sequence, extension and/or GOP header,
  the timestamp is that of the subsequent picture.
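Because the timestamp reflects the sampling (display) time of a picture
rather than its transmission position, timestamps of packets in
transmission order jump backwards when B pictures are present. The
sketch below illustrates this for a hypothetical GOP at exactly 30
pictures per second (3000 ticks of the 90 kHz clock per picture). The
frame rate, GOP structure, and function name are assumptions; every
packet of a given picture would carry the same value.

```python
RTP_CLOCK = 90000  # Hz, per the Timestamp field description

def picture_timestamps(display_indices, frame_rate, base=0):
    """Map pictures, listed in transmission order by their display index,
    to 32-bit 90 kHz RTP timestamps of their sampling (display) times."""
    return [
        (base + round(index * RTP_CLOCK / frame_rate)) % (1 << 32)
        for index in display_indices
    ]

if __name__ == "__main__":
    # Display order I0 B1 B2 P3 B4 B5 P6 is sent as I0 P3 B1 B2 P6 B4 B5.
    print(picture_timestamps([0, 3, 1, 2, 6, 4, 5], 30))
    # [0, 9000, 3000, 6000, 18000, 12000, 15000]
```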

2.2. BMPEG Specific Header

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| P |N|MBZ|    Audio Length   | |         Audio Offset          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                              MBZ

  P: Picture type (2 bits). I (0), P (1), B (2).

  N: Header data changed (1 bit). Set if any part of the video
  sequence, extension, GOP or picture header data differs from that of
  the previously sent headers. It is reset when all the header data is
  repeated (see Appendix 2).

  MBZ: Must be zero. Reserved for future use. Both the 2-bit field
  following N and the 1-bit field following Audio Length (bit 15) are
  MBZ.

  Audio Length: (10 bits) Length of the audio data in this packet, in
  bytes. The start of the audio data is found by subtracting "Audio
  Length" from the total length of the received packet.

  Audio Offset: (16 bits) The offset, in number of audio samples,
  between the start of the audio frame and the RTP timestamp for this
  packet. (For multi-channel sources, a set of samples covering all
  channels is counted as one sample for this purpose.)

  Audio Offset is a signed integer in two's complement form. It allows
  an offset of about +/- 743 msec at a 44.1 kHz audio sampling rate.
  For a very low video frame rate (e.g., one frame per second), this
  offset may not be sufficient and this payload format may not be
  usable.
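  A minimal Python sketch of building and parsing the 32-bit header
  follows. It derives the bit positions from the diagram above (P in
  bits 0-1, N in bit 2, Audio Length in bits 5-14, Audio Offset in bits
  16-31, with bits 3-4 and 15 MBZ) and locates the audio at the end of
  the payload as described for Audio Length. It assumes the RTP fixed
  header, any CSRC list, and any RTP padding have already been removed,
  so the payload begins with the BMPEG specific header. The function
  names are hypothetical.

```python
import struct

def pack_bmpeg_header(picture_type: int, n_bit: bool, audio_length: int,
                      audio_offset: int) -> bytes:
    """Build the 32-bit BMPEG specific header in network byte order.
    picture_type: 0 = I, 1 = P, 2 = B. MBZ bits are left zero."""
    if picture_type not in (0, 1, 2):
        raise ValueError("picture type must be 0 (I), 1 (P) or 2 (B)")
    if not 0 <= audio_length < 1 << 10:
        raise ValueError("Audio Length must fit in 10 bits")
    if not -(1 << 15) <= audio_offset < 1 << 15:
        raise ValueError("Audio Offset must fit in 16-bit two's complement")
    word = (picture_type << 30) | (int(bool(n_bit)) << 29)
    word |= audio_length << 17
    word |= audio_offset & 0xFFFF  # two's complement encoding
    return struct.pack("!I", word)

def parse_bmpeg_payload(payload: bytes):
    """Split a payload that starts with the BMPEG header into
    (header fields, video bytes, trailing audio bytes)."""
    if len(payload) < 4:
        raise ValueError("payload shorter than the BMPEG header")
    (word,) = struct.unpack("!I", payload[:4])
    if word & ((0b11 << 27) | (1 << 16)):
        raise ValueError("MBZ bits are not zero")
    picture_type = word >> 30
    n_bit = bool((word >> 29) & 1)
    audio_length = (word >> 17) & 0x3FF
    audio_offset = word & 0xFFFF
    if audio_offset & 0x8000:  # sign-extend the 16-bit field
        audio_offset -= 1 << 16
    audio_start = len(payload) - audio_length
    if audio_start < 4:
        raise ValueError("Audio Length exceeds the payload")
    header = (picture_type, n_bit, audio_length, audio_offset)
    return header, payload[4:audio_start], payload[audio_start:]
```

  The 16-bit range also explains the limit quoted above: 32768 samples
  at 44.1 kHz is about 0.743 seconds.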

  If B frames are present, audio frames are not reordered along with
  video. Instead, they are packetized along with video frames in their
  transmission order (e.g., an audio segment packetized with a video
  segment corresponding to a P picture may belong to a B picture, which
  will be transmitted later and should be rendered at the same time as
  this audio segment). Even though the video segments are reordered, the
  audio offset for a particular audio segment is still relative to the
  RTP timestamp in the packet containing that audio segment.
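  At the receiver, the audio timing is reconstructed from the timestamp
  of the packet that actually carries the audio, not from the timestamp
  of the picture the audio plays with. This document does not state the
  sign convention of Audio Offset, so the Python sketch below ASSUMES
  that the audio frame starts at the packet timestamp plus the offset;
  it also ignores 32-bit timestamp wraparound. The names are
  hypothetical.

```python
from fractions import Fraction

RTP_CLOCK = 90000  # Hz

def audio_frame_start(rtp_timestamp: int, audio_offset: int,
                      sample_rate: int) -> Fraction:
    """Start time, in seconds on the RTP time line, of the first audio
    frame in a packet. ASSUMPTION: start = timestamp + offset, i.e. a
    positive offset means the audio starts after the packet timestamp."""
    return (Fraction(rtp_timestamp, RTP_CLOCK)
            + Fraction(audio_offset, sample_rate))
```

  For example, audio carried in a P-picture packet with timestamp 9000
  (0.1 s) but belonging to a B picture displayed at timestamp 3000
  (1/30 s) would, under this assumed convention, carry an offset of
  -2940 samples at 44.1 kHz.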

  Since a special picture counter, such as the "temporal reference
  (TR)" field of [3], is not included in this payload format, lost GOP
  headers may not be detected. The only effect of this may be incorrect
  decoding of the B pictures immediately following the lost GOP header
  for some edited video material.

3. Security Considerations

RTP packets using the payload format defined in this specification are
subject to the security considerations discussed in the RTP
specification [1]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed after compression, so there is no conflict between the two
operations.

The receiver-side computational complexity of packet processing for
this payload type does not exhibit any significant non-uniformity that
could pose a potential denial-of-service threat.

A security review of this payload format found no additional
considerations beyond those in the RTP specification.

Appendix 1. Out-of-band Transmission of the "High Priority" Information

In MPEG encoded video, loss of the header information, which includes
sequence, GOP, and picture headers and the corresponding extensions,
causes severe degradation in the decoded video. When possible,
dependable transmission of the header information to the receivers can
improve the loss resilience of MPEG video significantly [5]. RFC 2038
describes a payload type where the header information can be repeated in
each RTP packet. Although this is a straightforward approach, it may
increase the overhead.

The "data partitioning" method in MPEG-2 defines the syntax and
semantics for partitioning an MPEG-2 encoded video bitstream into "high
priority" and "low priority" parts. If the "high priority" (HP) part is
selected to contain only the header information, it amounts to less
than two percent of the video data and can be transmitted before the
start of the real-time transmission using a reliable protocol. To
synchronize the HP data with the corresponding real-time stream, the
initial value of the timestamp for the real-time stream may be inserted
at the beginning of the HP data.
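The synchronization step can be sketched as follows. This document does
not define a format for the HP data, so the 4-byte, network-order
timestamp prefix used here is a HYPOTHETICAL layout, as are the function
names. The receiver subtracts the initial timestamp from each packet's
RTP timestamp, modulo 2^32, to find how far into the program the packet
lies and hence which stored headers apply.

```python
import struct

# HYPOTHETICAL layout: 32-bit initial RTP timestamp (network byte order)
# followed by the HP (header-only) data.

def frame_hp_data(initial_timestamp: int, hp_data: bytes) -> bytes:
    """Prefix the HP data with the real-time stream's initial timestamp."""
    return struct.pack("!I", initial_timestamp & 0xFFFFFFFF) + hp_data

def read_hp_data(blob: bytes):
    """Return (initial_timestamp, hp_data) from a framed HP blob."""
    (initial_timestamp,) = struct.unpack("!I", blob[:4])
    return initial_timestamp, blob[4:]

def elapsed_ticks(rtp_timestamp: int, initial_timestamp: int) -> int:
    """90 kHz ticks from the start of the real-time stream, modulo 2**32."""
    return (rtp_timestamp - initial_timestamp) % (1 << 32)
```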

Alternatively, the HP data may be transmitted along with the A/V data
using layered multimedia transmission techniques for RTP [6].

Appendix 2. Error Recovery

Packet losses can be detected from a combination of the sequence number
and the timestamp fields of the RTP fixed header. The extent of the loss
can be determined from the timestamp, the slice number and the
horizontal location of the first slice in the packet. The slice number
and the horizontal location can be determined from the slice header and
the first macroblock address increment, which are located at fixed bit
positions.
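The first two steps of loss detection can be sketched in Python:
counting missing packets from the 16-bit RTP sequence number, and
reading the macroblock row of the first slice from the final byte of
its slice start code (the slice_vertical_position field, which gives
the row directly when no slice_vertical_position_extension is present,
i.e., for pictures of at most 2800 lines). Decoding the horizontal
position requires parsing the variable-length
macroblock_address_increment and is not shown. The sketch assumes no
packet reordering or duplication; the function names are hypothetical.

```python
def lost_packets(previous_seq: int, current_seq: int) -> int:
    """Number of packets missing between two consecutively received RTP
    sequence numbers, which wrap around at 2**16."""
    return (current_seq - previous_seq - 1) % (1 << 16)

def first_slice_row(video: bytes) -> int:
    """Macroblock row of the slice at the start of the video data, from
    the last byte of its slice start code (slice_vertical_position)."""
    if len(video) < 4 or video[:3] != b"\x00\x00\x01":
        raise ValueError("video data does not start with a start code")
    position = video[3]
    if not 0x01 <= position <= 0xAF:
        raise ValueError("first start code is not a slice start code")
    return position - 1
```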

If the lost data consists of slices all from the same picture, new data
following the loss may simply be given to the video decoder, which will
normally repeat missing pixels from a previous picture. The next audio
frame must be played at the appropriate time determined by the timestamp
and the audio offset contained in the received packet. Appropriate audio
frames (e.g., representing background noise) may need to be fed to the
audio decoder in place of the lost audio frames to keep lip sync and/or
to conceal the effects of the losses.

If the new data received after a loss is from the next picture (i.e.,
no complete picture is lost) and the N bit is not set, previously
received headers for the particular picture type (determined from the P
bits) can be given to the video decoder, followed by the new data. If N
is set, discarding data until a new picture start code is advisable,
unless headers are available from previously received HP data.

If data for more than one picture is lost and HP data is not available,
resynchronization to a new video sequence header is advisable, unless
N is zero and, for every intervening picture of the same type, at least
one packet with the N bit equal to zero has been received.

In all cases of large packet losses, if the HP data is available,
appropriate portions of it can be given to the video decoder and the
received data can be used irrespective of the N bit value or the number
of lost pictures.
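The decision rules of this appendix can be summarized as a single
function. The Python sketch below is one reading of the text, not a
normative algorithm: the enum, the parameter names, and the choice to
prefer stored headers over HP data when N is zero for the next picture
are assumptions.

```python
from enum import Enum, auto

class Recovery(Enum):
    FEED_NEW_DATA = auto()          # same picture: decoder conceals
    REUSE_STORED_HEADERS = auto()   # resend saved headers, then data
    USE_HP_HEADERS = auto()         # headers from out-of-band HP data
    SKIP_TO_PICTURE_START = auto()  # drop data until next picture
    RESYNC_TO_SEQUENCE = auto()     # drop data until sequence header

def choose_recovery(pictures_advanced: int, n_bit: bool,
                    hp_available: bool,
                    intervening_headers_unchanged: bool = False):
    """pictures_advanced: picture index of the first data after the loss
    minus that of the last data before it (0 = same picture, 1 = next
    picture, no complete picture lost; >= 2 = whole pictures lost).
    intervening_headers_unchanged: a packet with N = 0 was received for
    every intervening picture of the same type."""
    if pictures_advanced == 0:
        return Recovery.FEED_NEW_DATA
    if pictures_advanced == 1:
        if not n_bit:
            return Recovery.REUSE_STORED_HEADERS
        if hp_available:
            return Recovery.USE_HP_HEADERS
        return Recovery.SKIP_TO_PICTURE_START
    if hp_available:
        return Recovery.USE_HP_HEADERS
    if not n_bit and intervening_headers_unchanged:
        return Recovery.REUSE_STORED_HEADERS
    return Recovery.RESYNC_TO_SEQUENCE
```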

Appendix 3. Resynchronization

As described in [3], frequent use of video sequence headers makes it
possible to join a program at arbitrary times. It also reduces the
resynchronization time after severe losses.

References

[1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson,
    "RTP: A Transport Protocol for Real-Time Applications,"
    RFC 1889, January 1996.

[2] ISO/IEC International Standard 13818, "Generic coding of moving
    pictures and associated audio information," November 1994.

[3] D. Hoffman, G. Fernando, V. Goyal, M. R. Civanlar, "RTP Payload
    Format for MPEG1/MPEG2 Video," draft-ietf-avt-mpeg-new-00.txt,
    April 1997.

[4] J. Mogul, S. Deering, "Path MTU Discovery," RFC 1191, November 1990.

[5] M. R. Civanlar, G. L. Cash, "A practical system for MPEG-2 based
    video-on-demand over ATM packet networks and the WWW," Signal
    Processing: Image Communication, no. 8, pp. 221-227, Elsevier, 1996.

[6] M. F. Speer, S. McCanne, "RTP Usage with Layered Multimedia
    Streams," Internet Draft, draft-speer-avt-layered-video-02.txt,
    December 1996.

Authors' Addresses

   M. Reha Civanlar
   Glenn L. Cash
   Barry G. Haskell

   AT&T Labs-Research
   100 Schultz Drive
   Red Bank, NJ 07701
   USA

   e-mail: civanlar|glenn|bgh@research.att.com