idnits 2.17.1

draft-civanlar-bmpeg-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-04-25) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

        This Internet-Draft is submitted in full conformance with the
        provisions of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

        Copyright (c) 2024 IETF Trust and the persons identified as the
        document authors. All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

        This document is subject to BCP 78 and the IETF Trust's Legal
        Provisions Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document. Please review these documents
        carefully, as they describe your rights and restrictions with respect
        to this document. Code Components extracted from this document must
        include Simplified BSD License text as described in Section 4.e of
        the Trust Legal Provisions and are provided without warranty as
        described in the Simplified BSD License.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 6 instances of too long lines in the document, the longest one
     being 2 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 139 has weird spacing: '...es. If a pac...'

  == Line 215 has weird spacing: '...such as the "...'

  == Line 232 has weird spacing: '...cessing to ca...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 307 looks like a reference

  -- Missing reference section? '2' on line 311 looks like a reference

  -- Missing reference section? '3' on line 314 looks like a reference

  -- Missing reference section? '4' on line 317 looks like a reference

  -- Missing reference section? '5' on line 319 looks like a reference

  -- Missing reference section? '6' on line 323 looks like a reference

     Summary: 8 errors (**), 0 flaws (~~), 4 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

  2  Internet Engineering Task Force                      M. Reha Civanlar
  3  INTERNET-DRAFT                                       Glenn L. Cash
  4  File: draft-civanlar-bmpeg-02.txt                    Barry G. Haskell

  6                                                       AT&T Labs-Research

  8                                                       November, 1997

 10                 RTP Payload Format for Bundled MPEG

 12  Status of this Memo

 14  This document is an Internet-Draft. Internet-Drafts are working
 15  documents of the Internet Engineering Task Force (IETF), its areas,
 16  and its working groups. Note that other groups may also distribute
 17  working documents as Internet-Drafts.

 19  Internet-Drafts are draft documents valid for a maximum of six months
 20  and may be updated, replaced, or obsoleted by other documents at any
 21  time. It is inappropriate to use Internet-Drafts as reference
 22  material or to cite them other than as ``work in progress.''

 24  To learn the current status of any Internet-Draft, please check the
 25  ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
 26  Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
 27  munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
 28  ftp.isi.edu (US West Coast).

 30  Distribution of this memo is unlimited.

 32  Abstract

 34  This document describes a payload type for bundled, MPEG-2 encoded
 35  video and audio data that may be used with RTP, version 2. Bundling
 36  has some advantages for this payload type particularly when it is used
 37  for video-on-demand applications. This payload type may be used when
 38  its advantages are important enough to sacrifice the modularity of
 39  having separate audio and video streams.

 41  A technique to improve packet loss resilience based on "out-of-band"
 42  transmission of MPEG-2 specific, vital information is described as an
 43  Appendix.

 45  A section on security considerations for this payload type is added.

 47  1. Introduction

 49  This document describes a bundled packetization scheme for MPEG-2
 50  encoded audio and video streams using the Real-time Transport Protocol
 51  (RTP), version 2 [1].

 53  The MPEG-2 International standard consists of three layers: audio,
 54  video and systems [2]. The audio and the video layers define the
 55  syntax and semantics of the corresponding "elementary streams." The
 56  systems layer supports synchronization and interleaving of multiple
 57  compressed streams, buffer initialization and management, and time
 58  identification. RFC 2038 [3] describes packetization techniques to
 59  transport individual audio and video elementary streams as well as the
 60  transport stream, which is defined at the systems layer, using RTP.

 62  The bundled packetization scheme is needed because it has several
 63  advantages over other schemes for some important applications
 64  including video-on-demand (VOD) where audio and video are always used
 65  together. Its advantages over independent packetization of audio and
 66  video are:

 68     1. Uses a single port per "program" (i.e. bundled A/V).
 69     This may increase the number of streams that can be served
 70     e.g., from a VOD server. Also, it eliminates the performance
 71     hit when two ports are used for the separate audio and video
 72     messages on the client side.

 74     2. Provides implicit synchronization of audio and video.
 75     This is particularly convenient when the A/V data is stored
 76     in an interleaved format at the server.

 78     3. Reduces the header overhead. Since using large packets
 79     increases the effects of losses and delay, audio-only
 80     packets need to be smaller, which increases the overhead. An
 81     A/V bundled format can provide about 1% overall overhead
 82     reduction. Considering the high bitrates used for MPEG-2
 83     encoded material, e.g. 4 Mbps, the number of bits saved,
 84     e.g. 40 kbps, may provide noticeable audio or video
 85     quality improvement.

 87     4. May reduce overall receiver buffer size. Audio and
 88     video streams may experience different delays when
 89     transmitted separately. The receiver buffers need to be
 90     designed for the longest of these delays. For example,
 91     let's assume that using two buffers, each with a size B,
 92     is sufficient with probability P when each stream is
 93     transmitted individually. The probability that the same
 94     buffer size will be sufficient when both streams need to
 95     be received is P times the conditional probability of B
 96     being sufficient for the second stream given that it was
 97     sufficient for the first one. This conditional probability
 98     is generally less than one, requiring the use of a larger
 99     buffer size to achieve the same probability level.

101     5. May help with the control of the overall bandwidth used
102     by an A/V program.

104  The advantages over packetization of the transport layer streams
105  are:

107     1. Reduced overhead. It does not contain systems layer
108     information which is redundant for the RTP (essentially
109     they address similar issues).

111     2. Easier error recovery. Because of the structured
112     packetization consistent with the application layer
113     framing (ALF) principle, loss concealment and error
114     recovery can be made simpler and more effective.

116  2. Encapsulation of Bundled MPEG Video and Audio

118  Video encapsulation follows rules similar to the ones described in [3]
119  for encapsulation of MPEG elementary streams. Specifically,

121     1. The MPEG Video_Sequence_Header, when present, will always
122     be at the beginning of an RTP payload.
123     2. An MPEG GOP_header, when present, will always be at the
124     beginning of the RTP payload, or will follow a
125     Video_Sequence_Header.
126     3. An MPEG Picture_Header, when present, will always be at the
127     beginning of an RTP payload, or will follow a GOP_header.

129  In addition to these, it is required that:

131     4. Each packet must contain an integral number of video slices.

133  It is the application's responsibility to adjust the slice sizes and the
134  number of slices put in each RTP packet so that lower level
135  fragmentation does not occur. This approach simplifies the receivers
136  while somewhat increasing the complexity of the transmitter's
137  packetizer. Considering that a slice can be as small as a single
138  macroblock, it is possible to prevent fragmentation for most of the
139  cases. If a packet size exceeds the path maximum transmission unit
140  (path-MTU) [4], this payload type depends on the lower protocol layers
141  for fragmentation and this may cause problems with packet classification
142  for integrated services (e.g. with RSVP).

144  The video data is followed by a sufficient number of integral audio
145  frames to cover the duration of the video segment included in a packet.
146  For example, if the first packet contains three 1/900 seconds long
147  slices of video, and Layer I audio coding is used at a 44.1kHz sampling
148  rate, only one audio frame covering 384/44100 seconds of audio need be
149  included in this packet. Since the length of this audio frame (8.71
150  msec.) is longer than that of the video segment contained in this packet
151  (3.33 msec), the next few packets may not contain any audio frames until
152  the packet in which the covered video time extends outside the length of
153  the previously transmitted audio frames. Alternatively, it is possible,
154  in this proposal, to repeat the latest audio frame in "no-audio" packets
155  for packet loss resilience. Again, it is the application's
156  responsibility to adjust the bundled packet size according to the
157  minimum MTU size to prevent fragmentation.

159  2.1. RTP Fixed Header for BMPEG Encapsulation

161  The following RTP header fields are used:

163  Payload Type: A distinct payload type number, which may be dynamic,
164  should be assigned to BMPEG.

166  M Bit: Set for packets containing the end of a picture.

168  timestamp: 32-bit 90 kHz timestamp representing sampling time of the
169  MPEG picture. May not be monotonically increasing if B pictures are
170  present. Same for all packets belonging to the same picture. For
171  packets that contain only a sequence, extension and/or GOP header, the
172  timestamp is that of the subsequent picture.

174  2.2. BMPEG Specific Header:

176   0                   1                   2                   3
177   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
178  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
179  | P |N|MBZ|   Audio Length    | |         Audio Offset          |
180  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
181                                MBZ

183  P: Picture type (2 bits). I (0), P (1), B (2).

185  N: Header data changed (1 bit). Set if any part of the video sequence,
186  extension, GOP and picture header data is different than that of the
187  previously sent headers. It gets reset when all the header data gets
188  repeated (see Appendix 2).

190  MBZ: Must be zero. Reserved for future use.

192  Audio Length: (10 bits) Length of the audio data in this packet in
193  bytes. Start of the audio data is found by subtracting "Audio Length"
194  from the total length of the received packet.

196  Audio Offset: (16 bits) The offset between the start of the audio
197  frame and the RTP timestamp for this packet in number of audio samples
198  (for multi-channel sources, a set of samples covering all channels is
199  counted as one sample for this purpose.)

201  Audio offset is a signed integer in two's complement form. It allows a
202  ~ +/- 750 msec offset at 44.1 kHz audio sampling rate. For a very low
203  video frame rate (e.g., a frame per second), this offset may not be
204  sufficient and this payload format may not be usable.

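The field layout of the BMPEG specific header described above can be sketched as a small parser. This is an illustrative sketch only and not part of the payload format. The names parse_bmpeg_header, split_payload and offset_ms are hypothetical, the bit positions of the two MBZ fields (bits 3-4 and bit 15) are inferred from the header diagram, and the input is assumed to be the RTP payload with the RTP fixed header already removed.

```python
import struct
from typing import NamedTuple, Tuple

# Picture type codes from the "P" field description: I (0), P (1), B (2).
PICTURE_TYPES = {0: "I", 1: "P", 2: "B"}


class BmpegHeader(NamedTuple):
    picture_type: str     # "I", "P" or "B"
    header_changed: bool  # the N bit
    audio_length: int     # bytes of audio data at the end of the packet
    audio_offset: int     # signed offset, in audio samples


def parse_bmpeg_header(payload: bytes) -> BmpegHeader:
    """Decode the 4-byte BMPEG specific header at the start of `payload`."""
    if len(payload) < 4:
        raise ValueError("payload shorter than the 4-byte BMPEG header")
    (word,) = struct.unpack("!I", payload[:4])   # network byte order
    p = (word >> 30) & 0x3                       # bits 0-1
    n = (word >> 29) & 0x1                       # bit 2
    # Bits 3-4 and bit 15 are taken to be MBZ (assumption from the diagram)
    # and are ignored here.
    audio_length = (word >> 17) & 0x3FF          # bits 5-14, 10 bits
    # Bits 16-31: 16-bit signed integer in two's complement form.
    (audio_offset,) = struct.unpack("!h", payload[2:4])
    if p not in PICTURE_TYPES:
        raise ValueError("undefined picture type code %d" % p)
    return BmpegHeader(PICTURE_TYPES[p], bool(n), audio_length, audio_offset)


def split_payload(payload: bytes) -> Tuple[BmpegHeader, bytes, bytes]:
    """Return (header, video bytes, audio bytes) for one payload."""
    hdr = parse_bmpeg_header(payload)
    # The audio data starts "Audio Length" bytes before the end of the packet.
    audio_start = len(payload) - hdr.audio_length
    if audio_start < 4:
        raise ValueError("Audio Length exceeds the payload size")
    return hdr, payload[4:audio_start], payload[audio_start:]


def offset_ms(audio_offset: int, sample_rate_hz: int) -> float:
    """Convert an Audio Offset in samples to milliseconds."""
    return 1000.0 * audio_offset / sample_rate_hz
```

With a 16-bit signed field, the largest positive offset is 32767 samples, so offset_ms(32767, 44100) comes to roughly 743 msec, in line with the ~ +/- 750 msec range quoted above.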
206  If B frames are present, audio frames are not re-ordered along with
207  video. Instead, they are packetized along with video frames in their
208  transmission order (e.g., an audio segment packetized with a video
209  segment corresponding to a P picture may belong to a B picture, which
210  will be transmitted later and should be rendered at the same time with
211  this audio segment.) Even though the video segments are reordered, the
212  audio offset for a particular audio segment is still relative to the
213  RTP timestamp in the packet containing that audio segment.

215  Since a special picture counter, such as the "temporal reference
216  (TR)" field of [3], is not included in this payload format, lost GOP
217  headers may not be detected. The only effect of this may be incorrect
218  decoding of the B pictures immediately following the lost GOP header
219  for some edited video material.

221  3. Security Considerations

223  RTP packets using the payload format defined in this specification are
224  subject to the security considerations discussed in the RTP
225  specification [1]. This implies that confidentiality of the media
226  streams is achieved by encryption. Because the data compression used
227  with this payload format is applied end-to-end, encryption may be
228  performed after compression so there is no conflict between the two
229  operations.

231  This payload type does not exhibit any significant non-uniformity in the
232  receiver side computational complexity for packet processing to cause a
233  potential denial-of-service threat.

235  A security review of this payload format found no additional
236  considerations beyond those in the RTP specification.

238  Appendix 1. Out-of-band Transmission of the "High Priority" Information

240  In MPEG encoded video, loss of the header information, which includes
241  sequence, GOP, and picture headers, and the corresponding extensions,
242  causes severe degradations in the decoded video. When possible,
243  dependable transmission of the header information to the receivers can
244  improve the loss resiliency of MPEG video significantly [5]. RFC 2038
245  describes a payload type where the header information can be repeated in
246  each RTP packet. Although this is a straightforward approach, it may
247  increase the overhead.

249  The "data partitioning" method in MPEG-2 defines the syntax and
250  semantics for partitioning an MPEG-2 encoded video bitstream into "high
251  priority" and "low priority" parts. If the "high priority" (HP) part is
252  selected to contain only the header information, it is less than two
253  percent of the video data and can be transmitted before the start of the
254  real-time transmission using a reliable protocol. In order to
255  synchronize the HP data with the corresponding real-time stream, the
256  initial value of the timestamp for the real-time stream may be inserted
257  at the beginning of the HP data.

259  Alternatively, the HP data may be transmitted along with the A/V data
260  using layered multimedia transmission techniques for RTP [6].

262  Appendix 2. Error Recovery

264  Packet losses can be detected from a combination of the sequence number
265  and the timestamp fields of the RTP fixed header. The extent of the loss
266  can be determined from the timestamp, the slice number and the
267  horizontal location of the first slice in the packet. The slice number
268  and the horizontal location can be determined from the slice header and
269  the first macroblock address increment, which are located at fixed bit
270  positions.

272  If lost data consists of slices all from the same picture, new data
273  following the loss may simply be given to the video decoder which will
274  normally repeat missing pixels from a previous picture. The next audio
275  frame must be played at the appropriate time determined by the timestamp
276  and the audio offset contained in the received packet. Appropriate audio
277  frames (e.g., representing background noise) may need to be fed to the
278  audio decoder in place of the lost audio frames to keep the lip-synch
279  and/or to conceal the effects of the losses.

281  If the received new data after a loss is from the next picture (i.e. no
282  complete picture loss) and the N bit is not set, previously received
283  headers for the particular picture type (determined from the P bits) can
284  be given to the video decoder followed by the new data. If N is set,
285  data deletion until a new picture start code is advisable unless headers
286  are available from previously received HP data.

288  If data for more than one picture is lost and HP data is not available,
289  unless N is zero and at least one packet has been received for every
290  intervening picture of the same type and the N bit was 0 for each
291  of those pictures, resynchronization to a new video sequence header is
292  advisable.

294  In all cases of large packet losses, if the HP data is available,
295  appropriate portions of it can be given to the video decoder and the
296  received data can be used irrespective of the N bit value or the number
297  of lost pictures.

299  Appendix 3. Resynchronization

301  As described in [3], use of frequent video sequence headers makes it
302  possible to join in a program at arbitrary times. Also, it reduces the
303  resynchronization time after severe losses.

305  References:

307  [1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson,
308      "RTP: A Transport Protocol for Real-Time Applications,"
309      RFC 1889, January 1996.

311  [2] ISO/IEC International Standard 13818; "Generic coding of moving
312      pictures and associated audio information," November 1994.

314  [3] D. Hoffman, G. Fernando, V. Goyal, M. R. Civanlar, "RTP Payload Format
315      for MPEG1/MPEG2 Video," draft-ietf-avt-mpeg-new-00.txt, April 1997.

317  [4] J. Mogul, S. Deering, "Path MTU Discovery," RFC 1191, November 1990.

319  [5] M. R. Civanlar, G. L. Cash, "A practical system for MPEG-2 based
320      video-on-demand over ATM packet networks and the WWW," Signal
321      Processing: Image Communication, no. 8, pp. 221-227, Elsevier, 1996.

323  [6] M. F. Speer, S. McCanne, "RTP Usage with Layered Multimedia
324      Streams," Internet Draft, draft-speer-avt-layered-video-02.txt,
325      December 1996.

327  Authors' Address:

329  M. Reha Civanlar
330  Glenn L. Cash
331  Barry G. Haskell

333  AT&T Labs-Research
334  100 Schultz Drive
335  Red Bank, NJ 07701
336  USA

338  e-mail: civanlar|glenn|bgh@research.att.com