Internet Engineering Task Force                                    Don Hoffman
INTERNET-DRAFT                                                 Gerard Fernando
File: draft-ietf-avt-mpeg-01.txt                                 Steve Kleiman
                                                        Sun Microsystems, Inc.

                                                                   Vivek Goyal
                                            USC/Information Sciences Institute

                                                                November, 1995
                                                         Expires: June 1, 1996

                   RTP Payload Format for MPEG1/MPEG2 Video

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of
the Internet Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may
be updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet-Drafts as reference material or to cite them
other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au
(Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West
Coast).

Distribution of this memo is unlimited.

Abstract

This draft describes a packetization scheme for MPEG video and audio
streams. The scheme proposed can be used to transport such a video or audio
flow over the transport protocols supported by RTP. Two approaches are
described. The first is designed to support maximum interoperability with
MPEG System environments. The second is designed to provide maximum
compatibility with other RTP-encapsulated media streams and future conference
control work of the IETF.

0. What's Changed Since Last Version

   1) Added rules for encapsulating MPEG1 System and MPEG2 Program
      streams.

   2) Changed meaning of M-bit in system stream encapsulations.

   3) Made Audio ES header an even 32 bits.

   4) Changed format of ES header.

1. Introduction

ISO/IEC JTC1/SC29 WG11 (also referred to as the MPEG committee) has defined
the MPEG1 standard (ISO/IEC 11172)[1] and the MPEG2 standard (ISO/IEC
13818)[2]. This draft describes a packetization scheme to transport MPEG
video and audio streams using the Real-time Transport Protocol (RTP), version
2 [3, 4].

The MPEG1 specification is defined in three parts: System, Video and Audio.
It is designed primarily for CD-ROM-based applications, and is optimized for
approximately 1.5 Mbits/sec combined data rates. The video and audio portions
of the specification describe the basic format of the video or audio stream.
These formats define the Elementary Streams (ES). The MPEG1 System
specification defines an encapsulation of the ES that contains
Presentation Time Stamps (PTS), Decoding Time Stamps and System Clock
references, and performs multiplexing of MPEG1 compressed video and audio
ES's with user data.

The MPEG2 specification is structured in a similar way. However, it is not
restricted to CD-ROM applications. The MPEG2 System specification
defines two system stream formats: the MPEG2 Transport Stream (MTS) and the
MPEG2 Program Stream (MPS). The MTS is tailored for communicating or storing
one or more programs of MPEG2 compressed data and also other data in
relatively error-prone environments. The MPS is tailored for relatively
error-free environments.

This specification seeks to achieve interoperability among four types of
end-systems. The four types are:

   1. Transmitting Interworking Unit (TIU)

      Receives MPEG information from a native MTS system for
      distribution over packet networks using a native RTP-based system
      layer (such as an IP-based internetwork). Examples: real-time
      encoder, MTS satellite link to Internet, video server with
      MTS-encoded source material.

   2. Receiving Interworking Unit (RIU)

      Receives MPEG information in real time from an RTP-based network
      for forwarding to a native MTS environment. Example:
      Internet-based video server to MTS-based cable distribution
      plant.

   3. Transmitting Internet End-System (TAES)

      Transmits MPEG information generated or stored within the internet
      end-system itself, or received from internet-based computer networks.
      Example: video server.

   4. Receiving Internet End-System (RAES)

      Receives MPEG information over an RTP-based internet for
      consumption at the internet end-system or forwarding to a
      traditional computer network. Example: desktop PC or workstation
      viewing training video.

Each of the two types of transmitters must work with each of the two types of
receivers. Because it is probable that the TAES, and certain that the RAES,
will be based on existing and planned internet-connected computers, it is
highly desirable for the interoperable protocol to be based on RTP.

Because of the range of applications that might employ MPEG streams, we
propose to define two payload formats.

Much interest in the MPEG community is in the use of one of the MPEG System
encodings, and hence, in Section 2 we propose encapsulations of MPEG1 System
streams and MPEG2 Transport and Program Streams with RTP. This profile
supports the full semantics of MPEG System and offers basic interoperability
among all four end-system types.

When operating only among internet-based end-systems (i.e., TAES and RAES) a
payload format that provides greater compatibility with the Internet
architecture is desired, deferring some of the system issues to other
protocols being defined in the Internet community (such as the MMUSIC WG).
In Section 3 we propose an encapsulation of compressed video and audio data
(referred to in MPEG documentation as "Elementary Streams" (ES)) complying
with either MPEG1 or MPEG2.
Here, neither the MPEG1 nor the MPEG2 System standard is used. The ES's are
directly encapsulated with RTP.

Throughout this specification, we make extensive use of MPEG terminology.
The reader should consult the primary MPEG references for definitive
descriptions of this terminology.

2. Encapsulation of MPEG System and Transport Streams

Each RTP packet will contain a timestamp derived from the sender's 90 kHz
clock reference. This clock is synchronized to the system stream Program
Clock Reference (PCR) or System Clock Reference (SCR) and represents the
target transmission time of the first byte of the packet payload. The RTP
timestamp will not be passed to the MPEG decoder. This use of the timestamp
differs somewhat from normal RTP usage, in that it is not
considered to be the media display or presentation timestamp. The primary
purposes of the RTP timestamp will be to estimate and reduce any
network-induced jitter and to correct for relative time drift between the
transmitter and receiver.

For MPEG2 Transport Streams the RTP payload will contain an integral number
of MPEG transport packets. To avoid end system inefficiencies, data from
multiple small MTS packets (normally fixed in size at 188 bytes) are
aggregated into a single RTP packet. The number of transport packets
contained is computed by dividing the RTP payload length by the length of an
MTS packet (188).

For MPEG2 Program streams and MPEG1 system streams there are no packetization
restrictions; these streams are treated as a packetized stream of bytes.

2.1 RTP header usage

The RTP header fields are used as follows:

   Payload Type: Distinct payload types should be assigned for
     MPEG1 System Streams, MPEG2 Program Streams and MPEG2 Transport
     Streams. See [4] for payload type assignments.

   M bit: Set to 1 whenever the timestamp is discontinuous
     (such as might happen when a sender switches from one data source
     to another). This allows the receiver and any intervening RTP
     mixers or translators that are synchronizing to the flow to ignore
     the difference between this timestamp and any previous timestamp in
     their clock phase detectors.

   timestamp: 32-bit 90 kHz timestamp representing the target
     transmission time for the first byte of the packet.

3. Encapsulation of MPEG Elementary Streams

The following ES types may be encapsulated directly in RTP:
   (a) MPEG1 Video (ISO/IEC 11172-2)
   (b) MPEG2 Video (ISO/IEC 13818-2)
   (c) MPEG1 Audio (ISO/IEC 11172-3)
   (d) MPEG2 Audio (ISO/IEC 13818-3)

A distinct RTP payload type is assigned to MPEG1/MPEG2 Video and MPEG1/MPEG2
Audio, respectively. Further indication as to whether the data is MPEG1 or
MPEG2 need not be provided in the RTP or MPEG-specific headers of this
encapsulation, as this information is available in the ES headers.

Presentation Time Stamps (PTS) of 32 bits with an accuracy of 90 kHz
shall be carried in the fixed RTP header. All packets that make up an
audio or video frame shall have the same time stamp.

3.1 MPEG Video elementary streams

MPEG1 Video can be distinguished from MPEG2 Video at the video sequence
header, i.e. for MPEG2 Video a sequence_header() is followed by
sequence_extension(). The particular profile and level of MPEG2 Video
(MAIN_Profile@MAIN_Level, HIGH_Profile@HIGH_Level, etc) are determined
by the profile_and_level_indicator field of the sequence_extension
header of MPEG2 Video.

The MPEG bit-stream semantics were designed for relatively error-free
environments, and there is a significant amount of dependency (both temporal
and spatial) within the stream such that loss of some data makes other
uncorrupted data useless.
The format as defined in this encapsulation uses
application layer framing information plus additional information in the RTP
stream-specific header to allow for certain recovery mechanisms. Appendix
1 suggests several recovery strategies based on the properties of this
encapsulation.

Since MPEG pictures can be large, they will normally be fragmented into
packets of size less than a typical LAN/WAN MTU. The following fragmentation
rules apply:

   1. The MPEG Video_Sequence_Header, when present, will always be at
      the beginning of an RTP payload.
   2. An MPEG GOP_header, when present, will always be at the beginning
      of the RTP payload, or will follow a Video_Sequence_Header.
   3. An MPEG Picture_Header, when present, will always be at the
      beginning of an RTP payload, or will follow a GOP_header.

Each ES header must be completely contained within the packet. Consequently,
a minimum RTP payload size of 261 bytes must be supported to contain the
largest single header defined in the ES (that is, the extension_data() header
containing the quant_matrix_extension()). Otherwise, there are no
restrictions on where headers may appear within packet payloads.

In MPEG, each picture is made up of one or more "slices," and a slice is
intended to be the unit of recovery from data loss or corruption. An
MPEG-compliant decoder will normally advance to the beginning of the next
slice whenever an error is encountered in the stream. MPEG slice begin and
end bits are provided in the encapsulation header to facilitate this.

The beginning of a slice must either be the first data in a packet (after any
MPEG ES headers) or must follow after some integral number of slices in a
packet. This requirement ensures that the beginning of the next slice after
one with a missing packet can be found without requiring that the receiver
scan the packet contents. Slices may be fragmented across packets as long as
all the above rules are met.

An implementation based on this encapsulation assumes that the
Video_Sequence_Header is repeated periodically in the MPEG bit-stream. In
practice (though not required by the MPEG standard) this is used to allow
channel switching and to receive and start decoding a continuously relayed
MPEG bit-stream at arbitrary points in the media stream. It is suggested
that, when playing back an MPEG stream from a file format (where the
Video_Sequence_Header may only be represented at the beginning of the
stream), the first Video_Sequence_Header (preceded by an end-of-stream
indicator) be saved by the packetizer for periodic injection into the
network stream.

3.2 MPEG Audio elementary streams

MPEG1 Audio can be distinguished from MPEG2 Audio by the MPEG
ancillary_data() header. For either MPEG1 or MPEG2 Audio, distinct
Presentation Time Stamps may be present for frames which correspond to either
384 samples for Layer-I, or 1152 samples for Layer-II or Layer-III. The
actual number of bytes required to represent this number of samples will vary
depending on the encoder parameters.

Multiple audio frames may be encapsulated within one RTP packet. In this
case, an integral number of audio frames must be contained within the
packet and the fragmentation header defined in Section 3.5 shall
be set to 0.

Also, if relatively short packets are to be used, one frame may be so large
that it may straddle multiple RTP packets. For example, for Layer-II MPEG
audio sampled at a rate of 44.1 kHz each frame would represent a time slot of
26.1 msec. At this sampling rate if the compressed bit-rate is 384 kbits/sec
(i.e. 48 kBytes/sec) then the average audio frame size would be 1.25
kBytes. If packets were to be 500 Bytes long, then each audio frame would
straddle 3 RTP packets.
The audio fragmentation indicator header (see
Section 3.5) shall be present for an MPEG1/2 Audio payload type to provide
for this fragmentation.

3.3 RTP Fixed Header for MPEG ES encapsulation

The RTP header fields are used as follows:

   Payload Type: Distinct payload types should be assigned
     for video elementary streams and audio elementary streams.
     See [4] for payload type assignments.

   M bit: For video, set to 1 on a packet containing the MPEG frame end
     code, 0 otherwise. For audio, set to 1 on the first packet of a
     "talk-spurt," 0 otherwise.

   PT: MPEG video or audio stream ID.

   timestamp: 32-bit 90 kHz timestamp representing presentation time
     of MPEG picture or audio frame. Same for all packets that make up
     a picture or audio frame. May not be monotonically increasing in
     a video stream if B pictures are present. For packets that
     contain only a video sequence and/or GOP header, the timestamp is
     that of the subsequent picture.

3.4 MPEG Video-specific header

This header shall be attached to each RTP packet after the RTP fixed header.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    MBZ    |        TR         | MBZ |S|B|E| P | | BFC | | FFC |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                    FBV     FFV

   MBZ: Unused. Must be set to zero in current specification. This
     space is reserved for future use.

   TR: Temporal-Reference (10 bits). The temporal reference of the
     current picture within the current GOP. This value ranges from
     0-1023 and is constant for all RTP packets of a given
     picture.

   MBZ: Unused. Must be set to zero in current specification. This
     space is reserved for future use.

   S: Sequence-header-present (1 bit). Normally 0 and set to 1 at
     the occurrence of each MPEG sequence header. Used to detect
     presence of sequence header in RTP packet.

   B: Beginning-of-slice (BS) (1 bit). Set when the start of the
     packet payload is a slice start code, or when a slice start code
     is preceded only by one or more of a Video_Sequence_Header,
     GOP_header and/or Picture_Header.

   E: End-of-slice (ES) (1 bit). Set when the last byte of the payload
     is the end of an MPEG slice.

   P: Picture-Type (2 bits). I (1), P (2), B (3) or D (4). This value
     is constant for each RTP packet of a given picture.

   FBV: full_pel_backward_vector
   BFC: backward_f_code
   FFV: full_pel_forward_vector
   FFC: forward_f_code
     Obtained from the most recent picture header, and are constant
     for each RTP packet of a given picture. None of these values
     are used for I frames and must be set to zero in the RTP
     header. For P frames only the last two values are present and
     FBV and BFC must be set to zero in the RTP header. For B
     frames all four values are present.

3.5 MPEG Audio-specific header

This header shall be attached to each RTP packet at the start of the payload
and after any RTP headers for an MPEG1/2 Audio payload type.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |              MBZ              |          Frag_offset          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Frag_offset: Byte offset into the audio frame for the data
     in this packet.

Appendix 1. Error Recovery and Resynchronization Strategies.

The following error recovery and resynchronization strategies are intended
to be guidelines only. A compliant receiver is free to employ alternative
(or no) strategies.

When initially decoding an RTP-encapsulated MPEG Elementary Stream, the
receiver may discard all packets until the Sequence-header-present bit is set
to 1. At this point, sufficient state information is contained in the stream
to allow processing by an MPEG decoder.
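
The start-up strategy above can be illustrated with a short Python sketch
that drops payloads until the Sequence-header-present bit is seen. The bit
positions used here are an assumption: the text states widths only for TR,
S, B, E and P, so the sketch assumes a 6-bit MBZ field, the 10-bit TR, a
3-bit MBZ field, then S, B, E and the 2-bit P, all in network byte order.
The names VideoHeader, parse_video_header and skip_until_sequence_header
are hypothetical and are not defined by this specification.

```python
from typing import Iterable, Iterator, NamedTuple


class VideoHeader(NamedTuple):
    tr: int   # Temporal-Reference (10 bits)
    s: bool   # Sequence-header-present
    b: bool   # Beginning-of-slice
    e: bool   # End-of-slice
    p: int    # Picture-Type (2 bits)


def parse_video_header(payload: bytes) -> VideoHeader:
    """Decode the first 32 bits of an RTP payload (assumed bit layout)."""
    if len(payload) < 4:
        raise ValueError("payload shorter than the 4-byte video header")
    word = int.from_bytes(payload[:4], "big")
    # Bit 0 is the most significant bit, so bit n sits at shift (31 - n).
    return VideoHeader(
        tr=(word >> 16) & 0x3FF,    # assumed bits 6-15
        s=bool((word >> 12) & 1),   # assumed bit 19
        b=bool((word >> 11) & 1),   # assumed bit 20
        e=bool((word >> 10) & 1),   # assumed bit 21
        p=(word >> 8) & 0x3,        # assumed bits 22-23
    )


def skip_until_sequence_header(payloads: Iterable[bytes]) -> Iterator[bytes]:
    """Yield payloads starting with the first one whose S bit is set."""
    started = False
    for payload in payloads:
        if not started and not parse_video_header(payload).s:
            continue  # no sequence header seen yet: discard
        started = True
        yield payload
```

A receiver would wrap its stream of RTP payloads (fixed header already
removed) in skip_until_sequence_header() before handing data to the decoder.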

Loss of packets containing the GOP_header and/or Picture_Header is detected
by an unexpected change in the Temporal-Reference and Picture-Type values.
Consider the following example GOP sequence:

In display order:  0B 1B 2I 3B 4B 5P 6B 7B 8P GOP_HDR 0B ...
In stream order:   2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_HDR 2I ...

Consider also two counters:

   ref_pic_temp (Reference Picture (I,P) Temporal Reference)
   dep_pic_temp (Dependent Picture (B) Temporal Reference)

At each GOP beginning, set these counters to the temporal reference value of
the corresponding picture type. For our example GOP sequence, ref_pic_temp =
2 and dep_pic_temp = 0. Increment BOTH counters by one with each
subsequent picture. ref_pic_temp should match the temporal references of
the I and P frames, and dep_pic_temp should match the temporal references
of the B frames.

dep_pic_temp:      -  0  1  2  3  4  5  6  7        8  9
In stream order:   2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_H 2I 0B 1B ...
ref_pic_temp:      2  3  4  5  6  7  8  9  10   ^   11
                   --------------------------   |   ^
                             Match            Drop  |
                                                Mismatch
                                             in ref_pic_temp

The loss of a GOP header can be detected by matching the appropriate counter
(based on picture type) to the temporal reference value. A mismatch indicates
a lost GOP header. If desired, a GOP header can be re-constructed using a
"null" time_code, repeating the closed_gop flag from previous GOP headers,
and setting the broken_link flag to 1.

The loss of a Picture_Header can also be detected by a mismatch between the
Temporal Reference contained in the RTP packet and the appropriate
dep_pic_temp or ref_pic_temp counter at the receiver. After scanning to the
next Beginning-of-slice the Picture_Header is reconstructed from the P, TR,
FBV, BFC, FFV and FFC contained in that packet, and from stream-dependent
default values.
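
The counter scheme above can be sketched in Python as follows. This is an
illustration under stated assumptions, not a normative algorithm: the class
and method names are hypothetical, the counters wrap at 1024 because TR is a
10-bit field, and on a mismatch the sketch simply treats the picture as the
start of a new GOP, which is one possible recovery choice.

```python
from typing import Optional

TR_MODULUS = 1024  # Temporal-Reference is a 10-bit field (0-1023)


class GopHeaderLossDetector:
    """Track ref_pic_temp and dep_pic_temp as described in Appendix 1."""

    def __init__(self) -> None:
        self.ref_pic_temp: Optional[int] = None  # expected TR of next I/P
        self.dep_pic_temp: Optional[int] = None  # expected TR of next B

    def start_gop(self) -> None:
        """Call when a GOP header is received; counters are re-seeded."""
        self.ref_pic_temp = None
        self.dep_pic_temp = None

    def observe(self, temporal_reference: int, picture_type: str) -> bool:
        """Return False if the picture's TR contradicts its counter."""
        is_dependent = picture_type == "B"
        expected = self.dep_pic_temp if is_dependent else self.ref_pic_temp
        matched = expected is None or expected == temporal_reference
        if not matched:
            # Assumed recovery: a header was lost, so restart the counters.
            self.start_gop()
        # Seed (or confirm) the counter for this picture type.
        if is_dependent:
            self.dep_pic_temp = temporal_reference
        else:
            self.ref_pic_temp = temporal_reference
        # Increment BOTH counters for the following picture.
        if self.ref_pic_temp is not None:
            self.ref_pic_temp = (self.ref_pic_temp + 1) % TR_MODULUS
        if self.dep_pic_temp is not None:
            self.dep_pic_temp = (self.dep_pic_temp + 1) % TR_MODULUS
        return matched


# The example stream from the text, with the second GOP header dropped.
EXAMPLE_STREAM = [(2, "I"), (0, "B"), (1, "B"), (5, "P"), (3, "B"), (4, "B"),
                  (8, "P"), (6, "B"), (7, "B"), (2, "I"), (0, "B"), (1, "B")]

detector = GopHeaderLossDetector()
detector.start_gop()
RESULTS = [detector.observe(tr, ptype) for tr, ptype in EXAMPLE_STREAM]
```

For the example stream, the first nine pictures match their counters and the
tenth picture (the second 2I) is flagged, since ref_pic_temp has reached 11
by then.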

Any time an RTP packet is lost (as indicated by a gap in the RTP sequence
number), the receiver may discard all packets until the Beginning-of-slice
bit is set. At this point, sufficient state information is contained in the
stream to allow processing by an MPEG decoder starting at the next slice
boundary (possibly after reconstruction of the GOP_header and/or
Picture_Header as described above).

References:

[1] ISO/IEC International Standard 11172; "Coding of moving pictures and
    associated audio for digital storage media up to about 1,5 Mbits/s",
    November 1993.

[2] ISO/IEC International Standard 13818; "Generic coding of moving pictures
    and associated audio information", November 1994.

[3] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson,
    "RTP: A Transport Protocol for Real-Time Applications",

[4] H. Schulzrinne, "RTP Profile for Audio and Video Conferences
    with Minimal Control", Internet Draft, November 21, 1995

Expires: June 1, 1996

Authors' Addresses:

   Gerard Fernando
   Sun Microsystems, Inc.
   Mail-stop UMPK14-305
   2550 Garcia Avenue
   Mountain View, California 94043-1100
   USA
   phone: +1 415-786-6373
   email: gerard.fernando@eng.sun.com

   Vivek Goyal
   USC/Information Sciences Institute
   4676 Admiralty Way, Suite 1000
   Marina Del Rey, CA 90292-6695
   USA
   phone: +1 310-822-1511
   email: goyal@isi.edu

   Don Hoffman
   Sun Microsystems, Inc.
   Mail-stop UMPK14-305
   2550 Garcia Avenue
   Mountain View, California 94043-1100
   USA
   phone: +1 503-297-1580
   email: don.hoffman@eng.sun.com

   Steve Kleiman
   Sun Microsystems, Inc.
   Mail-stop UMPK17-2029
   2550 Garcia Avenue
   Mountain View, California 94043-1100
   USA
   email: steve.kleiman@eng.sun.com