idnits 2.17.1 draft-ietf-avt-mpeg1and2-mod-00.txt: -(9): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There are 2 instances of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (August 2003) is 7559 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 79 looks like a reference -- Missing reference section? '2' on line 80 looks like a reference -- Missing reference section? '3' on line 637 looks like a reference -- Missing reference section? '4' on line 637 looks like a reference -- Missing reference section? '5' on line 169 looks like a reference -- Missing reference section? '6' on line 215 looks like a reference Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group 3 Internet Draft D. Hoffman 4 Document: draft-ietf-avt-mpeg1and2-mod-00.txt G. Fernando 5 Expires: March 2004 Sun Microsystems, Inc. 6 V. Goyal 7 Packet Design, Inc. 8 M. R. Civanlar 9 Koç University 10 August 2003 12 RTP Payload Format for MPEG1/MPEG2 14 STATUS OF THIS MEMO 16 This document is an Internet-Draft and is in full conformance with 17 all provisions of Section 10 of RFC2026. 19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six months 25 and may be updated, replaced, or obsoleted by other documents at any 26 time. It is inappropriate to use Internet-Drafts as reference 27 material or to cite them other than as "work in progress." 
29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 Abstract 36 This memo describes a packetization scheme for MPEG video and audio 37 streams. The scheme proposed can be used to transport such a video 38 or audio flow over the transport protocols supported by RTP. Two 39 approaches are described. The first is designed to support maximum 40 interoperability with MPEG System environments. The second is 41 designed to provide maximum compatibility with other RTP-encapsulated 42 media streams and future conference control work of the IETF. 44 Most of this memo is identical to RFC 2250, an Internet standards 45 track RTP payload format definition. No changes have been made in the 46 packet formats on the wire. The main reason for this revision is to 47 allow the use of this payload format with dynamic payload types 48 that can specify the timestamp clock frequency by non-RTP means for 49 improved jitter compensation. We used this opportunity to improve the 50 description of the payload format specification by clarifying some 51 wording that has been reported to be problematic. 53 Table of Contents 55 1. Introduction...................................................2 56 2. Encapsulation of MPEG System and Transport Streams.............4 57 2.1 RTP header usage...........................................5 58 3. Encapsulation of MPEG Elementary Streams.......................5 59 3.1 MPEG Video elementary streams..............................5 60 3.2 MPEG Audio elementary streams..............................7 61 3.3 RTP Fixed Header for MPEG ES encapsulation.................7 62 3.4 MPEG Video-specific header.................................8 63 3.4.1 MPEG-2 Video-specific header extension...................9 64 3.5 MPEG Audio-specific header................................11 65 A. 
Error Recovery and Resynchronization Strategies...............11 66 B. Changes from RFC 2250.........................................13 67 C. Security Considerations.......................................14 68 D. References....................................................14 69 E. Author's Addresses............................................15 71 1. Introduction 73 [Note to the RFC Editor: This paragraph is to be deleted when this 74 draft is published as an RFC. Readers are directed to Appendix B 75 Changes from RFC 2250, for a listing of the changes that have been 76 made in this draft.] 78 ISO/IEC JTC1/SC29 WG11 (also referred to as the MPEG committee) has 79 defined the MPEG1 standard (ISO/IEC 11172)[1] and the MPEG2 standard 80 (ISO/IEC 13818)[2]. This memo describes a packetization scheme to 81 transport MPEG video and audio streams using the Real-time Transport 82 Protocol (RTP), version 2 [3, 4]. 84 The MPEG1 specification is defined in three parts: System, Video and 85 Audio. It is designed primarily for CD-ROM-based applications, and 86 is optimized for approximately 1.5 Mbits/sec combined data rates. The 87 video and audio portions of the specification describe the basic 88 format of the video or audio stream. These formats define the 89 Elementary Streams (ES). The MPEG1 System specification defines an 90 encapsulation of the ES that contains Presentation Time Stamps (PTS), 91 Decoding Time Stamps and System Clock references, and performs 92 multiplexing of MPEG1 compressed video and audio ESs with user data. 94 The MPEG2 specification is structured in a similar way. However, it 95 hasn't been restricted only to CD-ROM applications. The MPEG2 System 96 specification defines two system stream formats: the MPEG2 Transport 97 Stream (MTS) and the MPEG2 Program Stream (MPS). The MTS is tailored 98 for communicating or storing one or more programs of MPEG2 compressed 99 data and also other data in relatively error-prone environments. 
The 100 MPS is tailored for relatively error-free environments. 102 We seek to achieve interoperability among 4 types of end-systems in 103 the following specification. The 4 types are: 105 1. Transmitting Interworking Unit (TIU) 107 Receives MPEG information from a native MTS system for 108 distribution over packet networks using a native RTP-based 109 system layer (such as an IP-based internetwork). Examples: 110 real-time encoder, MTS satellite link to Internet, video 111 server with MTS-encoded source material. 113 2. Receiving Interworking Unit (RIU) 115 Receives MPEG information in real time from an RTP-based 116 network for forwarding to a native MTS environment. 117 Examples: Internet-based video server to MTS-based cable 118 distribution plant. 120 3. Transmitting Internet End-System (TAES) 122 Transmits MPEG information generated or stored within the 123 internet end-system itself, or received from internet-based 124 computer networks. Example: video server. 126 4. Receiving Internet End-System (RAES) 128 Receives MPEG information over an RTP-based internet for 129 consumption at the internet end-system or forwarding to a 130 traditional computer network. Example: desktop PC or 131 workstation viewing a training video. 133 Each of the 2 types of transmitters must work with each of the 2 134 types of receivers. Because it is probable that the TAES, and 135 certain that the RAES, will be based on existing and planned 136 internet-connected computers, it is highly desirable for the 137 interoperable protocol to be based on RTP. 139 Because of the range of applications that might employ MPEG streams, 140 we propose to define two payload formats. 142 Much interest in the MPEG community is in the use of one of the MPEG 143 System encodings, and hence, in Section 2 we propose encapsulations 144 of MPEG1 System streams and MPEG2 Transport and Program Streams with 145 RTP. 
This profile supports the full semantics of MPEG System and 146 offers basic interoperability among all four end-system types. 148 When operating only among internet-based end-systems (i.e., TAES and 149 RAES) a payload format that provides greater compatibility with the 150 Internet architecture is desired, deferring some of the system issues 151 to other protocols being defined in the Internet community (such as 152 the MMUSIC WG). In Section 3 we propose an encapsulation of 153 compressed video and audio data (referred to in MPEG documentation as 154 "Elementary Streams" (ES)) complying with either MPEG1 or MPEG2. 156 Here, neither the MPEG1 nor the MPEG2 System standard is utilized. 157 The ESs are directly encapsulated with RTP. 159 Throughout this specification, we make extensive use of MPEG 160 terminology. The reader should consult the primary MPEG references 161 for definitive descriptions of this terminology. 163 1.1 Terminology 165 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 166 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 167 document are to be interpreted as described in BCP 14, RFC 2119 [5] 168 and indicate requirement levels for compliant RTP implementations. 170 2. Encapsulation of MPEG System and Transport Streams 172 Each RTP packet will contain a timestamp derived from the sender's 173 clock reference. This clock is synchronized to the system stream 174 Program Clock Reference (PCR) or System Clock Reference (SCR) and 175 represents the target transmission time of the first byte of the 176 packet payload. The RTP timestamp will not be passed to the MPEG 177 decoder. This use of the timestamp is somewhat different from what 178 is normally the case in RTP, in that it is not considered to be the 179 media display or presentation timestamp. 
The primary purposes of the 180 RTP timestamp will be to estimate and reduce any network-induced 181 jitter and to synchronize relative time drift between the transmitter 182 and receiver. 184 For MPEG2 Transport Streams the RTP payload will contain an integral 185 number of MPEG transport packets. To avoid end system 186 inefficiencies, data from multiple small MTS packets (normally fixed 187 in size at 188 bytes) are aggregated into a single RTP packet. The 188 number of transport packets contained is computed by dividing RTP 189 payload length by the length of an MTS packet (188). 191 For MPEG2 Program streams and MPEG1 system streams there are no 192 packetization restrictions; these streams are treated as a packetized 193 stream of bytes. 195 2.1 RTP header usage 197 The RTP header fields are used as follows: 199 Payload Type: Distinct payload types MUST be assigned for MPEG1 200 System Streams, MPEG2 Program Streams and MPEG2 Transport Streams. 201 See [4] for payload type assignments. 203 M bit: Set to 1 whenever the timestamp is discontinuous (such as 204 might happen when a sender switches from one data source to 205 another). This allows the receiver and any intervening RTP mixers 206 or translators that are synchronizing to the flow to ignore the 207 difference between this timestamp and any previous timestamp in 208 their clock phase detectors. 210 timestamp: 32 bit timestamp representing the target transmission 211 time for the first byte of the packet. For the payload type MP2T 212 defined in [4], the clock frequency used for the timestamp is 90 213 kHz. However, this payload format MAY be used with a dynamic 214 payload type where the clock frequency can be specified through 215 non-RTP means e.g. SDP [6]. 217 3. 
Encapsulation of MPEG Elementary Streams 219 The following ES types may be encapsulated directly in RTP: 221 (a) MPEG1 Video (ISO/IEC 11172-2) (b) MPEG2 Video (ISO/IEC 13818-2) 222 (c) MPEG1 Audio (ISO/IEC 11172-3) (d) MPEG2 Audio (ISO/IEC 13818-3) 224 A distinct RTP payload type is assigned to MPEG1/MPEG2 Video and 225 MPEG1/MPEG2 Audio, respectively. Further indication as to whether the 226 data is MPEG1 or MPEG2 need not be provided in the RTP or MPEG 227 specific headers of this encapsulation, as this information is 228 available in the ES headers. 230 Presentation Time Stamps (PTS) of 32 bits with an accuracy of 90 kHz 231 for MPV and MPA payload types as defined in [4] shall be carried in 232 the fixed RTP header. The accuracy of the timestamp MAY be defined by 233 non-RTP means using dynamic payload types with the payload formats 234 defined in this section. All packets that make up an audio or video 235 frame shall have the same time stamp. 237 3.1 MPEG Video elementary streams 239 MPEG1 Video can be distinguished from MPEG2 Video at the video 240 sequence header, i.e., for MPEG2 Video a sequence_header() is followed 241 by sequence_extension(). The particular profile and level of MPEG2 242 Video (MAIN_Profile@MAIN_Level, HIGH_Profile@HIGH_Level, etc.) are 243 determined by the profile_and_level_indicator field of the 244 sequence_extension header of MPEG2 Video. 246 The MPEG bit-stream semantics were designed for relatively error-free 247 environments, and there is a significant amount of dependency (both 248 temporal and spatial) within the stream such that loss of some data 249 makes other uncorrupted data useless. The format as defined in this 250 encapsulation uses application layer framing information plus 251 additional information in the RTP stream-specific header to allow for 252 certain recovery mechanisms. Appendix A suggests several recovery 253 strategies based on the properties of this encapsulation. 
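The MPEG1/MPEG2 distinction described at the start of Section 3.1 (a sequence_header() followed by sequence_extension() indicates MPEG2 Video) can be sketched as follows. This is a minimal illustration, not part of the payload format. The function name classify_video_es is hypothetical, and the start code values (prefix 00 00 01, then B3 for the sequence header and B5 for extension data) are taken from the MPEG video specifications rather than from this memo. The sketch only looks at the first start code after the sequence header and does not verify that the extension is in fact a sequence_extension().

```python
START_CODE_PREFIX = b"\x00\x00\x01"
SEQUENCE_HEADER_CODE = 0xB3   # start code value of sequence_header()
EXTENSION_START_CODE = 0xB5   # start code value of extension data


def classify_video_es(data: bytes) -> str:
    """Return 'MPEG2', 'MPEG1' or 'unknown' for a buffer of video ES data."""
    pos = data.find(START_CODE_PREFIX + bytes([SEQUENCE_HEADER_CODE]))
    if pos < 0:
        return "unknown"  # no sequence header in this buffer
    # Find the first start code that follows the sequence header start code.
    nxt = data.find(START_CODE_PREFIX, pos + 4)
    if nxt < 0 or nxt + 3 >= len(data):
        return "unknown"  # buffer ends before the next start code value
    if data[nxt + 3] == EXTENSION_START_CODE:
        return "MPEG2"    # sequence header is followed by an extension
    return "MPEG1"


# Hypothetical test data: an 8-byte sequence header body, then a start code.
seq_header = START_CODE_PREFIX + b"\xb3" + b"\x16\x01\x20\x13\xff\xff\xe0\x18"
mpeg2_like = seq_header + START_CODE_PREFIX + b"\xb5\x14\x8a\x00\x01"
mpeg1_like = seq_header + START_CODE_PREFIX + b"\xb8\x00\x08\x00\x40"
```

A receiver would only need such a check when it must configure a decoder before handing it data; as noted in Section 3, the RTP and MPEG-specific headers of this encapsulation do not carry the distinction themselves.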
255 Since MPEG pictures can be large, they will normally be fragmented 256 into packets of size less than a typical LAN/WAN MTU. The following 257 fragmentation rules apply: 259 1. The MPEG Video_Sequence_Header, when present, will always be 260 at the beginning of an RTP payload. 262 2. An MPEG GOP_header, when present, will always be at the 263 beginning of the RTP payload, or will follow a 264 Video_Sequence_Header. 266 3. An MPEG Picture_Header, when present, will always be at the 267 beginning of an RTP payload, or will follow a GOP_header. 269 Each ES header must be completely contained within the packet. 270 Consequently, a minimum RTP payload size of 261 bytes must be 271 supported to contain the largest single header defined in the ES 272 (that is, the extension_data() header containing the 273 quant_matrix_extension()). Otherwise, there are no restrictions on 274 where headers may appear within packet payloads. 276 In MPEG, each picture is made up of one or more "slices," and a slice 277 is intended to be the unit of recovery from data loss or corruption. 278 An MPEG-compliant decoder will normally advance to the beginning of 279 the next slice whenever an error is encountered in the stream. MPEG 280 slice begin and end bits are provided in the encapsulation header to 281 facilitate this. 283 The beginning of a slice must either be the first data in a packet 284 (after any MPEG ES headers) or must follow some integral number 285 of slices in a packet. This requirement ensures that the beginning 286 of the next slice after one with a missing packet can be found 287 without requiring that the receiver scan the packet contents. Slices 288 may be fragmented across packets as long as all the above rules are 289 met. 291 An implementation based on this encapsulation assumes that the 292 Video_Sequence_Header is repeated periodically in the MPEG bit 293 stream. 
In practice (though not required by the MPEG standard) this is 294 used to allow channel switching and to receive and start decoding a 295 continuously relayed MPEG bit-stream at arbitrary points in the media 296 stream. It is suggested that when playing back an MPEG stream 297 from a file format (where the Video_Sequence_Header may only be 298 represented at the beginning of the stream), the first 299 Video_Sequence_Header (preceded by an end-of-stream indicator) be 300 saved by the packetizer for periodic injection into the network 301 stream. 303 3.2 MPEG Audio elementary streams 305 MPEG1 Audio can be distinguished from MPEG2 Audio from the MPEG 306 ancillary_data() header. For either MPEG1 or MPEG2 Audio, distinct 307 Presentation Time Stamps may be present for frames which correspond 308 to either 384 samples for Layer-I, or 1152 samples for Layer-II or 309 Layer-III. The actual number of bytes required to represent this 310 number of samples will vary depending on the encoder parameters. 312 Multiple audio frames may be encapsulated within one RTP packet. In 313 this case, an integral number of audio frames must be contained 314 within the packet and the fragmentation header defined in Section 3.5 315 shall be set to 0. 317 If, however, an audio frame is too large to fit inside a single RTP 318 packet, it is fragmented across multiple successive RTP packets. For 319 example, for Layer-II MPEG audio sampled at a rate of 44.1 kHz, each 320 frame would represent a time slot of 26.1 msec. At this sampling rate, 321 if the compressed bit-rate is 384 kbits/sec (i.e., 48 kBytes/sec) 322 then the average audio frame size would be 1.25 KBytes. If packets 323 were to be 500 Bytes long, then each audio frame would straddle 3 RTP 324 packets. 326 In this case, the "Frag_offset" field in the "MPEG Audio-specific 327 header" (see Section 3.5) of each such RTP packet is set to the byte 328 offset of the fragment within the entire frame. 
(Thus, the 329 "Frag_offset" of the first such packet is zero.) If a frame is 330 fragmented across multiple RTP packets, then these packets MUST each 331 contain only one fragment (i.e., they MUST NOT be packed with data 332 from any other frame). 334 3.3 RTP Fixed Header for MPEG ES encapsulation 336 The RTP header fields are used as follows: 338 Payload Type: Distinct payload types should be assigned for video 339 elementary streams and audio elementary streams. See [4] for 340 payload type assignments. 342 M bit: For video, set to 1 if the packet contains the last slice 343 of a picture (or, if the last slice of a picture is fragmented 344 over multiple packets, the last fragment of that slice); set to 0 345 otherwise. For audio, set to 1 on first packet of a "talk-spurt," 346 0 otherwise. 348 PT: MPEG video or audio stream ID. 350 timestamp: 32-bit timestamp representing presentation time of MPEG 351 picture or audio frame. Same for all packets that make up a 352 picture or audio frame. May not be monotonically increasing in 353 video stream if B pictures present in stream. For packets that 354 contain only a video sequence and/or GOP header, the timestamp is 355 that of the subsequent picture. 357 3.4 MPEG Video-specific header 359 This header shall be attached to each RTP packet after the RTP fixed 360 header. 362 0 1 2 3 363 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 364 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 365 | MBZ |T| TR | |N|S|B|E| P | | BFC | | FFC | 366 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 367 AN FBV FFV 368 MBZ: Unused. Must be set to zero in current specification. This 369 space is reserved for future use. 371 T: MPEG-2 (Two) specific header extension present (1 bit). Set 372 to 1 when the MPEG-2 video-specific header extension (see 373 Section 3.4.1) follows this header. 
This extension may be 374 needed for improved error resilience; however, its inclusion in 375 an RTP packet is optional. (See Appendix A.) 377 TR: Temporal-Reference (10 bits). The temporal reference of the 378 current picture within the current GOP. This value ranges from 379 0-1023 and is constant for all RTP packets of a given picture. 381 AN: Active N bit for error resilience (1 bit). Set to 1 when 382 the following bit (N) is used to signal changes in the picture 383 header information for MPEG-2 payloads. It must be set to 0 for 384 MPEG-1 payloads or when the N bit is not used. 386 N: New picture header (1 bit). Used for MPEG-2 payloads when 387 the previous bit (AN) is set to 1. Otherwise, it must be set to 388 zero. Set to 1 when the information contained in the previously 389 transmitted Picture Headers can't be used to reconstruct a 390 header for the current picture. This happens when the current 391 picture is encoded using a different set of parameters than the 392 previous pictures of the same type. The N bit must be constant 393 for all RTP packets that belong to the same picture so that 394 receipt of any packet from a picture allows detecting whether 395 information necessary for reconstruction was contained in that 396 picture (N = 1) or a previous one (N = 0). 398 S: Sequence-header-present (1 bit). Normally 0 and set to 1 at 399 the occurrence of each MPEG sequence header. Used to detect the 400 presence of a sequence header in the RTP packet. 402 B: Beginning-of-slice (BS) (1 bit). Set when the start of the 403 packet payload is a slice start code, or when a slice start 404 code is preceded only by one or more of a 405 Video_Sequence_Header, GOP_header and/or Picture_Header. 407 E: End-of-slice (ES) (1 bit). Set when the last byte of the 408 payload is the end of an MPEG slice. 410 P: Picture-Type (3 bits). I (1), P (2), B (3) or D (4). This 411 value is constant for each RTP packet of a given picture. 
Value 412 000B is forbidden and 101B - 111B are reserved to support 413 future extensions to the MPEG ES specification. 415 FBV: full_pel_backward_vector 417 BFC: backward_f_code 419 FFV: full_pel_forward_vector 421 FFC: forward_f_code 422 Obtained from the most recent picture header, and are constant 423 for each RTP packet of a given picture. For I frames none of 424 these values are present in the picture header and they must be 425 set to zero in the RTP header. For P frames only the last two 426 values are present and FBV and BFC must be set to zero in the 427 RTP header. For B frames all four values are present. 429 3.4.1 MPEG-2 Video-specific header extension 431 This header may be attached to each RTP packet after the MPEG Video 432 Specific Header where its presence is indicated by setting the T bit 433 to one (Section 3.4). 435 0 1 2 3 436 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 437 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 438 |X|E|f_[0,0]|f_[0,1]|f_[1,0]|f_[1,1]| DC| PS|T|P|C|Q|V|A|R|H|G|D| 439 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 441 X: Unused (1 bit). Must be set to zero in current 442 specification. This space is reserved for future use. 444 E: Extensions present (1 bit). If set to 1, this header 445 extension, including the composite display extension when D = 446 1, will be followed by one or more of the following extensions: 447 quant matrix extension, picture display extension, picture 448 temporal scalable extension, picture spatial scalable extension 449 and copyright extension. 451 The first byte of the extension data gives the length of the 452 extensions in 32-bit words including the length field itself. 453 Zero padding bytes are used at the end if required to align the 454 extensions to a 32-bit boundary. 
456 Since they may not be vital in decoding a picture, the 457 inclusion of any one of these extensions in an RTP packet is 458 optional even when the MPEG-2 video-specific header extension 459 is included in the packet (T = 1). (See Appendix A.) If 460 present, they should be copied from the corresponding 461 extensions following the most recent MPEG-2 picture coding 462 extension and they remain constant for each RTP packet of a 463 given picture. 465 The extension start code (32 bits) and the extension start code 466 ID (4 bits) are included. Therefore the extensions are self 467 identifying. 469 f_[0,0]: forward horizontal f_code (4 bits) 470 f_[0,1]: forward vertical f_code (4 bits) 471 f_[1,0]: backward horizontal f_code (4 bits) 472 f_[1,1]: backward vertical f_code (4 bits) 473 DC: intra_DC_precision (2 bits) 474 PS: picture_structure (2 bits) 475 T: top_field_first (1 bit) 476 P: frame_predicted_frame_dct (1 bit) 477 C: concealment_motion_vectors (1 bit) 478 Q: q_scale type (1 bit) 479 V: intra_vlc_format (1 bit) 480 A: alternate scan (1 bit) 481 R: repeat_first_field (1 bit) 482 H: chroma_420_type (1 bit) 483 G: progressive frame (1 bit) 484 D: composite_display_flag (1 bit). If set to 1, the 32 bits 485 following this one contain 12 zeros followed by 20 bits of 486 composite display information. 488 These values are copied from the most recent picture coding 489 extension and are constant for each RTP packet of a given 490 picture. Their meanings are as explained in the MPEG-2 491 standard. 493 3.5 MPEG Audio-specific header 495 This header shall be attached to each RTP packet at the start of 496 the payload and after any RTP headers for an MPEG1/2 Audio payload 497 type. 
499 0 1 2 3 500 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 501 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 502 | MBZ | Frag_offset | 503 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 505 Frag_offset: Byte offset into the audio frame for the data in 506 this packet. 508 A. Error Recovery and Resynchronization Strategies 510 The following error recovery and resynchronization strategies are 511 intended to be guidelines only. A compliant receiver is free to 512 employ alternative (or no) strategies. 514 When initially decoding an RTP-encapsulated MPEG Elementary Stream, 515 the receiver may discard all packets until the Sequence-header- 516 present bit is set to 1. At this point, sufficient state information 517 is contained in the stream to allow processing by an MPEG decoder. 519 Loss of packets containing the GOP_header and/or Picture_Header is 520 detected by an unexpected change in the Temporal-Reference and 521 Picture-Type values. Consider the following example GOP sequence: 523 In display order: 0B 1B 2I 3B 4B 5P 6B 7B 8P GOP_HDR 0B ... 524 In stream order: 2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_HDR 2I ... 526 Consider also two counters: 528 ref_pic_temp (Reference Picture (I,P) Temporal Reference) 529 dep_pic_temp (Dependent Picture (B) Temporal Reference) 531 At each GOP beginning, set these counters to the temporal reference 532 value of the corresponding picture type. For our example GOP 533 sequence, ref_pic_temp = 2 and dep_pic_temp = 0. Keep incrementing 534 BOTH counters by unity with each following picture. Ref_pic_temp 535 should match the temporal references of the I and P frames, and 536 dep_pic_temp should match the temporal references of the B frames. 538 dep_pic_temp: - 0 1 2 3 4 5 6 7 8 9 539 In stream order: 2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_H 2I 0B 1B ... 
540 ref_pic_temp: 2 3 4 5 6 7 8 9 10 ^ 11 541 -------------------------- | ^ 542 Match Drop | 543 Mismatch 544 in ref_pic_temp 546 The loss of a GOP header can be detected by matching the appropriate 547 counter (based on picture type) to the temporal reference value. A 548 mismatch indicates a lost GOP header. If desired, a GOP header can be 549 reconstructed using a "null" time_code, repeating the closed_gop 550 flag from previous GOP headers, and setting the broken_link flag to 551 1. However, if variable frame rate video is being used and the extent of 552 successive packet losses is larger than a GOP, the loss of 553 the GOP header may not be detected. 555 The loss of a Picture_Header can also be detected by a mismatch in 556 the Temporal Reference contained in the RTP packet from the 557 appropriate dep_pic_temp or ref_pic_temp counter at the receiver. 559 For MPEG-1 payloads, after scanning to the next Beginning-of-slice 560 the Picture_Header is reconstructed from the P, TR, FBV, BFC, FFV and 561 FFC contained in that packet, and from stream-dependent default 562 values. 564 For MPEG-2, additional information is needed for the reconstruction. 565 This information is provided by the MPEG-2 video specific header 566 extension contained in that packet if the T bit is set to 1, or the 567 Picture Header for the current picture may be available from previous 568 packets belonging to the same picture. The transmitter's strategy for 569 inclusion of the MPEG-2 video specific header extension may depend 570 upon a number of factors. This header may not be needed when: 572 1. the information has been transmitted a sufficient number of 573 times in previous packets to assure reception with the desired 574 probability, or 576 2. the information is transmitted over a separate reliable 577 channel, or 578 3. expected loss rates are low enough that missed frames are 579 not a concern, or 581 4. conserving bandwidth is more important than error 582 resilience, etc. 
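The counter-based detection of a lost GOP_header described earlier in this appendix (ref_pic_temp for I and P pictures, dep_pic_temp for B pictures, both incremented with every picture) can be sketched as follows. This is an illustration under stated assumptions, not a normative algorithm. The function name find_lost_gop_headers and the input tuple layout are hypothetical. The gop_header_seen flag is assumed to be supplied by whatever parses the payload. Wrapping the counters at 1024 and restarting them after a mismatch are choices made for this sketch; the text above only defines the 10-bit Temporal-Reference range and the mismatch test.

```python
TR_MODULUS = 1024  # assumption: counters wrap like the 10-bit Temporal-Reference


def find_lost_gop_headers(pictures):
    """Return stream-order indices of pictures whose TR mismatches a counter.

    pictures: iterable of (temporal_reference, picture_type, gop_header_seen)
    in stream order, where picture_type is 'I', 'P' or 'B'.
    """
    ref_pic_temp = None  # expected temporal reference of I and P pictures
    dep_pic_temp = None  # expected temporal reference of B pictures
    suspects = []
    for index, (tr, ptype, gop_header_seen) in enumerate(pictures):
        if gop_header_seen:
            # A GOP header was received: restart both counters.
            ref_pic_temp = dep_pic_temp = None
        # Both counters advance by one with every picture.
        if ref_pic_temp is not None:
            ref_pic_temp = (ref_pic_temp + 1) % TR_MODULUS
        if dep_pic_temp is not None:
            dep_pic_temp = (dep_pic_temp + 1) % TR_MODULUS
        if ptype == "B":
            if dep_pic_temp is None:
                dep_pic_temp = tr  # first B picture seeds the counter
            elif dep_pic_temp != tr:
                suspects.append(index)
                ref_pic_temp, dep_pic_temp = None, tr  # resynchronize
        else:
            if ref_pic_temp is None:
                ref_pic_temp = tr  # first I/P picture seeds the counter
            elif ref_pic_temp != tr:
                suspects.append(index)
                ref_pic_temp, dep_pic_temp = tr, None  # resynchronize
    return suspects


# The example GOP from this appendix, in stream order.
example_gop = [(2, "I", True), (0, "B", False), (1, "B", False),
               (5, "P", False), (3, "B", False), (4, "B", False),
               (8, "P", False), (6, "B", False), (7, "B", False)]
# Start of the next GOP, with and without its GOP header received.
next_gop_ok = [(2, "I", True), (0, "B", False), (1, "B", False)]
next_gop_lost = [(2, "I", False), (0, "B", False), (1, "B", False)]
```

With these inputs, find_lost_gop_headers(example_gop + next_gop_lost) reports index 9, the second "2I" picture, where ref_pic_temp has reached 11 as in the diagram above; with next_gop_ok the result is empty.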
584 If T=1 and E=0, there may be extensions present in the original video 585 bitstream that are not included in the current packet. The 586 transmitter may choose not to include extensions in a packet when 587 they are not necessary for decoding or if one of the cases listed 588 above for not including the MPEG-2 video specific header extension in 589 a packet applies only to the extension data. 591 If N=0, then the Picture Header from a previous picture of the same 592 type (I, P or B) may be used so long as at least one packet has been 593 received for every intervening picture of the same type and the 594 N bit was 0 for each of those pictures. This may involve: 596 1. Saving the relevant picture header information that can be 597 obtained from the MPEG-2 video specific header extension or 598 directly from the video bitstream for each picture type, 599 2. Keeping validity indicators for this saved information based 600 on the received N bits and lost packets, and, 602 3. Updating the data whenever a packet with N=1 is received. 604 If the necessary information is not available from any of these 605 sources, discarding data until a new picture start code is found is advised. 607 Any time an RTP packet is lost (as indicated by a gap in the RTP 608 sequence number), the receiver may discard all packets until the 609 Beginning-of-slice bit is set. At this point, sufficient state 610 information is contained in the stream to allow processing by an MPEG 611 decoder starting at the next slice boundary (possibly after 612 reconstruction of the GOP_header and/or Picture_Header as described 613 above). 615 B. Changes from RFC 2250 617 . Use of dynamic payload types that can specify the clock 618 frequency (accuracy) of the timestamps through non-RTP means 619 is allowed. 621 o In accordance with this, the references to "90 kHz" in 622 "sender's clock reference" in Section 2 and "timestamp" 623 definition in Section 3.3 have been removed. 625 . 
The following items have been reworded: 627 o Section 3.2: Audio frame fragmentation 628 o Section 3.3: M bit definition 630 . A case for which the GOP header loss detection algorithm may 631 not work has been added to Appendix A. 633 C. Security Considerations 635 RTP packets using the payload format defined in this specification 636 are subject to the security considerations discussed in the RTP 637 specification [3], and any appropriate RTP profile (for example [4]). 638 This implies that confidentiality of the media streams is achieved by 639 encryption. Because the data compression used with this payload 640 format is applied end-to-end, encryption may be performed after 641 compression so there is no conflict between the two operations. 643 A potential denial-of-service threat exists for data encodings using 644 compression techniques that have non-uniform receiver-end 645 computational load. The attacker can inject pathological datagrams 646 into the stream which are complex to decode and cause the receiver to 647 be overloaded. However, this encoding does not exhibit any 648 significant non-uniformity. 650 D. References 652 1. ISO/IEC International Standard 11172; "Coding of moving pictures 653 and associated audio for digital storage media up to about 1,5 654 Mbits/s", November 1993. 656 2. ISO/IEC International Standard 13818; "Generic coding of moving 657 pictures and associated audio information", November 1994. 659 3. Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, 660 "RTP: A Transport Protocol for Real-Time Applications", RFC 3550, 661 July 2003. 663 4. Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video 664 Conferences with Minimal Control", RFC 3551, July 2003. 666 5. Bradner, S., "Key Words for Use in RFCs to Indicate Requirement 667 Levels", BCP 14, RFC 2119, March 1997. 669 6. Handley, M. and V. Jacobson, "SDP: Session Description Protocol", 670 RFC 2327, April 1998. 672 E. 
Acknowledgements 674 Humphrey Liu reported the need for the improved time resolution. Ram 675 Kordale noticed the problem with recovering GOP headers under large 676 scale data losses. Ross Finlayson helped with the rewordings. 678 F. Author's Addresses 680 M. Reha Civanlar 681 Koç University 682 Computer Engineering Department 683 Sariyer, Istanbul 34450 684 TURKEY 686 Phone: +90 212-338-1719 687 EMail: rcivanlar@ku.edu.tr 689 Gerard Fernando 690 Sun Microsystems, Inc. 691 Mail-stop UMPK14-305 692 2550 Garcia Avenue 693 Mountain View, California 94043-1100 694 USA 696 Phone: +1 415-786-6373 697 EMail: gerard.fernando@eng.sun.com 699 Vivek Goyal 700 Packet Design, Inc. 701 3400 Hillview Ave, Bldg 3 702 Palo Alto, CA 94304 703 USA 705 Phone: +1 650-739-1850 706 EMail: vivek@packetdesign.com 708 Don Hoffman 709 Sun Microsystems, Inc. 710 Mail-stop UMPK14-305 711 2550 Garcia Avenue 712 Mountain View, California 94043-1100 713 USA 715 Phone: +1 503-297-1580 716 EMail: don.hoffman@eng.sun.com