Internet Engineering Task Force
Internet Draft                                                A. Klemets
Document: draft-klemets-avt-rtp-vc1-00.txt                     Microsoft
Expires: January 2006                                          July 2005

              RTP Payload Format for Video Codec 1 (VC-1)

IPR Notice

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

Status of this Memo

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Distribution of this document is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This memo specifies an RTP payload format for encapsulating Video
   Codec 1 (VC-1) compressed bit streams, as defined by the proposed
   Society of Motion Picture and Television Engineers (SMPTE) standard,
   SMPTE 421M.  SMPTE is the main standardizing body in the motion
   imaging industry, and the proposed SMPTE 421M standard defines a
   compressed video bit stream format and decoding process for
   television.

Table of Contents

   1. Introduction
      1.1 Conventions used in this document
   2. Definitions and abbreviations
   3. Overview of VC-1
      3.1 VC-1 bit stream layering model
      3.2 Bit-stream Data Units in Advanced profile
      3.3 Decoder initialization parameters
      3.4 Ordering of frames
   4. Encapsulation of VC-1 format bit streams in RTP
      4.1 Access Units
      4.2 Fragmentation of VC-1 frames
      4.3 Time stamp considerations
      4.4 Signaling of MIME format parameters
   5. RTP Payload Format syntax
      5.1 RTP header usage
      5.2 AU header syntax
      5.3 AU Control field syntax
   6. RTP Payload format parameters
      6.1 Media Type Registration
      6.2 Mapping of MIME parameters to SDP
   7. Security Considerations
   8. IANA Considerations
   9. References
      9.1 Normative references
      9.2 Informative references

1. Introduction

   The bit stream syntax for compressed video in Video Codec 1 (VC-1)
   format is defined by SMPTE 421M [1].  SMPTE 421M also specifies
   constraints that must be met by VC-1 conformant bit streams, and it
   specifies the complete process required to decode the bit stream.
   However, it does not specify the VC-1 compression algorithm, thus
   allowing for different ways to implement a VC-1 encoder.

   The VC-1 bit stream syntax has three profiles.  Each profile has
   specific bit stream syntax elements and algorithms associated with
   it.  Depending on the application in which VC-1 is used, some
   profiles may be more suitable than others.  For example, the Simple
   profile is designed for low bit rate Internet streaming and for
   playback on devices that can only handle low complexity decoding.
   The Advanced profile is designed for broadcast applications, such as
   digital TV, HD DVD or HDTV.  The Advanced profile is the only VC-1
   profile that supports interlaced video frames.

   Section 2 defines the terms and abbreviations used in this document.
   Section 3 provides a more detailed overview of VC-1.  Sections 4 and
   5 define the RTP payload format for VC-1, and section 6 defines the
   MIME and SDP parameters for VC-1.  See section 7 for security
   considerations.

1.1 Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119 [2].

2. Definitions and abbreviations

   This document uses the definitions in SMPTE 421M [1].  For
   convenience, the following terms from SMPTE 421M are restated here:

   B-picture: A picture that is coded using motion compensated
   prediction from past and/or future reference fields or frames.  A
   B-picture cannot be used for predicting any other picture.

   Bit-stream data unit (BDU): A unit of the compressed data which may
   be parsed (i.e., syntax decoded) independently of other information
   at the same hierarchical level.  A BDU can be, for example, a
   sequence layer header, an entry-point segment header, a frame, or a
   slice.

   Encapsulated BDU (EBDU): A BDU which has been encapsulated using the
   encapsulation mechanism described in Annex E of SMPTE 421M [1], to
   prevent emulation of the start code prefix in the bit stream.

   Entry-point: A point in the bit stream that offers random access.

   frame: A frame contains lines of spatial information of a video
   signal.  For progressive video, these lines contain samples starting
   from one time instant and continuing through successive lines to the
   bottom of the frame.  For interlaced video, a frame consists of two
   fields, a top field and a bottom field.  One of these fields will
   commence one field period later than the other.

   interlace: The property of frames where alternating lines of the
   frame represent different instants in time.  In an interlaced frame,
   one of the fields is meant to be displayed first.

   I-picture: A picture coded using information only from itself.

   level: A defined set of constraints on the values which may be taken
   by the parameters (such as bit rate and buffer size) within a
   particular profile.  A profile may contain one or more levels.

   P-picture: A picture that is coded using motion compensated
   prediction from past reference fields or frames.

   picture: For progressive video, a picture is identical to a frame,
   while for interlaced video, a picture may refer to a frame, or the
   top field or the bottom field of the frame depending on the context.

   profile: A defined subset of the syntax of VC-1, with a specific set
   of coding tools, algorithms, and syntax associated with it.  There
   are three VC-1 profiles: Simple, Main and Advanced.

   progressive: The property of frames where all the samples of the
   frame represent the same instant in time.

   random access: A random access point in the bit stream is defined by
   the following guarantee: If decoding begins at this point, all frames
   needed for display after this point will have no decoding dependency
   on any data preceding this point, and are also present in the
   decoding sequence after this point.  A random access point is also
   called an entry-point.

   sequence: A coded representation of a series of one or more pictures.
   In VC-1 Advanced profile, a sequence consists of a series of one or
   more entry-point segments, where each entry-point segment consists of
   a series of one or more pictures, and where the first picture in each
   entry-point segment provides random access.  In VC-1 Simple and Main
   profiles, the first picture in each sequence is an I-picture.

   slice: A consecutive series of macroblock rows in a picture, which
   are encoded as a single unit.

   start codes (SC): 32-bit codes embedded in the coded bit stream that
   are unique, and identify the beginning of a BDU.
   Start codes consist of a unique three-byte Start Code Prefix (SCP)
   and a one-byte Start Code Suffix (SCS).

3. Overview of VC-1

   The VC-1 bit stream syntax consists of three profiles: Simple, Main,
   and Advanced.  The Simple and Main profiles are designed for
   relatively low bit rate applications.  For example, the maximum bit
   rate supported by the Simple profile is 384 kbps.  To help achieve
   high compression efficiency, certain features, such as non-square
   pixels and support for interlaced pictures, are included only in the
   Advanced profile.

   The maximum bit rate supported by the Advanced profile is 135 Mbps,
   making it suitable for nearly lossless encoding of HDTV signals.

   Only the Advanced profile supports carrying user-data (meta-data)
   in-band with the compressed bit stream.  The user-data can be used
   for closed captioning support, for example.

   Of the three profiles, only the Advanced profile allows codec
   configuration parameters, such as the picture aspect ratio, to be
   changed through in-band signaling in the compressed bit stream.

   For each of the profiles, a number of "levels" has been defined.
   Unlike a "profile", which implies a certain set of features or
   syntax elements, a "level" is a set of constraints on the values of
   parameters in a profile, such as the bit rate or buffer size.  The
   VC-1 Simple profile has two levels, the Main profile has three, and
   the Advanced profile has five levels.  See Annex D of SMPTE 421M [1]
   for a detailed list of the profiles and levels.

3.1 VC-1 bit stream layering model

   The VC-1 bit stream is defined as a hierarchy of layers.  This is
   conceptually similar to the notion of a protocol stack of networking
   protocols.  The outermost layer is called the sequence layer.  The
   other layers are entry-point, picture, slice, macroblock and block.

   In the Simple and Main profiles, a sequence in the sequence layer
   consists of a series of one or more coded pictures.  In the Advanced
   profile, a sequence consists of one or more entry-point segments,
   where each entry-point segment consists of a series of one or more
   pictures, and where the first picture in each entry-point segment
   provides random access.  A picture is decomposed into macroblocks.  A
   slice comprises one or more contiguous rows of macroblocks.

   The entry-point and slice layers are only present in the Advanced
   profile.  In the Advanced profile, the start of each entry-point
   layer segment indicates a random access point.  In the Simple and
   Main profiles, each I-picture is a random access point.

   Each picture can be coded as an I-picture, P-picture, skipped
   picture, BI-picture, or as a B-picture.  All of these picture types
   are defined in section 4.12 of SMPTE 421M [1]; I-, P- and B-pictures
   are also defined in section 2 of this document.

3.2 Bit-stream Data Units in Advanced profile

   In the Advanced profile only, each picture and slice is byte-aligned
   and is considered a Bit-stream Data Unit (BDU).  A BDU is defined as
   a unit that can be parsed (i.e., syntax decoded) independently of
   other information in the same layer.

   The beginning of a BDU is signaled by an identifier called Start Code
   (SC).  Sequence layer headers and entry-point segment headers are
   also BDUs and thus can be easily identified by their Start Codes.
   See Annex E of SMPTE 421M [1] for a complete list of Start Codes.
   Note that blocks and macroblocks are not BDUs and thus do not have a
   Start Code and are not necessarily byte-aligned.

   The Start Code consists of four bytes.  The first three bytes are
   0x00, 0x00 and 0x01.  The fourth byte is called the Start Code Suffix
   (SCS) and it is used to indicate the type of BDU that follows the
   Start Code.  For example, the SCS of a sequence layer header (0x0F)
For example, the SCS of a sequence layer header (0x0F) 253 is different from the SCS of an entry-point segment header (0x0E). 254 The Start Code is always byte-aligned and is transmitted in network 255 byte order. 257 To prevent accidental emulation of the Start Code in the coded bit 258 stream, SMPTE 421M defines an encapsulation mechanism that uses byte 259 stuffing. A BDU which has been encapsulated by this mechanism is 260 referred to as an Encapsulated BDU, or EBDU. 262 3.3 Decoder initialization parameters 264 In the VC-1 Advanced profile, the sequence layer header contains 265 parameters that are necessary to initialize the VC-1 decoder. These 266 parameters apply to all entry-point segments until the next 267 occurrence of a sequence layer header in the coded bit stream. 269 The parameters in the sequence layer header include, among other 270 things, the Advanced profile level, the dimensions of the coded 271 pictures, the aspect ratio, interlace information, the frame rate and 272 up to 31 leaky bucket parameter sets for the Hypothetical Reference 273 Decoder (HRD). 275 Section 6.1 of SMPTE 421M [1] provides the formal specification of 276 the sequence layer header. 278 Each leaky bucket parameter set for the HRD specifies a peak 279 transmission bit rate and a decoder buffer capacity. The coded bit 280 stream is restricted by these parameters. The HRD model does not 281 mandate buffering by the decoder. Its purpose is to limit the 282 encoder's bit rate fluctuations according to a basic buffering model, 283 so that the resources necessary to decode the bit stream are 284 predictable. The HRD has a constant-delay mode and a variable-delay 285 mode. The constant-delay mode is appropriate for broadcast and 286 streaming applications, while the variable-delay mode is designed for 287 video conferencing applications. 289 Annex C of SMPTE 421M [1] specifies the usage of the hypothetical 290 reference decoder for VC-1 bit streams. 
   A general description of the theory of the HRD can be found in [6].

   The concept of an entry-point layer applies only to the VC-1 Advanced
   profile.  The presence of an entry-point segment header indicates a
   random access point within the bit stream.  The entry-point segment
   header specifies current buffer fullness values for the leaky buckets
   in the HRD.  The header also specifies coding control parameters that
   are in effect until the occurrence of the next entry-point segment
   header in the bit stream.  See section 6.2 of SMPTE 421M [1] for the
   formal specification of the entry-point segment header.

   Neither a sequence layer header nor an entry-point segment header is
   defined for the VC-1 Simple and Main profiles.  For these profiles,
   decoder initialization parameters MUST be conveyed out-of-band from
   the coded bit stream.  Section 4.4 of this document specifies how the
   parameters are conveyed by this RTP payload format.

3.4 Ordering of frames

   Frames are transmitted in the same order in which they are captured,
   unless the presence of B-pictures has been indicated in the decoder
   initialization parameters.  In the latter case, the frames are
   reordered by the VC-1 encoder such that the frames that the
   B-pictures depend on are transmitted first.  This is referred to as
   the coded order of the frames.

   When the presence of B-pictures has been indicated, the decoder is
   required to buffer one picture.  When an I-picture or a P-picture is
   received, the picture is not displayed until the next I- or P-picture
   is received.  However, B-pictures are displayed immediately.  These
   rules are stated in section 5.4 of SMPTE 421M [1].

   Figure 1 illustrates the timing relationship between the capture of
   frames, their coded order, and the display order of the decoded
   frames.  The figure shows that the display of frame P4 is delayed
The figure shows that the display of frame P4 is delayed 326 until frame P5 is received, while frames B2 and B3 are displayed 327 immediately. 329 Capture: |I0 P1 B2 B3 P4 ... 330 | 331 Coded order: | I0 P1 P4 B2 B3 P5 ... 332 | 333 Display order: | I0 P1 B2 B3 P4 ... 334 | 335 |+---+---+---+---+---+---+---+------> time 336 0 1 2 3 4 5 6 7 338 Figure 1. Frame reordering when B-pictures are indicated. 340 4. Encapsulation of VC-1 format bit streams in RTP 342 4.1 Access Units 344 Each RTP packet contains an integral number of application data units 345 (ADUs). For VC-1 format bit streams, an ADU is equivalent to one 346 Access Unit (AU), as defined in this section. Figure 2 shows the 347 layout of an RTP packet with multiple AUs. 349 +-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+ 350 | RTP | AU(1) | AU(2) | | AU(n) | 351 | Header | | | | | 352 +-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+ 354 Figure 2. RTP packet structure. 356 Access Units MUST be byte-aligned. Each Access Unit MUST start with 357 the AU header defined in section 5.2, and is followed by a variable 358 length payload. 360 The AU payload MUST contain data belonging to exactly one VC-1 frame. 362 The following rules apply to the contents of each AU payload when the 363 VC-1 Advanced profile is used: 365 - The AU payload MUST contain VC-1 bit stream data in EBDU format 366 (i.e., the bit stream must use the byte-stuffing encapsulation 367 mode defined in Annex E of SMPTE 421M [1].) 369 - The AU payload MAY contain multiple EBDUs, e.g., a sequence layer 370 header, an entry-point segment header, a frame header and multiple 371 slices and the associated user-data. (However, all slices and 372 their corresponding macroblocks MUST belong to the same video 373 frame.) 375 - The AU payload MUST start at an EBDU boundary, except when the AU 376 payload contains a fragmented frame, in which case the rules in 377 section 4.2 apply. 

   If the data in an AU (EBDUs in the case of Advanced profile and frame
   in the case of Simple and Main) does not end at an octet boundary, up
   to 7 zero-valued padding bits MUST be added to achieve octet
   alignment.

4.2 Fragmentation of VC-1 frames

   Each AU payload SHOULD contain a complete VC-1 frame.  However, if
   this would cause the RTP packet to exceed the MTU size, the frame
   SHOULD be fragmented into multiple AUs to avoid IP-level
   fragmentation.  When an AU contains a fragmented frame, this MUST be
   indicated by setting the FRAG field in the AU header as defined in
   section 5.3.

   AU payloads that do not contain a fragmented frame, or that contain
   the first fragment of a frame, MUST start at an EBDU boundary if the
   Advanced profile is used.  If the Simple or Main profile is used,
   such AU payloads MUST begin with the start of a frame.

   If the Advanced profile is used, AU payloads that contain a fragment
   of a frame other than the first fragment SHOULD start at an EBDU
   boundary, such as at the start of a slice.

   However, slices are only defined for the Advanced profile, and are
   not always used.  Blocks and macroblocks are not BDUs (they have no
   Start Code) and are not necessarily byte-aligned.  Therefore, it may
   not always be possible to continue a fragmented frame at an EBDU
   boundary.

   If an RTP packet contains an AU with the last fragment of a frame,
   additional AUs SHOULD NOT be included in the RTP packet.

   If the PTS Delta field in the AU header is used, each fragment of a
   frame MUST have the same presentation time.  If the DTS Delta field
   in the AU header is used, each fragment of a frame MUST have the same
   decode time.

4.3 Time stamp considerations

   Video frames MUST be transmitted in the coded order.  Coded order
   implies that no frames are dependent on subsequent frames, as
   discussed in section 3.4.  The RTP timestamp field MUST be set to the
The RTP timestamp field MUST be set to the 420 decode time of the video frame contained in the first AU in the RTP 421 packet. The decode time is equivalent to the sampling instant of the 422 frame, except when the codec initialization parameters indicate that 423 the VC-1 bit stream contains B-pictures. When the presence of B- 424 pictures has been indicated, the encoder may reorder frames, as 425 explained in section 3.4 of this document and in section 5.4 of SMPTE 426 421M [1]. 428 The VC-1 bit stream does not carry any time stamps other than an 429 optional Temporal Frame Reference Counter field, which, if it is 430 present, can be used to calculate the decode time of a frame. 431 However, the RTP sender may have access to different externally 432 provided time stamps depending on the method used to ingest the VC-1 433 bit stream. For example, if VC-1 is encapsulated in MPEG-2 Transport 434 Stream, each frame is assigned a presentation time (PTS) and 435 optionally also a decode time (DTS). If a VC-1 bit stream is stored 436 in an ASF file, only the decode time of each video frame is 437 available. 439 If only presentation time information is available, the RTP sender 440 can approximate the decode time of a frame by its presentation time, 441 after taking frame reordering into account. Frame reordering can be 442 handled by an algorithm similar to the one illustrated in Figure 1 in 443 section 3.4. The algorithm requires buffering of only one frame. 445 If only decode time information is available, determining the 446 presentation time of a P-frame requires buffering, or looking ahead, 447 to the first frame that does not depend on the P-frame. Using the 448 coded order sequence in Figure 1 as an example, the RTP sender cannot 449 determine presentation time of frame P4 until it has seen frame P5. 450 This would be a more complicated and costly procedure than to 451 estimate a decode time from the presentation time. 
   Hence, this RTP payload format requires the RTP timestamp field to
   represent the decode time of the frame.

   Knowing whether the stream will contain B-pictures helps the decoder
   allocate resources more efficiently.  If the stream contains no
   B-pictures, the encoder will not reorder any frames, and the
   buffering of one frame described in section 3.4 is not necessary.
   Avoiding this buffer reduces the end-to-end delay, which may be
   important for interactive applications.  For the Advanced profile,
   B-pictures are assumed to be present by default.  If the coded bit
   stream never contains B-pictures, this can be indicated using the
   "bpic" MIME parameter defined in section 6.1.

   For the Simple and Main profiles, the presence of B-pictures is
   indicated by a non-zero value for the MAXBFRAMES field in the
   STRUCT_C decoder initialization parameter.  STRUCT_C is conveyed in
   the MIME "config" parameter, which is defined in section 6.1.

4.4 Signaling of MIME format parameters

   When this RTP payload format is used with SDP, the decoder
   initialization parameters described in section 3.3 MUST be signaled
   in SDP using the MIME parameters specified in section 6.1.  Section
   6.2 specifies how to map the MIME parameters to SDP.

   When the Advanced profile is used, the decoder initialization
   parameters MAY be changed by inserting a new sequence layer header or
   an entry-point segment header in the coded bit stream.

   Note that the sequence layer header specifies the encoding level, the
   maximum size of the coded pictures and possibly also the frame rate.
   Thus, if the sequence layer header changes, the new header supersedes
   the values of the MIME parameters "level", "width", "height" and
   "framerate".

   To improve robustness against loss of RTP packets, it is RECOMMENDED
   that if the sequence layer header changes, it be repeated
   frequently in the bit stream.
   Note that any data in the VC-1 bit stream, including the sequence
   layer header itself, must be accounted for when computing the leaky
   bucket parameters for the HRD.  (See section 3.3 for a discussion of
   the HRD.)

   The Seq Count field in the Access Unit header is used to track
   changes to the sequence layer header.  A value of 0 is reserved for
   the case when the most recent sequence layer header of the bit stream
   is identical to the sequence layer header in the MIME "config"
   parameter (defined in section 6.1).

   If the RTP sender cannot determine the most recent sequence layer
   header, or if it is different from the sequence layer header in the
   MIME "config" parameter, a non-zero value MUST be used for the Seq
   Count field.

   When the RTP sender transmits an AU containing a sequence layer
   header that is different from the previous sequence layer header, the
   value of the Seq Count field MUST be incremented.  The Seq Count
   field of all subsequent AU headers MUST be set to this new value
   until the sequence layer header changes again.

   In certain applications, the sequence layer header never changes.
   This MAY be signaled with the MIME parameter "mode=1" or "mode=3", as
   appropriate.  (See the definition of the "mode" parameter in section
   6.1.)  If "mode=1" or "mode=3" is signaled and a sequence layer
   header is present in the coded bit stream, it MUST be identical to
   the sequence layer header specified by the MIME "config" parameter.

   The entry-point segment header contains information that is needed by
   the decoder to decode the frames in that segment.  This means that in
   the event of lost RTP packets the decoder may be unable to decode
   frames until the next entry-point segment header is received.  Access
   Units that contain an entry-point segment header MUST have the RA bit
   in the AU header set to 1.  (The RA bit is defined in section 5.3.)
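
   The following Python sketch illustrates how a sender might combine
   the Seq Count rules above with the AU header layout of sections 5.2
   and 5.3.  SeqCountTracker and build_au_header are hypothetical names.
   Two behaviors are interpretations rather than requirements stated in
   this document: the counter skips the reserved value 0 when it wraps
   past 255, and a header identical to the one in the MIME "config"
   parameter maps back to 0.  The bit positions assume that bit 0 in
   Figure 5 is the most significant bit, as is usual in RTP diagrams.

```python
import struct
from typing import Optional

# FRAG values from section 5.3.
FRAG_MIDDLE, FRAG_FIRST, FRAG_LAST, FRAG_COMPLETE = 0, 1, 2, 3


class SeqCountTracker:
    """Hypothetical sender-side helper for the Seq Count field.

    0 is used only while the current sequence layer header matches the
    MIME "config" header.  Assumption (not stated in the draft): on
    wrap-around the counter goes from 255 to 1, skipping reserved 0.
    """

    def __init__(self, config_header: Optional[bytes]) -> None:
        self.config_header = config_header
        self.current_header: Optional[bytes] = None  # not yet known
        self.value = 1  # non-zero until a matching header is confirmed

    def on_sequence_header(self, header: bytes) -> int:
        """Register a sequence layer header being sent; return Seq Count."""
        if header == self.current_header:
            return self.value  # repeated header: value unchanged
        self.current_header = header
        if self.config_header is not None and header == self.config_header:
            self.value = 0  # assumption: matches config again -> 0
        else:
            self.value = self.value % 255 + 1  # cycles 1..255, never 0
        return self.value


def build_au_header(frag: int, ra: bool,
                    seq_count: Optional[int] = None,
                    pts_delta: Optional[int] = None,
                    dts_delta: Optional[int] = None,
                    aup_len: Optional[int] = None) -> bytes:
    """Serialize one AU header (Figure 4) in network byte order.

    Each optional field is written only when it is not None, and its
    presence bit (SC, PT, DT or LP) is set.  The R bit is left as 0.
    """
    control = (frag & 0x3) << 6                 # FRAG: bits 0-1
    control |= int(ra) << 5                     # RA:   bit 2
    control |= int(seq_count is not None) << 4  # SC:   bit 3
    control |= int(pts_delta is not None) << 3  # PT:   bit 4
    control |= int(dts_delta is not None) << 2  # DT:   bit 5
    control |= int(aup_len is not None) << 1    # LP:   bit 6
    out = bytearray([control])
    if seq_count is not None:
        out += struct.pack("!B", seq_count)
    if pts_delta is not None:
        out += struct.pack("!i", pts_delta)  # 32-bit two's complement
    if dts_delta is not None:
        out += struct.pack("!i", dts_delta)
    if aup_len is not None:
        out += struct.pack("!H", aup_len)
    return bytes(out)
```

   In this sketch, a sender would call on_sequence_header() for each AU
   that carries a sequence layer header and reuse the returned value for
   later AUs, omitting aup_len only for the last AU in a packet.  The
   caller remains responsible for passing no seq_count (so that SC is 0)
   for Simple and Main profile streams.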

   In certain applications, the entry-point segment header never
   changes.  This MUST be signaled with the MIME parameter "mode=2" or
   "mode=3", as appropriate.  In this case, any entry-point segment
   headers that are present in the bit stream MAY be removed by the RTP
   sender.  If "mode=2" or "mode=3" is signaled and an entry-point
   segment header is present in the coded bit stream, it MUST be
   identical to the entry-point segment header specified by the MIME
   "config" parameter.

5. RTP Payload Format syntax

5.1 RTP header usage

   The format of the RTP header is specified in RFC 3550 [3] and is
   reprinted in Figure 3 for convenience.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 3. RTP header according to RFC 3550

   With the exception of the fields listed below, the RTP header fields
   are used as defined in RFC 3550 and by the RTP profile in use.

   Marker bit (M): 1 bit
         This bit is set to 1 if the RTP packet contains an Access
         Unit containing a complete VC-1 frame, or the last fragment
         of a VC-1 frame.

   Payload type (PT): 7 bits
         This document does not assign an RTP payload type for this
         RTP payload format.  The assignment of a payload type has to
         be performed either through the RTP profile in use or
         dynamically.

   Timestamp: 32 bits
         The RTP timestamp is set to the decode time of the VC-1 frame
         in the first Access Unit.
         A 90 kHz clock rate MUST be used.

5.2 AU header syntax

   The Access Unit header consists of a one-byte AU Control field
   followed by up to four optional fields.  The structure of the AU
   header is illustrated in Figure 4.

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |AU     | Seq   | PTS   | DTS   | AUP   |
      |Control| Count | Delta | Delta | Len   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 4. Structure of AU header.

   AU Control: 8 bits
         The usage of the AU Control field is defined in section 5.3.

   Seq Count: 8 bits
         Sequence Layer Counter.  This field is a binary modulo 256
         counter.  The value of this field, if present, MUST be
         incremented by 1 each time an AU containing a new sequence
         layer header is transmitted.  The value 0 is reserved for the
         case when the RTP sender knows that the current sequence
         layer header is identical to the sequence layer header in the
         MIME "config" parameter (defined in section 6.1) and MUST NOT
         be used for any other purpose.
         If this field is not present, a value of 0 MUST be assumed.

   PTS Delta: 32 bits
         Presentation time delta.  Specifies the presentation time of
         the frame as a 2's complement offset (delta) from the
         timestamp in the RTP header of this RTP packet.  The PTS
         Delta field MUST use the same clock rate as the timestamp
         field in the RTP header.

   DTS Delta: 32 bits
         Decode time delta.  Specifies the decode time of the frame as
         a 2's complement offset (delta) from the timestamp in the RTP
         header of this RTP packet.  The DTS Delta field MUST use the
         same clock rate as the timestamp field in the RTP header.
         This field SHOULD NOT be included in the first AU header in
         the RTP packet, because the RTP timestamp field specifies the
         decode time of the frame in the first AU.

   AUP Len: 16 bits
         Access Unit Payload Length.  Specifies the size, in bytes, of
         the payload of the Access Unit.
      The field does not include the size of the AU header itself.
      The field MUST be included in each AU header in an RTP packet,
      except for the last AU header in the packet.

5.3 AU Control field syntax

   The structure of the 8-bit AU Control field is shown in Figure 5.

      0    1    2    3    4    5    6    7
   +----+----+----+----+----+----+----+----+
   |   FRAG  | RA | SC | PT | DT | LP | R  |
   +----+----+----+----+----+----+----+----+

                Figure 5. Syntax of AU Control field.

   FRAG: 2 bits
      Fragmentation Information. This field indicates whether the AU
      payload contains a complete frame or a fragment of a frame.
      It MUST be set as follows:
      0: The AU payload contains a fragment of a frame other than
         the first or last fragment.
      1: The AU payload contains the first fragment of a frame.
      2: The AU payload contains the last fragment of a frame.
      3: The AU payload contains a complete frame (not fragmented).

   RA: 1 bit
      Random Access Point indicator. This bit MUST be set to 1 if
      the AU contains a frame that is a random access point. In
      the case of Simple and Main profiles, any I-picture is a
      random access point.
      In the case of Advanced profile, the first frame after an
      entry-point segment header is a random access point.
      Note that if entry-point segment headers are not transmitted
      at every random access point, this MUST be indicated using
      the MIME parameter "mode=2" or "mode=3", as appropriate.

   SC: 1 bit
      Sequence Layer Counter present. This bit MUST be set to 1 if
      the AU header includes the Seq Count field. The bit MUST be
      0 for Simple and Main profile bit streams.

   PT: 1 bit
      PTS Delta Present. This bit MUST be set to 1 if the AU
      header includes the PTS Delta field.

   DT: 1 bit
      DTS Delta Present. This bit MUST be set to 1 if the AU
      header includes the DTS Delta field.

   LP: 1 bit
      Length Present. This bit MUST be set to 1 if the AU header
      includes the AUP Len field.
   R: 1 bit
      Reserved. This bit MUST be set to 0 and MUST be ignored by
      receivers.

6. RTP Payload format parameters

6.1 Media Type Registration

   The media subtype for VC-1 is allocated from the standards tree. The
   top-level media type under which this payload format is registered is
   'video'.

   The receiver MUST ignore any unrecognized parameter.

   Media type: video

   Media subtype: vc1

   Required parameters:

      profile:
         The value is a decimal number indicating the VC-1 profile.
         The following values are defined:
         0: Simple profile.
         1: Main profile.
         3: Advanced profile.

      config:
         The value is a hexadecimal representation of an octet string
         that expresses the decoder initialization parameters.
         Decoder initialization parameters are mapped onto the
         hexadecimal octet string on an MSB-first basis. The first
         bit of the decoder initialization parameters MUST be located
         at the MSB of the first octet. If the length of the decoder
         initialization parameters is not a multiple of 8 bits, up to
         7 zero-valued padding bits MUST be added to the last octet to
         achieve octet alignment.

         For the Simple and Main profiles, the decoder initialization
         parameters are STRUCT_C, as defined in Annex J of SMPTE 421M
         [1].

         For the Advanced profile, the decoder initialization
         parameters are a sequence layer header directly followed by
         an entry-point segment header. The two headers MUST be in
         EBDU format, meaning that they must include their Start Codes
         and must use the encapsulation method defined in Annex E of
         SMPTE 421M [1].

      width:
         The value is a decimal number specifying the maximum
         horizontal size of the coded picture in pixels.

         Note: When Advanced profile is used, this parameter only
         applies while the sequence layer header specified in the
         config parameter is in use.
      height:
         The value is a decimal number specifying the maximum vertical
         size of the coded picture in pixels.

         Note: When Advanced profile is used, this parameter only
         applies while the sequence layer header specified in the
         config parameter is in use.

      bitrate:
         The value is a decimal number specifying the peak
         transmission rate of the coded bit stream. The number does
         not include RTP overhead.

         Note: When Advanced profile is used, this parameter only
         applies while the sequence layer header specified in the
         config parameter is in use.

      buffer:
         The value is a decimal number specifying the leaky bucket
         size, B, in milliseconds, required to contain a stream
         transmitted at the transmission rate specified by the bitrate
         parameter. This parameter is defined in the hypothetical
         reference decoder model for VC-1, in Annex C of SMPTE 421M
         [1].

         Note: When Advanced profile is used, this parameter only
         applies while the sequence layer header specified in the
         config parameter is in use.

   Optional parameters:

      level:
         The value is a decimal number specifying the level of the
         encoding profile.
         For Advanced profile, valid values are 0 to 4, which
         correspond to levels L0 to L4, respectively. For Simple and
         Main profiles, the following values are defined:
         1: Low Level
         2: Medium Level
         3: High Level (only valid for Main profile)

         Note: When Advanced profile is used, this parameter only
         applies while the sequence layer header specified in the
         config parameter is in use.

      framerate:
         The value is a decimal number specifying the number of frames
         per second, multiplied by 1000. For example, 29.97 frames
         per second is represented as 29970.

         Note: When Advanced profile is used, this parameter only
         applies while the sequence layer header specified in the
         config parameter is in use.
      bpic:
         This parameter signals whether B-pictures may be present when
         the Advanced profile is used. If this parameter is present and
         B-pictures may be present in the coded bit stream, this
         parameter MUST be equal to 1.
         If B-pictures will never be present in the coded bit stream,
         even if the sequence layer header changes, this parameter
         SHOULD be present and its value SHOULD be equal to 0.

         If this parameter is not specified, a value of 1 MUST be
         assumed.

      mode:
         The value is a decimal number specifying the use of the
         sequence layer header and the entry-point segment header.
         This parameter is only used for Advanced profile. The
         following values are defined:
         0: Both the sequence layer header and the entry-point segment
            header may change, and changed headers will be included in
            the RTP packets.
         1: The sequence layer header specified in the config
            parameter never changes.
         2: The entry-point segment header specified in the config
            parameter never changes. Entry-point segment headers MAY
            be omitted from the RTP packets. Each Access Unit that has
            the RA bit set to 1 contains a random access point even if
            an entry-point segment header is not included in the RTP
            packet.
         3: Modes 1 and 2 combined.

         If the mode parameter is not specified, a value of 0 MUST be
         assumed. The mode parameter SHOULD be specified if any of
         modes 1-3 apply to the VC-1 bit stream.

   Encoding considerations:
      This media type is framed and contains binary data. This
      media type depends on RTP framing, and hence is only defined
      for transfer via RTP [3].

   Security considerations:
      See Section 7 of this document.

   Interoperability considerations:
      None.

   Published specification:
      This payload format specification.

   Applications which use this media type:
      Multimedia streaming and conferencing tools.

   Additional Information:
      None.
   Person & email address to contact for further information:
      Anders Klemets <anderskl@microsoft.com>
      IETF AVT working group.

   Intended Usage:
      COMMON

   Restrictions on usage:
      This media type depends on RTP framing, and hence is only
      defined for transfer via RTP [3].

   Authors:
      Anders Klemets

   Change controller:
      IETF Audio/Video Transport Working Group delegated from the
      IESG.

6.2 Mapping of MIME parameters to SDP

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [4]. If SDP is used to specify sessions using this payload format,
   the mapping is done as follows:

   o  The media name in the "m=" line of SDP MUST be video (the media
      type).

   o  The encoding name in the "a=rtpmap" line of SDP MUST be vc1 (the
      media subtype).

   o  The clock rate in the "a=rtpmap" line MUST be 90000.

   o  The REQUIRED parameters "profile", "config", "width", "height",
      "bitrate" and "buffer" MUST be included in the "a=fmtp" line of
      SDP.

      These parameters are expressed as a MIME media type string, in
      the form of a semicolon-separated list of parameter=value pairs.

   o  The OPTIONAL parameters "level", "framerate", "bpic" and "mode",
      when present, MUST be included in the "a=fmtp" line of SDP.
      These parameters are expressed as a MIME media type string, in
      the form of a semicolon-separated list of parameter=value pairs:

         a=fmtp:<format> <parameter name>=<value>[,<value>]
                [; <parameter name>=<value>]

   o  Any parameters unknown to the device that uses the SDP MUST be
      ignored. For example, parameters defined in later specifications
      MAY be copied into the SDP and MUST be ignored by receivers that
      do not understand them.
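   A receiver might apply the processing rules above to the parameter
   list of an "a=fmtp" line as in the following minimal, non-normative
   Python sketch. The function name parse_vc1_fmtp and the choice to keep
   parameter values as uninterpreted strings are illustrative
   assumptions. This specification defines neither of them. The sketch
   ignores unknown parameters and rejects a parameter list that lacks
   any REQUIRED parameter. It also checks that "config" is hexadecimal
   and applies the default values defined in section 6.1 ("mode=0" and
   "bpic=1").

```python
"""Non-normative sketch: parse the "a=fmtp" parameters of a vc1 payload.

parse_vc1_fmtp is a hypothetical helper; values are kept as raw strings,
so a comma-separated value list is not split further.
"""

REQUIRED = ("profile", "config", "width", "height", "bitrate", "buffer")
OPTIONAL = ("level", "framerate", "bpic", "mode")
# Values a receiver must assume when these optional parameters are absent.
DEFAULTS = {"mode": "0", "bpic": "1"}


def parse_vc1_fmtp(line: str) -> tuple[int, dict[str, str]]:
    """Return (payload format number, parameters) for one a=fmtp line."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    # "<format> <parameter list>": split off the payload format number.
    fmt, _, param_text = line[len(prefix):].strip().partition(" ")
    params: dict[str, str] = {}
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ";" or empty entries
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        name = name.strip().lower()  # MIME parameter names are case-insensitive
        if name in REQUIRED or name in OPTIONAL:
            params[name] = value.strip()
        # Any other (unknown) parameter is silently ignored.
    missing = [p for p in REQUIRED if p not in params]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    bytes.fromhex(params["config"])  # raises ValueError if config is not hex
    for name, default in DEFAULTS.items():
        params.setdefault(name, default)
    return int(fmt), params


if __name__ == "__main__":
    example = ("a=fmtp:98 profile=0;level=2;width=352;height=288;"
               "framerate=15000;bitrate=384000;buffer=2000;config=4e291800")
    print(parse_vc1_fmtp(example))
```

   The sketch can be applied to the Simple profile example that
   follows, after its wrapped "a=fmtp" line is joined into a single
   line. It then returns payload format 98 and a parameter set in which
   "mode" and "bpic" take their default values. A real receiver would
   also convert the values to integers and check them against the
   ranges given in section 6.1.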
   An example of media representation in SDP is as follows (Simple
   profile, Medium level):

      m=video 49170 RTP/AVP 98
      a=rtpmap:98 VC1/90000
      a=fmtp:98 profile=0;level=2;width=352;height=288;framerate=15000;
      bitrate=384000;buffer=2000;config=4e291800

   The "a=fmtp" line is wrapped above for presentation only. In an
   actual SDP session description it is a single line.

7. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [3], and in any appropriate RTP profile. This implies
   that confidentiality of the media streams is achieved by encryption,
   for example, through the application of SRTP [5].

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load. An attacker can inject pathological RTP packets
   into the stream that are complex to decode and that cause the
   receiver to be overloaded. VC-1 is particularly vulnerable to such
   attacks, because an attacker can generate RTP packets containing
   frames that affect the decoding process of many future frames.
   Therefore, data origin authentication and data integrity protection
   of at least the RTP packet are RECOMMENDED, for example, with SRTP
   [5].

   Note that the appropriate mechanism to ensure confidentiality and
   integrity of RTP packets and their payloads is very dependent on the
   application and on the transport and signaling protocols employed.
   Thus, although SRTP is given as an example above, other possible
   choices exist.

8. IANA Considerations

   IANA is requested to register the media subtype name "vc1" for the
   media type "video" as specified in section 6.1 of this document.

9. References

9.1 Normative references

   [1] Proposed SMPTE 421M, "VC-1 Compressed Video Bitstream Format and
       Decoding Process", www.smpte.org.
   [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.
   [3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
       "RTP: A Transport Protocol for Real-Time Applications", STD 64,
       RFC 3550, July 2003.
   [4] Handley, M. and V. Jacobson, "SDP: Session Description Protocol",
       RFC 2327, April 1998.

9.2 Informative references

   [5] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
       Norrman, "The Secure Real-time Transport Protocol (SRTP)",
       RFC 3711, March 2004.
   [6] Ribas-Corbera, J., Chou, P.A., and S.L. Regunathan, "A
       generalized hypothetical reference decoder for H.264/AVC", IEEE
       Transactions on Circuits and Systems for Video Technology,
       August 2003.

Author's Address

   Anders Klemets
   Microsoft Corp.
   1 Microsoft Way
   Redmond, WA 98052
   USA
   Email: anderskl@microsoft.com

Acknowledgements

   Thanks to Shankar Regunathan for pointing out errors in the initial
   draft of this document.

IPR Notices

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.
   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Full Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.